COVID-19 ETL Pipeline
My first data engineering project, completed as part of the course "Fundamentals of Data Engineering".
Overview
A data engineering and big data project that implements an ETL pipeline, moving COVID-19 data from multiple sources into analytics-ready storage.

Highlights
- Orchestration with Dagster
- Batch processing with Apache Spark
- Object storage with MinIO
- Warehousing with MySQL and PostgreSQL
- Visualization with Plotly and Dash
- Dockerized development setup
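The Highlights list names the tools but not how they connect. At its core, an orchestrator such as Dagster runs each pipeline stage only after the stages it depends on. The minimal sketch below shows that idea. The stage names and the linear raw, transformed, warehouse, dashboard layout are hypothetical and do not reproduce this repository's actual Dagster asset graph. The standard-library `graphlib` module stands in for Dagster itself.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names (assumption, not the repository's real assets).
# Each stage maps to the set of stages it depends on.
PIPELINE: dict[str, set[str]] = {
    "raw_data": set(),                         # extracted source files (e.g. kept in object storage)
    "transformed_data": {"raw_data"},          # batch transform (Spark in the real project)
    "warehouse_tables": {"transformed_data"},  # loaded into the relational warehouse
    "dashboard": {"warehouse_tables"},         # Plotly / Dash reads from the warehouse
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every stage comes after all of its dependencies."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for stage in run_order(PIPELINE):
        print(stage)
```

Because the assumed graph is a single chain, the order is fully determined. In a real Dagster project, the dependencies come from asset definitions rather than a hand-written dict.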
Outcome
The project demonstrates practical pipeline orchestration, storage design, and BI-ready output generation.
This repository implements an ETL (Extract, Transform, Load) pipeline using Dagster, Spark, Plotly, and Dash. The pipeline extracts COVID-19 data from several sources, transforms it into a standard format, and loads it into a database. The transformed data then feeds interactive dashboards built with Plotly and Dash.
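The extract, transform, and load steps can be sketched in plain Python. This is a minimal illustration that assumes two hypothetical CSV sources, each naming the same fields differently. The source contents, toy values, column mappings, and the `covid_cases` table are all invented for this example. The standard library's `sqlite3` stands in for MySQL/PostgreSQL, and Spark is not used.

```python
import csv
import io
import sqlite3

# Hypothetical sources with toy values (not real COVID-19 figures) and
# per-source column mappings onto one standard schema: country, date, confirmed.
SOURCES = {
    "source_a": (
        "country,date,cases\ncountry_x,2020-03-01,10\n",
        {"country": "country", "date": "date", "cases": "confirmed"},
    ),
    "source_b": (
        "Country/Region,Date,Confirmed\ncountry_y,2020-03-01,20\n",
        {"Country/Region": "country", "Date": "date", "Confirmed": "confirmed"},
    ),
}


def extract(raw_csv: str) -> list[dict[str, str]]:
    """Extract: parse one CSV source into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict[str, str]], column_map: dict[str, str]) -> list[dict]:
    """Transform: rename columns to the standard schema and cast counts to int."""
    standardized = []
    for row in rows:
        std = {column_map[name]: value for name, value in row.items()}
        std["confirmed"] = int(std["confirmed"])
        standardized.append(std)
    return standardized


def load(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Load: insert standardized rows (SQLite stands in for MySQL/PostgreSQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS covid_cases (country TEXT, date TEXT, confirmed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO covid_cases VALUES (:country, :date, :confirmed)", rows
    )
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> None:
    """Run extract -> transform -> load for every configured source."""
    for raw_csv, column_map in SOURCES.values():
        load(conn, transform(extract(raw_csv), column_map))


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(connection)
    print(connection.execute("SELECT * FROM covid_cases ORDER BY country").fetchall())
```

Keeping one column map per source means a new source only needs a new mapping, not a new transform function.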
Dataset
Credit (dataset): https://www.kaggle.com/datasets/imdevskp/corona-virus-report