COVID-19 ETL Pipeline

This is the first data project I completed as part of the course "Fundamentals of Data Engineering".

Overview

A data engineering / big data project implementing ETL from multiple COVID-19 sources into analytics-ready storage.

Highlights

Orchestration with Dagster
Batch processing with Apache Spark
Object storage with MinIO
Warehousing with MySQL and PostgreSQL
Visualization with Plotly and Dash
Dockerized development setup

Outcome

The project demonstrates practical pipeline orchestration, storage design, and BI-ready output generation.

This repository implements an ETL (Extract, Transform, Load) pipeline using Dagster, Spark, Plotly, and Dash. It extracts COVID-19 data from several sources, transforms it into a standard format, and loads it into a database. The transformed data then feeds interactive dashboards built with Plotly and Dash.
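
To make the extract, transform, and load steps concrete, here is a minimal sketch in plain Python. It is not the repository's code: the real pipeline runs on Dagster and Spark and loads into MySQL and PostgreSQL, whereas this sketch uses only the standard library, with an in-memory SQLite database standing in for the warehouse. The column names, the table name covid_daily, the helper names, and the sample rows are all hypothetical placeholders chosen for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample input. Column names and values are placeholders,
# not taken from the real dataset.
RAW_CSV = """Country/Region,Date,Confirmed,Deaths,Recovered
Aland,2020-03-01,10,1,2
Aland,2020-03-02,15,1,4
Borduria,2020-03-01,,0,0
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def to_int(value):
    """Cast a CSV cell to int, treating blank cells as zero."""
    return int(value) if value.strip() else 0


def transform(rows):
    """Transform: rename columns, cast counts, derive active cases."""
    records = []
    for row in rows:
        confirmed = to_int(row["Confirmed"])
        deaths = to_int(row["Deaths"])
        recovered = to_int(row["Recovered"])
        records.append({
            "country": row["Country/Region"].strip(),
            "report_date": row["Date"],
            "confirmed": confirmed,
            "deaths": deaths,
            "recovered": recovered,
            "active": confirmed - deaths - recovered,
        })
    return records


def load(conn, records):
    """Load: upsert records into a table (SQLite stands in for the warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS covid_daily ("
        "country TEXT, report_date TEXT, confirmed INTEGER, "
        "deaths INTEGER, recovered INTEGER, active INTEGER, "
        "PRIMARY KEY (country, report_date))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO covid_daily VALUES "
        "(:country, :report_date, :confirmed, :deaths, :recovered, :active)",
        records,
    )
    conn.commit()


def run_pipeline(conn, text=RAW_CSV):
    """Run extract, transform, and load in order; return the record count."""
    records = transform(extract(text))
    load(conn, records)
    return len(records)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(connection), "rows loaded")
```

Keying the table on country and date and loading with INSERT OR REPLACE means a rerun overwrites rows instead of duplicating them, which is a common choice for batch jobs that may be retried. In an orchestrated setup, each of the three functions would typically become its own step so that a failure can be retried in isolation.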

Dataset

Dataset credit: https://www.kaggle.com/datasets/imdevskp/corona-virus-report

Repository

Source: https://github.com/thangbuiq/covid19-etl-pipeline