
COVID-19 ETL Pipeline

My first data project, completed as part of the course "Fundamentals of Data Engineering".

Overview

A data engineering / big data project implementing ETL from multiple COVID-19 sources into analytics-ready storage.

Highlights

  • Orchestration with Dagster
  • Batch processing with Apache Spark
  • Object storage with MinIO
  • Warehousing with MySQL and PostgreSQL
  • Visualization with Plotly and Dash
  • Dockerized development setup
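Orchestration mostly comes down to running each step only after its inputs exist. The following is a rough, library-free sketch, not the project's Dagster code. It models the pipeline as a dependency graph and runs it in topological order, which is the core constraint a Dagster asset graph enforces. The asset names and the mapping of steps to the tools above are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each key maps to the set of assets it depends on.
# The tool noted beside each step is an assumed mapping, not confirmed by the project.
PIPELINE: dict[str, set[str]] = {
    "raw_daily_reports": set(),                  # extract: raw files landed in MinIO
    "raw_country_summary": set(),                # extract: a second COVID-19 source
    "clean_covid": {"raw_daily_reports", "raw_country_summary"},  # transform: Spark batch job
    "warehouse_tables": {"clean_covid"},         # load: MySQL / PostgreSQL
    "dashboard_data": {"warehouse_tables"},      # serve: Plotly / Dash
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an execution order in which every asset comes after its inputs."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in run_order(PIPELINE):
        print(step)
```

Dagster also provides scheduling, retries, and run metadata on top of this. The sketch captures only the ordering constraint.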

Outcome

The project demonstrates practical pipeline orchestration, storage design, and BI-ready output generation.

This repository contains a data engineering project that implements an ETL (Extract, Transform, Load) pipeline using Dagster, Spark, Plotly, and Dash. The goal is to extract COVID-19 data from several sources, transform it into a standard format, and load it into a database. The transformed data then feeds interactive dashboards built with Plotly and Dash.
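Here is a minimal standard-library sketch of the three ETL stages. It assumes a CSV layout similar to the Kaggle files (columns Province/State, Country/Region, Date, Confirmed, Deaths, Recovered) and uses an in-memory SQLite database as a stand-in for MySQL or PostgreSQL. In the actual project the transform runs on Spark. The function names, the covid_daily table, and the sample rows (made-up values) are all hypothetical.

```python
import csv
import io
import sqlite3
from collections import defaultdict
from datetime import date

# Illustrative input only: column names mimic the Kaggle files (an assumption)
# and all numbers are made up.
SAMPLE_CSV = """Province/State,Country/Region,Date,Confirmed,Deaths,Recovered
,CountryA,2020-03-01,100,2,10
,CountryA,2020-03-02,150,3,20
ProvinceX,CountryB,2020-03-01,40,1,5
ProvinceY,CountryB,2020-03-01,60,0,7
"""

Record = tuple[str, str, int, int, int]  # (country, day, confirmed, deaths, recovered)


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse raw CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[Record]:
    """Transform: normalize types and roll provinces up to country level."""
    totals: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0, 0])
    for row in rows:
        country = row["Country/Region"].strip()
        day = date.fromisoformat(row["Date"]).isoformat()  # raises on malformed dates
        counts = totals[(country, day)]
        counts[0] += int(row["Confirmed"])
        counts[1] += int(row["Deaths"])
        counts[2] += int(row["Recovered"])
    return [(c, d, *v) for (c, d), v in sorted(totals.items())]


def load(records: list[Record], conn: sqlite3.Connection) -> None:
    """Load: upsert records into a warehouse-style table keyed on (country, day)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS covid_daily ("
        " country TEXT NOT NULL, day TEXT NOT NULL,"
        " confirmed INTEGER, deaths INTEGER, recovered INTEGER,"
        " PRIMARY KEY (country, day))"
    )
    conn.executemany("INSERT OR REPLACE INTO covid_daily VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(SAMPLE_CSV)), conn)
    for row in conn.execute("SELECT * FROM covid_daily ORDER BY country, day"):
        print(row)
```

Keying the table on (country, day) and upserting makes the load idempotent, so re-running the job overwrites rows instead of duplicating them. SQLite's INSERT OR REPLACE corresponds roughly to INSERT ... ON DUPLICATE KEY UPDATE in MySQL and INSERT ... ON CONFLICT in PostgreSQL.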

Dataset

Source: https://www.kaggle.com/datasets/imdevskp/corona-virus-report
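The files in this dataset appear to report cumulative totals per day, which is worth verifying against the download. Dashboards, however, often plot daily new cases. The sketch below shows that derivation using hypothetical function names. It clips downward revisions in the source to zero, which is one possible policy rather than the project's documented behavior.

```python
from collections import defaultdict


def daily_new(cumulative: list[int]) -> list[int]:
    """Turn a day-ordered cumulative series into per-day new counts.

    The first day's value is kept as-is (difference from 0). Negative
    differences (a source revising its total downward) are clipped to 0,
    so the new counts may not sum exactly to the final cumulative total.
    """
    new_counts = []
    previous = 0
    for total in cumulative:
        new_counts.append(max(total - previous, 0))
        previous = total
    return new_counts


def daily_new_by_country(
    records: list[tuple[str, str, int]],
) -> dict[str, list[tuple[str, int]]]:
    """Group (country, ISO day, cumulative) records and derive daily new counts."""
    series: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for country, day, total in sorted(records):  # ISO dates sort chronologically
        series[country].append((day, total))
    result = {}
    for country, points in series.items():
        days = [day for day, _ in points]
        totals = [total for _, total in points]
        result[country] = list(zip(days, daily_new(totals)))
    return result


if __name__ == "__main__":
    print(daily_new([100, 150, 150, 140, 180]))
```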
