Podcasts, Pipelines & Chill 🎧

A weekend project to build a data platform for podcast analytics using dlt, dbt, and Snowflake.

I was scrolling r/dataengineering on a lazy Saturday and found a post about podcast analytics. It was really hot that day 🔥, so I thought: why not wire up a tiny data platform to see the numbers for myself?


Stack TL;DR

| Layer | Tool | Why I picked it |
| --- | --- | --- |
| Orchestration | Apache Airflow | The de‑facto “cron‑on‑steroids”. |
| Ingest | dlt | Point → click → data in Snowflake with almost no config. |
| Transform | dbt | SQL + version control + tests = ❤️ |
| Warehouse | Snowflake | Handles scale so I don’t have to. |
| Glue | Docker Compose | One up and everything runs. |
| Meta DB | Postgres | Airflow needs somewhere to write. |
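
To give a feel for how little config the ingest step needs, here is a minimal sketch of a dlt load. The `fetch_page` callable, the `episodes` table name, and the pipeline and dataset names are assumptions made up for illustration, not lifted from the project. Snowflake credentials are assumed to come from dlt's usual config (secrets file or environment variables).

```python
from typing import Callable, Iterator, Optional


def iter_episodes(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield episode rows page by page until the API stops returning a cursor.

    `fetch_page` is a hypothetical callable: it takes a cursor (None for the
    first page) and returns {"items": [...], "next": <cursor or None>}.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next")
        if cursor is None:
            break


def load_to_snowflake(rows: Iterator[dict]):
    """Hand the rows to dlt, which infers the schema and creates the table."""
    import dlt  # imported lazily so the generator above works without dlt

    pipeline = dlt.pipeline(
        pipeline_name="podcast_ingest",  # assumed name
        destination="snowflake",
        dataset_name="bronze",           # assumed schema for the raw dumps
    )
    return pipeline.run(rows, table_name="episodes")
```

Keeping the paging logic in a plain generator means it can be exercised with a fake `fetch_page` before any warehouse is involved.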

High‑Level Diagram

[Figure: Overall architecture]


Warehouse Layers

[Figure: Model lineage]

  • Bronze → Raw dumps from dlt
  • Silver → Clean tables (int__*)
  • Gold → Facts & marts (fct__*, mart__*)
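
The naming convention above is easy to check mechanically. A small sketch, assuming any model without one of the listed prefixes belongs to the raw Bronze layer (that fallback, and the example model names in the usage note, are my assumptions):

```python
# Prefix -> layer mapping taken from the list above.
LAYER_PREFIXES = {
    "int__": "silver",
    "fct__": "gold",
    "mart__": "gold",
}


def layer_for(model_name: str) -> str:
    """Return the warehouse layer implied by a model's name prefix."""
    for prefix, layer in LAYER_PREFIXES.items():
        if model_name.startswith(prefix):
            return layer
    # Assumption: unprefixed tables are raw dlt dumps.
    return "bronze"
```

For example, `layer_for("int__episodes")` gives `"silver"` and `layer_for("mart__top_shows")` gives `"gold"`.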

The DAG

  1. Load – dlt pulls the podcast API and lands rows in Snowflake.
  2. Transform – dbt run cleans everything up.
  3. Test – dbt test makes sure I didn’t break anything.

All wrapped in a single file: podcast_dag.py.
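
A rough sketch of what such a file could look like, not the actual podcast_dag.py. It assumes Airflow 2.4+ and runs each step through `BashOperator`; the script path and dbt project directory are hypothetical.

```python
from datetime import datetime

# Ordered (task_id, shell command) pairs. The script path and dbt project
# directory are hypothetical; adjust them to the container layout.
STEPS = [
    ("load", "python /opt/airflow/pipelines/load_podcasts.py"),
    ("transform", "dbt run --project-dir /opt/airflow/dbt"),
    ("test", "dbt test --project-dir /opt/airflow/dbt"),
]


def build_dag():
    """Chain the steps so each task runs only after the previous one succeeds."""
    # Imported here so the step list can be inspected without Airflow installed.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="podcast_dag",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # manual triggers only (Airflow 2.4+ argument name)
        catchup=False,
    ) as dag:
        previous = None
        for task_id, command in STEPS:
            task = BashOperator(task_id=task_id, bash_command=command)
            if previous is not None:
                previous >> task  # previous must finish before task starts
            previous = task
    return dag


try:
    dag = build_dag()  # Airflow discovers DAG objects at module level
except ImportError:
    dag = None  # Airflow not installed, e.g. when reading this outside the stack
```

Because the tasks are chained linearly, a failing `dbt run` stops the run before `dbt test` is attempted.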


Running It Yourself

git clone <this‑repo>
cd <this‑repo>
docker-compose up --build      # grab a coffee ☕

Open Airflow at http://localhost:8080, enable podcast_dag, hit Trigger, watch the logs fly.
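
If you would rather trigger the DAG from a script than from the UI, Airflow 2.x exposes a stable REST API. The sketch below only builds the request; it assumes the basic-auth API backend is enabled in your Airflow config, and the credentials in the usage note are placeholders.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url: str, dag_id: str,
                          username: str, password: str) -> urllib.request.Request:
    """Build a POST request that asks Airflow to start a new run of `dag_id`."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"conf": {}}).encode(),  # empty run configuration
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Sending it is one more line: `urllib.request.urlopen(build_trigger_request("http://localhost:8080", "podcast_dag", "<user>", "<password>"))`.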


Quick Peek at a Gold‑Layer Query

SELECT
  ep.title,
  COUNT(*) AS completed_plays
FROM 
JOIN  ep USING (episode_id)
GROUP BY 1
ORDER BY completed_plays DESC
LIMIT 10;

Boom – top‑played episodes in one query.
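
To try the shape of that query without a Snowflake account, here is a self-contained sketch against an in-memory SQLite database. The table names `fct__completed_plays` and `int__episodes` and the sample rows are invented for the demo; they are not the project's real models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables and sample data, just enough to exercise the query.
conn.executescript("""
    CREATE TABLE int__episodes (episode_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE fct__completed_plays (
        play_id INTEGER PRIMARY KEY,
        episode_id INTEGER
    );
    INSERT INTO int__episodes VALUES (1, 'Pilot'), (2, 'Deep Dive'), (3, 'Q&A');
    INSERT INTO fct__completed_plays (episode_id)
    VALUES (2), (2), (2), (1), (1), (3);
""")

# Same pattern as the gold-layer query: join, count per title, keep the top 10.
top_episodes = conn.execute("""
    SELECT
      ep.title,
      COUNT(*) AS completed_plays
    FROM fct__completed_plays AS p
    JOIN int__episodes AS ep USING (episode_id)
    GROUP BY 1
    ORDER BY completed_plays DESC
    LIMIT 10
""").fetchall()

print(top_episodes)  # [('Deep Dive', 3), ('Pilot', 2), ('Q&A', 1)]
```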


What I Liked

  • 🏎️ Speed – dlt + Snowflake moved ~50k rows in seconds.
  • 🔍 Transparency – dbt lineage graph shows exactly where data flows.
  • 🎛️ Control – Airflow UI lets me replay any task with a click.

What Could Be Better

  • Secrets in docker-compose.yaml are fine for a demo, but use a vault in prod.
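
A cheap first step before a full vault is to stop hardcoding credentials and read them from the environment instead, failing fast when one is missing. A minimal sketch; the variable names are my own choice, not something dlt, dbt, or Snowflake require.

```python
import os
from typing import Mapping

# Hypothetical variable names; pick whatever your deployment injects.
REQUIRED = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")
PREFIX = "SNOWFLAKE_"


def snowflake_credentials(env: Mapping[str, str] = os.environ) -> dict:
    """Collect Snowflake credentials from `env`, raising if any are absent."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip the prefix and lowercase the rest: SNOWFLAKE_USER -> "user".
    return {name[len(PREFIX):].lower(): env[name] for name in REQUIRED}
```

In prod, the vault (or the orchestrator's secret store) would populate those variables at container start, so nothing sensitive lives in docker-compose.yaml.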

That’s all – a weekend well spent and a nice little pipeline to show for it.
Questions? Suggestions? Hit me up or find out more about me on my about page.

This post is licensed under CC BY 4.0 by the author.