Data Engineering Zoomcamp y2025
Dec 20, 2024
Data Engineering Zoomcamp
My learning journey following the data engineering course led by DataTalks.Club.
Pre-course setup
- Download Docker (2025/01/03 Note: GitHub Codespaces has Docker built in)
- Set up Google Cloud (2025/01/03 Note: set this up right before the class starts, since the Google Cloud free trial lasts 3 months)
- Watch the pre-course FAQ
- Have a GitHub account with access to Codespaces
Pre-course resources
Summary of steps to set up a Codespace and start coding in VS Code
Create a new repository (Zoomcamp2025)
Open Codespaces and create a new codespace under that repository (Zoomcamp2025)
Click the top-left button and select ‘Open in VS Code Desktop’
Make sure the ‘GitHub Codespaces’ extension is installed in VS Code
Open the integrated terminal
Install Jupyter Notebook
pip install jupyter
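To confirm that the install worked, a quick check from Python can help. This is a minimal sketch: the helper name `is_installed` is made up for illustration, and the module names checked (`notebook`, `jupyter_core`) are an assumption about what the `jupyter` package pulls in.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# Assumed module names; adjust to whatever you expect pip to have installed.
for name in ("notebook", "jupyter_core"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a module shows as missing, the terminal may be using a different Python environment than the one pip installed into.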
Create the Postgres container with Docker
docker network create pg-network

docker volume create --name dtc_postgres_volume_local -d local

docker run -it \
  -e POSTGRES_USER="root" \
  -e POSTGRES_PASSWORD="root" \
  -e POSTGRES_DB="ny_taxi" \
  -v dtc_postgres_volume_local:/var/lib/postgresql/data \
  -p 5432:5432 \
  --network=pg-network \
  --name pg-database \
  postgres:13
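Once the container is running, a client needs the same user, password, port, and database name that were passed to docker run. The sketch below only assembles a standard postgresql:// connection URL from those values; it does not open a connection. The helper name `build_postgres_url` is hypothetical, and `localhost` assumes the client runs on the machine that published port 5432.

```python
from urllib.parse import quote


def build_postgres_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a postgresql:// URL, percent-encoding the user and password."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql://{safe_user}:{safe_password}@{host}:{port}/{database}"


# Values copied from the docker run command; the host is an assumption.
url = build_postgres_url("root", "root", "localhost", 5432, "ny_taxi")
print(url)  # postgresql://root:root@localhost:5432/ny_taxi
```

Other containers attached to pg-network reach the database by its container name, pg-database, rather than localhost.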
Start pgAdmin with Docker (*Note: make sure there is a space before the backslash at the end of each line, and nothing after it)
docker run -it \
  -e PGADMIN_DEFAULT_EMAIL="admin@admin.com" \
  -e PGADMIN_DEFAULT_PASSWORD="root" \
  -p 8080:80 \
  --network=pg-network \
  --name pgadmin \
  dpage/pgadmin4
Now you can connect to pgAdmin by clicking the link for the forwarded port (8080)
Done!
Module 1. 01-docker-terraform
from typing import List

import pandas as pd


def createDataframe(student_data: List[List[int]]) -> pd.DataFrame:
    """
    Create a DataFrame from a list of student data.

    Args:
        student_data (List[List[int]]): A list of lists where each inner list contains [student_id, age].

    Returns:
        pd.DataFrame: A DataFrame with 'student_id' and 'age' as columns.
    """
    colnames = ["student_id", "age"]
    try:
        result = pd.DataFrame(student_data, columns=colnames)
        return result
    except ValueError as e:
        raise ValueError("Input data does not match the expected column structure.") from e
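To see both paths of that function, here is a small standalone sketch that uses the same `pd.DataFrame` call directly. The sample rows are made up for illustration, and the flag `rejected` is only there to show that the error branch was taken.

```python
import pandas as pd

colnames = ["student_id", "age"]

# Well-formed input: every row is a [student_id, age] pair (sample values).
rows = [[1, 15], [2, 11], [3, 11], [4, 20]]
df = pd.DataFrame(rows, columns=colnames)
print(df.shape)  # (4, 2)

# Rows with three fields do not fit two column names, so pandas raises ValueError.
bad_rows = [[1, 15, 99]]
rejected = False
try:
    pd.DataFrame(bad_rows, columns=colnames)
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```

This is the case the try/except wrapper is meant to catch and re-raise with a clearer message.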