Sitemap - 2024 - Data Engineering Central

Delta Lake vs Apache Iceberg. The Lake House Squabble.

Why Is Everybody So Big on Zig?

AWS S3 Tables?! The Iceberg Cometh.

Databricks Raises Money - 55 Billion Dollar Valuation

Are Data Contracts Dead?

Turkey Day Is Here - Black Friday Sale - 50% Off

DuckDB + Delta Lake.

Data Engineering Central Podcast - 04

Raw Data Ingestion ...

10 billion row challenge. DuckDB vs Polars vs Daft.

DataFusion, My Swiss Army Knife

End of Year Engineering Planning for 2025

Apache Airflow vs Databricks Workflows

DuckDB inside Postgres (pg_duckdb) Exposed!

DuckDB inside Postgres!!??

The Death of Primary and Foreign Keys?

What makes "smart" engineers so stupid.

Data Engineering Central Podcast - 03

Daft vs Spark (Databricks) for Delta Tables (Unity Catalog)

Weekend Forecast

Small Engineering Changes (PR reviews)

Should you use DuckDB or Polars?

Data Engineering Central Podcast - 02

Maestro - Netflix Open Sources Workflow Tool

I used ChatGPT o1 to do PostgreSQL basics

Data Engineering Central Podcast

Lord Save Us, Not Another ETL Tool Please!

There are 3 Types of Data Engineers.

Rust for the small things?

Databricks. Delta Lake. Table Versions. Polars. Insidious Features.

Apache DataFusion Comet

MLflow ... with Databricks. Thoughts and more.

NO EXCUSES! Answer the dang questions!

Realtime Streaming data from PostgreSQL to Delta Lake (Unity Catalog)

Using SQL with Python. The Ultimate Chad Stack.

Exploring NULL(s)

The Rise of The Notebook Engineer

AWS RDS Disk Space Alerts

Deploying Spark Streaming with Delta on Kubernetes using Terraform

Suck Less Next Time

You're Doing Data Engineering Wrong.

Date and Time Manipulation with DuckDB

5 Data Engineering Mistakes

Kubernetes Sucks. Long Live K8s.

Ain't no room for AI (in my workflow)

Replace Databricks Spark Jobs (using Delta) with Polars

Snowflake is Dying on the Vine?

Data Validation for Data Engineers

Lazy is Ambitious

DuckDB 1.0.0 - Let's Kick The Tires

CI/CD for Data Engineers

When to Rust for Data Engineering ... and when NOT to.

Introduction to Daft ( ... vs Polars)

Real Life Example of the QuickSort Algo (Rust)

Premature Optimization is NOT the root of all evil?

I See Window Functions Everywhere

On Call Hell

Introduction to MLflow

How Tech Debt, Databricks, and Spark UDFs ruined my weekend.

Cost Savings for Databricks Users

Why Analytics is a Lose Lose Game

Redshift vs Snowflake vs BigQuery vs Databricks vs ...

Transitioning to Senior Engineer

Weekend Forecast

JSON with Rust

My SaaS is faster than yours.

Data Engineering Survey

Delta Lake - Map and Array data types

SaaS Vendor Lock In

SQL vs Python Data Pipelines

Spark Connect - What is this madness?

How to Build an Open Source Python Package

Why Aren’t You Filtering More?

Default Values - Thoughts and More

Error Handling for Data Engineers

Microservices for Data Engineering

UDTFs (User-defined Table Functions) in PySpark.

Iteration vs Recursion.

Apple Pie. Angry People. Other News.

DuckDB vs Polars - Thunderdome.

Introduction to Ray

String Manipulation

New SQL Practice Problems - Free For Paid Subscribers

Unit Testing for Data Engineers

Batch vs Near-Realtime vs Streaming

Diving into Data Types

Are Data Contracts For Real?

A Primer on Data Architecture

Why DuckDB is losing to Polars

How to Reduce Complexity

Why Python Always Breaks

Intro to SQL Indexes

LLMs Part 2 - Fine Tuning OpenLLaMA

Introduction to Write-Audit-Publish Pattern

Data Warehouse Analytics - Latency

Learning the Command Line

Semi-Structured Data - The challenges.

Project Planning and Implementation Data Projects