Sitemap - 2024 - Data Engineering Central

You're Doing Data Engineering Wrong.

Date and Time Manipulation with DuckDB

5 Data Engineering Mistakes

Kubernetes Sucks. Long Live K8s.

Ain't no room for AI (in my workflow)

Replace Databricks Spark Jobs (using Delta) with Polars

Snowflake is Dying on the Vine?

Data Validation for Data Engineers

Lazy is Ambitious

DuckDB 1.0.0 - Let's Kick The Tires

CI/CD for Data Engineers

When to Rust for Data Engineering ... and when NOT to.

Introduction to Daft ( ... vs Polars)

Real Life Example of the QuickSort Algo (Rust)

Premature Optimization is NOT the root of all evil?

I See Window Functions Everywhere

On Call Hell

Introduction to MLflow

How Tech Debt, Databricks, and Spark UDFs ruined my weekend.

Cost Savings for Databricks Users

Why Analytics is a Lose Lose Game

Redshift vs Snowflake vs BigQuery vs Databricks vs ...

Transitioning to Senior Engineer

Weekend Forecast

JSON with Rust

My SaaS is faster than yours.

Data Engineering Survey

Delta Lake - Map and Array data types

SaaS Vendor Lock In

SQL vs Python Data Pipelines

Spark Connect - What is this madness?

How to Build an Open Source Python Package

Why Aren’t You Filtering More?

Default Values - Thoughts and More

Error Handling for Data Engineers

Microservices for Data Engineering

UDTFs (User-defined Table Functions) in PySpark.

Iteration vs Recursion.

Apple Pie. Angry People. Other News.

DuckDB vs Polars - Thunderdome.

Introduction to Ray

String Manipulation

New SQL Practice Problems - Free For Paid Subscribers

Unit Testing for Data Engineers

Batch vs Near-Realtime vs Streaming

Diving into Data Types

Are Data Contracts For Real?

A Primer on Data Architecture

Why DuckDB is losing to Polars

How to Reduce Complexity

Why Python Always Breaks

Intro to SQL Indexes

LLMs Part 2 - Fine Tuning OpenLLaMA

Introduction to Write-Audit-Publish Pattern

Data Warehouse Analytics - Latency

Learning the Command Line

Semi-Structured Data - The challenges.

Project Planning and Implementation Data Projects