Building and Maintaining Data Platforms
Architecting scalable, reliable, and cost-efficient data systems
Hello!
I’ve been working on this book for 2+ years now. It’s been a labor of love, involving countless nights and weekends spent trying to synthesize and condense what I’ve learned over the last few decades of working on, inside, and building Data Platforms of all shapes and sizes.
I’m super excited to put this book out into the world.
My hope is that it will inspire a whole new generation of data practitioners to view their role as more than just building the next data pipeline.
So, like, what’s in the book?
I get this question a lot: “What is the book like? What’s in it?”
This book, at a high level, is about learning to think about Data Platforms from a systems, design, and architecture perspective. It’s not a book with code snippets on how to transform your data. That book has been written a thousand times over.
Many books and articles have been written on little pieces of this larger puzzle. Someone talks about the specifics of data pipelines, maybe storage, compute, orchestration... but what about the entire system? Why can a team choose the most popular and relevant data processing framework on the market and still end up with broken pipelines, late-night alerts, burnout, and people ready to quit?
How do you think about monitoring and logging? What about storage, data modeling, and governance? Not to mention code management, team dynamics, and development environments.
Building a Data Platform that doesn’t eat engineers and grind them up takes thought and planning. Just because you are using Databricks doesn’t mean you have a reliable way to identify pipeline errors, orchestrate complex workflows, or ensure a smooth development lifecycle.
The contents.
So, if you still must know what’s in the book before you read it, here is a high-level sketch of the chapters, as it sits today.
Chapter 1: The Modern Data Platform Landscape
Chapter 2: Architectural Foundations & Infrastructure
Chapter 3: Storage & Modeling for the Lake House
Chapter 4: Data Ingestion & Integration Strategies
Chapter 5: Data Governance, Quality, and Cataloging
Chapter 6: Data Orchestration, Transformation & Processing Frameworks
Chapter 7: Performance, Scalability & Cost Optimization
Chapter 8: Monitoring, Observability & Ongoing Maintenance
Chapter 9: Running Data Teams, Culture, and Tech
Chapter 10: AI, ML & Advanced Analytics Integration
Chapter 11: DevOps and CI/CD for Data Teams
What I try to do is approach each of these topics from a conceptual point of view and show how these concepts tie into and build on one another, so that you can build and maintain Data Platforms that are a pleasure to work on.
When is the release date?
I’m shooting for late summer or fall.



