Review of Databricks Data + AI Summit 2026

from someone who wasn't there.

Jun 19, 2026

Well, it’s that time of year again. My feed has been full of a little bit of this, little bit of that from the Databricks Data and AI Summit 2026. It’s kinda hard to miss, although this year, I must say, it seemed a little on the quiet side as compared to others.

Who knows, 2026 has been a busy year in the world and all, ya know: wars, rumors of wars, layoffs, and generally it’s either the beginning of a new world or the end of our current one.

Like any good hunter of technology, I keep my veteran ear to the ground, ruffling through the fluff and litter, trying to find out what is actually worth your time, and what is just another layer on the AI cake we are all too tired to choke down.

So I’m just going to give you my take on the important products and announcements from this year’s Databricks Data and AI Summit 2026. I mean I wasn’t there, so take me with a grain of salt.

Here’s my list, and I will then follow it up with my take on each one.

Zach Wilson went after blowing a raspberry at them last year.
Lakehouse//RT aka “Reyden”
LTAP - OLAP + OLTP on a single copy of data in the lake

Yeah, I’m giving a big ho-hum to the plethora of AI shiny rocks released, like Genie Ontology, Genie ZeroOps, Omnigent, Unity AI Gateway, etc. The world of Agents and AI is still in flux, no clear winners, everyone is throwing stuff at the wall to see what sticks.

For my part, I’m a big believer in Databricks; they have a propensity to fundamentally change technology and how we use it. They bring truly groundbreaking technical solutions and products to market. If you can’t admit that, then check yourself. That being said, I’m just reviewing the products or announcements that I think meet that criteria: groundbreaking and game changing.

If you think I’m full of it about something, or I missed something big, drop a comment.

Thanks for reading Data Engineering Central! This post is public so feel free to share it.

Zach showed up.

Hey, I like a little spice in my life, the hot stuff keeps everyone on their toes. I regularly get on the wrong side of powerful people, much to my joy. Nothing like free head space to bring me more followers.

Anywho, last year Zach was throwing rocks at Databricks, as you can see below. Good Lord you gotta love seeing behind the curtain sometimes. We need our own data Netflix series on this sorta thing ya know?

But Zach attended this year’s Summit, which I was glad to see. I think Zach is one of the smartest and hardest working engineers I’ve ever seen, and I love Databricks because they make the best products. I’m glad they are getting along now.

Anywho, enough on that, next.

Lakehouse//RT: Databricks Brings Real-Time Analytics to the Lakehouse

This one caught my poor little ears as soon as the words dripped out onto LinkedIn. Again, all this stuff is new so who knows what the future holds or if anything will come of it, but it solves some major pain points we’ve been dealing with in a new and novel way.

I’m excited for this one.

One of the biggest challenges in modern data architecture has been serving real-time applications from a Data Lake or Lake House. When we adopted technologies like Delta Lake and Apache Iceberg, aka file storage with ACID, we gained some stuff and lost others. Well lost isn’t a good word. Delta Lake + Spark ain’t no Postgres + Python ya know??

Traditional LakeHouses excel at ETL, analytics, machine learning, and business intelligence, but when teams need dashboards that refresh in milliseconds or applications serving thousands of users simultaneously, they often introduce an entirely separate serving database such as ClickHouse, Pinot, Druid, or Redis.

That second system brings another copy of the data, another synchronization pipeline, another set of security policies, and another operational burden.

It brings complexity and overhead. These technologies just had a difficult time jibing.

Lakehouse//RT is Databricks’ attempt to eliminate that architecture altogether. Instead of exporting data into a specialized serving database, Lakehouse//RT delivers millisecond query performance directly against Delta Lake while keeping the data inside the governed Lakehouse.

What is Lakehouse//RT?

Lakehouse//RT is a new real-time compute option designed specifically for workloads that require both very low latency and extremely high concurrency. It is powered by a new execution engine called Reyden, which Databricks says was built from the ground up for operational analytics, application serving, observability, dashboards, and AI agents.

I’m still unsure if we are dealing with two things, or one … read this closely …

Powered by Reyden, a “new engine for realtime workloads”
Lakehouse//RT, “real time data warehouse”

Are these two separate things you can use, or do you use them both at the same time? I don’t know. Time will tell.

Unlike traditional analytical warehouses that optimize long-running reports, Reyden focuses on thousands of simultaneous interactive queries while maintaining consistent response times. According to Databricks, preview customers have seen:

Up to 16× faster performance than existing real-time serving layers
Query latencies as low as 10 milliseconds
Sub-100 ms performance on much larger datasets
Around 12,000 queries per second while maintaining low latency
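A quick back-of-the-envelope check helps put those figures in context. The sketch below applies Little's Law (average queries in flight = throughput × average latency). Pairing the quoted throughput with the quoted latencies is an assumption made purely for illustration; the numbers above are not stated to come from the same workload.

```python
def in_flight_queries(queries_per_second: float, latency_seconds: float) -> float:
    """Little's Law: average concurrency = arrival rate * average time in system."""
    return queries_per_second * latency_seconds


# Figures quoted in the list above; combining them is an assumption.
QPS = 12_000

for latency_ms in (10, 100):
    concurrent = in_flight_queries(QPS, latency_ms / 1000)
    print(f"{QPS} qps at {latency_ms} ms -> ~{concurrent:.0f} queries in flight")
```

Under that assumption the engine would be juggling roughly 120 queries at any instant at 10 ms, and roughly 1,200 at 100 ms, which is the kind of concurrency a serving layer is normally sized for.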

Eliminating the Serving Layer

The biggest architectural shift isn’t simply that queries are faster; we’ve been hearing people say “My thing is faster” for a decade or more. The difference is that Databricks wants to remove an entire layer of infrastructure.

Today’s architecture often looks like this:

Delta Lake
      │
   ETL / CDC
      │
ClickHouse / Pinot / Druid / Redis
      │
 Applications & Dashboards

With Lakehouse//RT, Databricks wants applications to query the Lakehouse directly:

Delta Lake
      │
Lakehouse//RT
      │
 Applications
 Dashboards
 AI Agents

That means:

no duplicate storage
no synchronization pipelines
no additional serving clusters
no proprietary storage formats
no duplicated governance

Everything continues to use Delta Lake and Unity Catalog. Some will argue that this isn’t new technology, and that Databricks saw the growth of ClickHouse, for example, and decided they needed to do something about that.

I agree, but I also think that Databricks providing this sort of extremely fast compute option is groundbreaking inside the Data Platform they provide. You simply CANNOT discount the reduction in complexity and code when you remove an entire serving layer.

Built for Operational Analytics

Databricks positions Lakehouse//RT for workloads that have traditionally been difficult to run directly from a data lake:

customer-facing SaaS applications
operational dashboards
observability platforms
security analytics
embedded analytics
AI agent retrieval
interactive business intelligence

These are all scenarios where thousands of users (or AI agents) may issue queries simultaneously and expect responses in tens of milliseconds rather than seconds. It takes something traditionally done outside Databricks, or at the minimum with third party tools, and brings it back inside the MotherShip.

All the chickens coming home to roost so to speak.

Simpler Operations

Lakehouse//RT also introduces a different compute model. Instead of selecting warehouse sizes manually, Databricks automatically determines the appropriate baseline compute. Rather than scaling by duplicating entire warehouse clusters, it incrementally adds or removes nodes as concurrency changes, aiming to improve utilization while reducing costs.
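To see why node-at-a-time scaling can improve utilization, here is a toy comparison of the two strategies. Everything in it is hypothetical: the per-node capacity, the cluster size, and the function names are made up for illustration and say nothing about how Databricks actually sizes compute.

```python
import math


def nodes_whole_cluster_scaling(concurrent_queries: int,
                                queries_per_node: int,
                                nodes_per_cluster: int) -> int:
    """Scale by duplicating whole clusters: capacity grows in cluster-sized steps."""
    per_cluster = queries_per_node * nodes_per_cluster
    clusters = max(1, math.ceil(concurrent_queries / per_cluster))
    return clusters * nodes_per_cluster


def nodes_incremental_scaling(concurrent_queries: int,
                              queries_per_node: int,
                              min_nodes: int = 1) -> int:
    """Scale by adding or removing single nodes as concurrency changes."""
    return max(min_nodes, math.ceil(concurrent_queries / queries_per_node))


# Hypothetical capacity figures, chosen only to make the gap visible.
QUERIES_PER_NODE = 50
NODES_PER_CLUSTER = 8

for load in (100, 450, 900):
    whole = nodes_whole_cluster_scaling(load, QUERIES_PER_NODE, NODES_PER_CLUSTER)
    incremental = nodes_incremental_scaling(load, QUERIES_PER_NODE)
    print(f"load={load}: whole-cluster={whole} nodes, incremental={incremental} nodes")
```

With these made-up numbers, a load of 450 concurrent queries needs 16 nodes when you can only add full clusters but 9 when you can add nodes one at a time. The gap is the idle capacity you pay for.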

Governance Doesn’t Change

One of the more compelling aspects of Lakehouse//RT is that governance remains centralized. Since the data never leaves the Lakehouse:

Unity Catalog permissions stay intact
security policies are defined once
business logic isn’t duplicated
data lineage remains consistent
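The "defined once" idea can be sketched with a toy model. The `Catalog` and `Engine` classes below are hypothetical stand-ins, not the Unity Catalog API; the only point is that when every engine consults the same permission store, a single revoke takes effect everywhere.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical central permission store (not a real Databricks API)."""
    grants: set = field(default_factory=set)

    def grant(self, principal: str, table: str) -> None:
        self.grants.add((principal, table))

    def revoke(self, principal: str, table: str) -> None:
        self.grants.discard((principal, table))

    def can_select(self, principal: str, table: str) -> bool:
        return (principal, table) in self.grants


@dataclass
class Engine:
    """Any query engine that checks the shared catalog before reading."""
    name: str
    catalog: Catalog

    def query(self, principal: str, table: str) -> str:
        if not self.catalog.can_select(principal, table):
            raise PermissionError(f"{principal} may not read {table} via {self.name}")
        return f"{self.name}: rows from {table}"


catalog = Catalog()
warehouse = Engine("warehouse", catalog)
realtime = Engine("realtime", catalog)

catalog.grant("analyst", "sales.orders")
print(warehouse.query("analyst", "sales.orders"))
print(realtime.query("analyst", "sales.orders"))

# One revoke, and both engines refuse the same principal.
catalog.revoke("analyst", "sales.orders")
for engine in (warehouse, realtime):
    try:
        engine.query("analyst", "sales.orders")
    except PermissionError as err:
        print("denied:", err)
```

In the two-system world, that revoke would have to be repeated (and remembered) in the serving database as well.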

Organizations no longer need to recreate governance rules inside a separate serving database. Governance is indeed the next big topic in the world of data and AI. Security breaches every week, Claude 100X Engineers releasing buggy code left and right.

Can’t be too careful these days.

How It Fits Into Databricks’ Bigger Picture

Lakehouse//RT makes even more sense when viewed alongside Databricks’ other announcements this year.

Lakebase provides a PostgreSQL-compatible operational database.
LTAP unifies transactional and analytical storage on a single copy of data.
Lakehouse//RT provides millisecond analytical serving directly from that same data.

Together, Databricks is attempting to collapse what has historically been three separate systems:

OLTP databases
analytical warehouses
real-time serving databases

into a single platform built around Delta Lake, Unity Catalog, and specialized execution engines.

My Take on RT

This feels like one of the most strategically important announcements from this year’s Data + AI Summit. The performance numbers are certainly impressive, but the bigger story is architectural simplification.

For decades, data teams have accepted that customer-facing applications require a separate serving database. Lakehouse//RT challenges that assumption by making the lakehouse itself fast enough to serve those workloads. Will it catch on? I don’t know. We will all find out in a year I guess.

The remaining question is whether those benchmark results translate to the wide variety of real-world production environments that rely on ClickHouse, Pinot, Druid, Elasticsearch, and similar systems today. If they do, Lakehouse//RT could remove an entire category of infrastructure from many modern data platforms.

LTAP: OLAP + OLTP on a Single Copy of Data in the Lake

I’ve been waiting for this one since I was knee high to a grasshopper. Jeez, even Forbes is writing about this one, what the heck do they know about anything??

This is the one thing I wasn’t ready for, although it makes total sense. It was LTAP, short for Lake Transactional/Analytical Processing, and if you spend your days building data platforms instead of making keynote slides, this is the announcement that deserves your attention.

The elevator pitch sounds almost suspiciously simple. Databricks wants operational databases and analytical workloads to operate on the same copy of data, eliminating CDC pipelines, ETL jobs, replicas, synchronization processes, and the collection of brittle plumbing that has somehow become accepted as “modern data architecture.”

After reading both the press release and Ali Ghodsi’s interview explaining the thinking behind it, I don’t think this is really a story about ETL at all. It’s a story about removing a 40-year-old architectural assumption that everyone stopped questioning years ago.

Today's architecture

Application
     │
 PostgreSQL
     │
 CDC / ETL
     │
 Data Lake
     │
 Analytics

For decades we’ve accepted that applications belong in one database while analytics belong somewhere else, usually connected together by a growing pile of Kafka topics, replication jobs, Airflow DAGs, managed CDC services, and a Slack channel dedicated entirely to asking why yesterday’s pipeline failed again.

The industry has spent years trying to make this architecture less painful instead of asking whether the architecture itself is the problem.

Databricks thinks it is.

The obvious comparison is HTAP, which promised to unify transactional and analytical workloads years ago. The problem was that HTAP largely tried to shove both workloads into the same engine, which meant eventually your dashboard and your checkout page were competing for the same resources.

Databricks is taking a different approach. Instead of building one engine that tries to do everything, they’re building one storage layer that multiple specialized engines can operate against.

Lakebase handles PostgreSQL transactions, the Lakehouse handles analytics, Lakehouse//RT handles low latency serving, and they all operate on the same governed copy of Delta or Iceberg data.

LTAP

          Delta / Iceberg
                │
   ┌────────────┼────────────┐
   │            │            │
Lakebase   Lakehouse   Lakehouse//RT
  OLTP        OLAP        Real Time

That distinction is important because LTAP isn’t replacing PostgreSQL with Spark, and it isn’t asking Spark to become an operational database. It’s saying the engines can stay specialized while the storage becomes unified. That is a much bigger architectural shift than “we removed ETL.”
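Here is a toy model of the difference between the two diagrams above. The classes are hypothetical, in-memory stand-ins and say nothing about how Databricks implements LTAP; they only show why an analytics engine reading a synced replica goes stale, while one reading the same store the writes land in does not.

```python
class SharedStore:
    """Hypothetical in-memory stand-in for one governed copy of the data."""

    def __init__(self):
        self.rows = []


class TransactionalEngine:
    """Plays the OLTP role: small writes against the shared store."""

    def __init__(self, store):
        self.store = store

    def insert(self, row):
        self.store.rows.append(row)


class AnalyticalEngine:
    """Plays the OLAP role: scans whatever rows it is pointed at."""

    def __init__(self, rows_source):
        self.rows_source = rows_source  # callable returning the rows to scan

    def total(self, column):
        return sum(row[column] for row in self.rows_source())


def sync(source_rows):
    """Stand-in for a CDC/ETL job: returns a point-in-time copy."""
    return list(source_rows)


store = SharedStore()
oltp = TransactionalEngine(store)
oltp.insert({"order_id": 1, "amount": 40})

# Two-copy world: analytics reads a replica refreshed only by the sync job.
replica = sync(store.rows)
replica_analytics = AnalyticalEngine(lambda: replica)

# One-copy world: analytics reads the same store the writes land in.
shared_analytics = AnalyticalEngine(lambda: store.rows)

oltp.insert({"order_id": 2, "amount": 60})  # new write, no sync has run yet

print("replica total:", replica_analytics.total("amount"))  # stale until next sync
print("shared total: ", shared_analytics.total("amount"))   # sees the new row
```

The engines stay separate classes with separate jobs; only the storage they point at is shared. Whether the real system achieves that without a hidden sync step is exactly the open question.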

The other piece that clicked for me came from Ghodsi’s explanation of why they’re doing this now. His argument is that the real pressure isn’t coming from human developers anymore. It’s coming from AI agents. Humans create applications relatively slowly. Agents don’t.

They create databases, clone environments, test ideas, throw them away, and repeat the process constantly. Databricks claims roughly 80 percent of the databases on Lakebase are already being created by agents rather than people.

Whether that number surprises you or makes you instinctively reach for a fact check, the direction is hard to argue with. Infrastructure built around dozens of data copies and endless synchronization simply doesn’t scale when your primary users become software instead of humans.

The real takeaway here isn’t that Databricks found a clever way to remove another Airflow DAG. They’re trying to remove entire categories of infrastructure. No operational database replica feeding analytics. No warehouse copy that’s always fifteen minutes behind production. No “Zero ETL” marketing that quietly hides another synchronization service behind the curtain.

One copy of the data, multiple engines reading and writing it, and governance living in one place.

Will it work? That’s the billion dollar question, or perhaps the hundred billion dollar question given Databricks’ valuation. Lakebase still has to prove it can be a production operational database at massive scale, and LTAP has to demonstrate that this elegant architecture survives contact with messy enterprise workloads.

But if Databricks pulls it off, we may eventually look back at maintaining separate OLTP and OLAP systems the same way we now look at nightly FTP jobs and hand-written shell scripts. Necessary once, painful always, and eventually replaced by something that made us wonder why we tolerated the old way for so long.

Matt Martin

Great write up. I definitely see a ton of value in the new RT engine. I’m hoping it’s as good as Databricks says it is. It would be great to cut out that chunk of infrastructure that has to do CDC for replication.

Elena Yukhymenko

"No “Zero ETL” marketing that quietly hides another synchronization service behind the curtain."

It can be ZeroETL. "User's who attended their deep dive sessions already reported it is basically their Postgres db in front that is replicating data to Iceberg with 1 to 5 min latency."

They say honestly - "one copy of data". But still a copy, replicated in the background. The difference is that with zeroETL you need to launch/enable the replication.
