Heck, as long as the data gods keep sending me special gifts, I’m going to keep opening them. I’m a sucker for unraveling some marketing speak. Nothing gets people hot under the collar like that.
Ok, I would be lying to you if I said over the last two years I’ve never seen the words “Semantic Layer” sprinkled through articles and LinkedIn. It’s getting pretty close to that point where I’m too scared to ask, which means it’s time to rip that bandaid off and see what putrid mess we find inside.
If I had to guess, it was the Illuminati (since they clearly own all the SaaS vendors) that brought us the Semantic Layer, but we will find out, won’t we?
All unbelievers bow before almighty Semantic Layer
Ok, so how did we get here? Look, I’ve been writing about data tom-foolery for nigh on forever at this point. Part of that work requires me to dredge the endless depths of r/dataengineering where mortals dare not tread.
So, let’s just say I have my ole’ ear to the ground and can hear things coming afar off.
This Semantic Layer just seems to have snuck up on me and given me the ole’ one-two punch when I wasn’t looking. How did it go from nothing to apparently everyone supposedly knowing exactly what it is overnight??!!
I worry that I’m slowing down in my old age.
What IS a Semantic Layer?
Ok, this is where I first smelled something funny. I was getting different answers when looking. But here is my best shot.
We had to provide two definitions because that is simply the state in which it currently resides. This smells to me of battling vendors already, you know?
Before we start ripping into things, let’s give everyone their fair chance to give two cents on the Semantic Layer. Starting with Databricks …
And DuckDB …
Interestingly, if you are an astute observer, you will notice that DuckDB and Databricks actually differ significantly in their definitions of a Semantic Layer.
Databricks says it’s AFTER Data Lakes, Data Marts, but BEFORE the BI tools.
DuckDB says it’s simply something AFTER the database and BEFORE the business user.
We should add Snowflake to the mix to ensure we capture all views.
My friends, this is where I start mumbling various and strange incantations under my breath and tell my wife I need to go for a walk in the woods.
At this point, my fingers are shaking, and the froth of emotions broiling and roiling inside me is threatening the stability of my keyboard.
I promise to use my newfound emotional intelligence training to hold that in, for now, and continue exploring the Semantic Layer, saving my polemic against such abuses for the end of the article.
So, can we say what a Semantic Layer is?
Yes and no. Not really. Since none of the vendors agree on what a Semantic Layer is, there can be no clear definition; it will simply depend on who you’re talking to.
But, based on what we’ve seen so far, we know it’s something that sits between some data and the end user.
I mean, as far as I can tell, this is about all the different definitions agree upon, although upon looking more closely, I could be wrong about that; there appear to be other common points.
standardize metrics and calculations (definitions)
data governance and permissions
data transformation (only SOME agree on this)
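To make those common points a little less hand-wavy, here is a minimal sketch in Python of what "standardized definitions plus governance" could look like if you stripped away every vendor logo. Everything in it is hypothetical: the `Metric` and `SemanticLayer` classes, the `net_revenue` metric, and the role names are all made up for illustration and do not come from Databricks, DuckDB, or Snowflake.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Metric:
    """One business definition, written down exactly once (hypothetical shape)."""
    name: str
    description: str
    compute: Callable[[List[dict]], float]  # the single agreed-upon calculation
    allowed_roles: FrozenSet[str]           # who is allowed to query it


class SemanticLayer:
    """Toy registry that sits between raw rows and the end user."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        # Refuse a second definition under the same name.
        if metric.name in self._metrics:
            raise ValueError(f"metric already defined: {metric.name}")
        self._metrics[metric.name] = metric

    def query(self, name: str, rows: List[dict], role: str) -> float:
        metric = self._metrics[name]
        # Governance: check permissions before running the calculation.
        if role not in metric.allowed_roles:
            raise PermissionError(f"role {role!r} may not query {name!r}")
        return metric.compute(rows)


# Made-up sample data and a made-up metric definition.
orders = [
    {"amount": 100.0, "status": "paid"},
    {"amount": 50.0, "status": "refunded"},
    {"amount": 25.0, "status": "paid"},
]

layer = SemanticLayer()
layer.register(
    Metric(
        name="net_revenue",
        description="Sum of amounts for paid orders",
        compute=lambda rows: sum(r["amount"] for r in rows if r["status"] == "paid"),
        allowed_roles=frozenset({"analyst", "finance"}),
    )
)

net_revenue = layer.query("net_revenue", orders, role="analyst")
```

Every dashboard that asks for `net_revenue` goes through the same definition and the same permission check. That is roughly the whole pitch, minus the pricing page.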
Hmmm … me thinks I’ve heard something similar to this “single source of truth” idea before. LOL!
What is spinning through my head right now is how the Semantic Layer differs from a Data Mart in a Data Warehouse, a Gold Layer in the Lake House Medallion Architecture??
I don’t think it does, I think those things are “part” of a larger Semantic Layer if I’m following the rabbit trail correctly. I will tell you, dearest reader, what a Semantic Layer is beyond all doubt.
This also begs another question: which SaaS vendors are actually implementing “tools” or “features” that are directly tied to a semantic layer?
This one caught me by surprise: for once, Snowflake beat everyone else to the punch.
They released an actual “thing” related to the Semantic Layer called a “semantic view.”
“You can store semantic business concepts directly in the database in a semantic view, which is a new schema-level object. You can define business metrics and model business entities and their relationships.” - Snowflake
I mean, it is helpful to see, touch, and feel an actual implementation of something that, at the very least, a SaaS vendor considers a piece of the Semantic Layer.
For example, if you Google and research Databricks Semantic Layer, you will realize that …
They consider their entire product offering in and of itself a semantic layer
You can use third-party Semantic Layers on top of Databricks.
The MDS (modern data stack) history of Semantic Layers.
From all I can tell, the Semantic Layer is mostly a way for data vendors to describe what they are selling, which is something very specific (something that people have been selling for a LONG time) … a single source of truth for all data, governance, and logic.
I thought it would be interesting to have AI go search the dregs of the internet and tell us more about how the phrase “semantic layer” came to be used in the context of the MDS (modern data stack).
Here is the history …
2015 — AtScale (BI on Hadoop → early MDS)
AtScale’s Series A press release (June 23, 2015) explicitly touts “a unified semantic layer” integrating with Excel/Tableau/Qlik—one of the first modern-era vendor mentions tied to cloud/‘big data’ stacks. (Source: AtScale)
2019 — Cube (open-source)
Cube states it “was originally founded in 2019 as an open source semantic layer project,” marking one of the earliest open-source entrants using the exact term for MDS use cases. (Source: cube.dev)
2022 — “Semantic layer” crosses from concept to product in dbt world
dbt Labs starts talking publicly about a dbt Semantic Layer (Aug 31, 2022 update; “Next layer of the modern data stack,” Feb 24, 2022). Partner blogs (Mode, Oct 18, 2022) adopt the phrasing. (Sources: dbt Developer Hub, dbt Labs)
2022 — Community think pieces go mainstream
“The Rise of the Semantic Layer” (Sept 29, 2022) chronicles the new wave (Cube, MetricFlow, etc.), explicitly using the term in the MDS context. (Source: Data Engineering Blog)
2023–2024 — Consolidation & GA
dbt acquires Transform (MetricFlow) and details how the dbt Semantic Layer works—cementing “semantic layer” as standard MDS vocabulary. (Source: dbt Labs)
What is the common denominator when we look at the history of “semantic layer” purveyors in the context of the MDS? It’s that they are selling a literal software “layer” on top of existing data and data services.
To be honest, the semantic layer appears to be a good approach to data modeling, combined with common definitions. Nothing more. Consumable, governed, “correct” data insights.
It appears we are still rabbits chasing the same carrot.
You know, it appears we data professionals, if we can be called that, are still chasing the same golden goose after all these decades.
We want that single source of truth …
data truth
metrics and analytics truth
business definitions truth
calculations truth
governance truth
All in a single spot or tool, no less. And this simply tells us one thing: if the semantic layer is becoming a popular topic, it’s because organizations are still struggling with these very problems today, even with their fancy new Databricks and Snowflake tools.
Data Platforms and teams do want to go to a single spot to see the calculation of a “closed customer.” The same calculation is performed in five spots and eight different Dashboards, yielding different results.
We know the pain.
We are good engineers; we could solve that pain if we wanted to, without a so-called Semantic Layer to save us from our data sins.
But we don’t. The problem is, the squeaky wheel gets the grease. We rush through the JIRA tasks and projects, producing what the business requires in the timeline we are given. Then later, when the data is dirty and things get out of hand, a Sales Engineer shows up with a new Semantic Layer tool to save the day.
You know, after reading and researching the almighty Semantic Layer, and seeing Snowflake’s semantic views up close, I’ll admit that ain’t a bad idea!
The problems it is trying to solve are real. Age-old problems. Do I think it’s worth buying a whole new SaaS to layer on top of the SaaS you already have? $@$@#$% No! Platforms like Databricks and Snowflake already provide you with first-class features to build the perfect Data Platform that can, and often do, run like a well-oiled machine.
Guess what?
They are only as good as the people building those components.
If those people don’t care about …
defining calculations in an encapsulated and testable way
building data quality solutions
ensuring data governance is part of the design
providing analytics that the business actually needs
Then yeah … you do need a Semantic Layer to fix the problems YOU put in place. Surprise surprise.
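For the record, "encapsulated and testable" does not require a new SaaS contract. Below is a rough sketch in Python of the "closed customer" calculation defined in exactly one place. The business rule itself is an assumption I invented for illustration (cancelled, or no order in more than 365 days), as are the function names and the sample data; your business will have its own rule.

```python
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical business rule: no order in more than this many days means "closed".
CLOSED_AFTER_DAYS = 365


def is_closed_customer(
    last_order_date: Optional[date], cancelled: bool, as_of: date
) -> bool:
    """The one and only definition of a 'closed customer' (assumed rule)."""
    if cancelled:
        return True
    if last_order_date is None:
        # Never ordered: treated as closed under this assumed rule.
        return True
    return (as_of - last_order_date) > timedelta(days=CLOSED_AFTER_DAYS)


def count_closed_customers(customers: List[dict], as_of: date) -> int:
    """Every dashboard calls this instead of re-implementing the rule."""
    return sum(
        is_closed_customer(c["last_order_date"], c["cancelled"], as_of)
        for c in customers
    )


# Made-up sample customers.
customers = [
    {"id": 1, "last_order_date": date(2024, 6, 1), "cancelled": False},
    {"id": 2, "last_order_date": date(2023, 12, 1), "cancelled": False},
    {"id": 3, "last_order_date": date(2024, 12, 15), "cancelled": True},
    {"id": 4, "last_order_date": None, "cancelled": False},
]

closed_count = count_closed_customers(customers, as_of=date(2025, 1, 1))
```

One function, one threshold constant, and a handful of unit tests around the edge cases. When the business changes its mind about what "closed" means, you change it once and the eight dashboards follow.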
popcorn as a poll option is my top highlight
To me this is just lingo (or should I say just semantics). Data Dictionaries along with Conceptual or Logical Data Models basically serve the same purpose. Tell me if I am wrong.