At the Microsoft Build conference this week, the biggest of the data-focused announcements was about Microsoft Fabric. I’d had a lot of conversations about Fabric under NDA in the weeks and months prior, and I felt like Microsoft Fabric was mostly yet another way of handling unstructured data. Some other MVPs agreed with me on this. It felt like it was all about the Lakehouse / OneLake, where data could be put and where it would get transformed into Parquet files that could be read by Power BI. The magic seemed to be that these Parquet files store data that has already been processed, so that Power BI can just pick them up without hefty extra processing time.
But unstructured data can make traditional SQL people feel nervous. Unstructured data isn’t subject to relational database logic like the ACID properties of transactions, or the idea that everything should refer to the unique primary key of a table, let alone the idea that each page should be protected against corruption by checksums.
The analogy of the blind men and the elephant seems to fit here. In that story, one blind man describes the legs of the elephant, another the tail, another the trunk, and another the tusks. Each had a limited perspective, and they had trouble reconciling those perspectives into an appreciation of the whole animal. With Fabric, everything that was presented seemed to focus on a different aspect of it, and many of the MVPs felt like those blind men, seeking to understand the whole concept.
So when Bogdan Crivat (a VP at Microsoft who is speaking to the Adelaide Data & Analytics User Group next week) explained to a small group of us about a SQL front-end and four components, it was eye-opening and very reassuring.
He explained that Fabric is primarily about four components (two live today, with more to follow that will take it past four) that can all be accessed through a common SQL front-end. The integrity of this is maintained because each component has a single writer, even though it can also be read by multiple systems.
The four components were Data Mart, Warehouse, Lakehouse, and Mounted DB, with more coming in the future. The writers for these four are the SQL FE for Data Mart and Warehouse, Spark / Data Integration tools for Lakehouse, and CDC Processes / Link for the Mounted DB.
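To make the single-writer idea concrete, here is a minimal Python sketch of how I picture it. This is purely my own illustration: the class, constants, and function names are hypothetical and are not part of any Fabric API. It just encodes the component-to-writer mapping described above.

```python
from dataclasses import dataclass

# Hypothetical labels for the three writer paths described in the post.
SQL_FE = "SQL front-end"
SPARK = "Spark / data integration"
CDC = "CDC processes / Link"


@dataclass(frozen=True)
class Component:
    name: str
    writer: str  # the one path that is allowed to write to this component


# Illustrative model only: each component has exactly one writer.
COMPONENTS = {
    c.name: c
    for c in (
        Component("Data Mart", SQL_FE),
        Component("Warehouse", SQL_FE),
        Component("Lakehouse", SPARK),
        Component("Mounted DB", CDC),
    )
}


def can_write(system: str, component: str) -> bool:
    """True only when `system` is the single designated writer."""
    return COMPONENTS[component].writer == system


def can_read_via_sql(component: str) -> bool:
    """Every known component is readable through the common SQL front-end."""
    return component in COMPONENTS


if __name__ == "__main__":
    for name in COMPONENTS:
        print(name, "| SQL FE can write:", can_write(SQL_FE, name),
              "| SQL FE can read:", can_read_via_sql(name))
```

The point of the sketch is the asymmetry: the SQL front-end can read everything, but it is only the writer for Data Mart and Warehouse.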
Data Mart is like SQL DB, and Warehouse is like SQL DW (Dedicated SQL Pool). They’re standard SQL interfaces, really only differing in scale. So people would start with a Data Mart and then move up to a Warehouse as their workload demands. It’s like being able to create something in SQL DB and then not having to rethink it when the scale demands Dedicated SQL Pool. Hopefully the cost of the new Data Mart won’t be like the cost of Dedicated SQL Pool, so that the higher price only kicks in when it needs to move to the larger component. And all the while, the data is accessible through Parquet files so that Power BI doesn’t need to reprocess everything when it imports it.
The Lakehouse is like the inverse. Instead of being written through SQL and also being readable through files, the Lakehouse is written through files (data integration, Spark), but can also be read through the SQL FE. Its integrity is based on the integrity that Spark implements. And if you need more integrity, just don’t use a Lakehouse; no one’s making you.
Mounted DB is a way of having an existing database sync up, introducing additional databases into the Fabric world.
But all of this is designed so that you can use your current patterns to get data into a common system. If you’re used to pushing data into a SQL DB or SQL DW, Fabric can handle that and give you the benefits of PBI reading Parquet files. If you’re used to pushing data into a Lake, Fabric can handle that and let you use standard SQL to read it. And you can pull in existing databases too.
So I do have hope that this is for everyone, and expect that it will be pretty monumental. The barriers between people who do unstructured data and the people who do relational data seem to be coming down, and I think that presents a good future in this ever-changing world.
@robfarley.com@bluesky (previously @rob_farley@twitter)