Tomasz Tunguz’s Post


The database is being unbundled. Historically, a database like Snowflake sold both data storage & a query engine (& the computing power to execute the query). That's step 1 below. But customers are pushing for a deeper separation of compute & storage. The recent Snowflake earnings call highlighted the trend: larger customers prefer open formats for interoperability (steps 2 & 3).

"A lot of big customers want to have open file formats to give them the options…So data interoperability is very much a thing and our AI products can generally act on data that is sitting in cloud storage as well. We do expect a number of our large customers are going to adopt Iceberg formats and move their data out of Snowflake where we lose that storage revenue and also the compute revenue associated with moving that data into Snowflake."

Instead of locking the data in one database, customers prefer to have it in open formats like Apache Arrow, Apache Parquet, & Apache Iceberg. As data use inside an enterprise has expanded, so has the diversity of demands on that data. Rather than copying it each time for a different purpose, whether exploratory analytics, business intelligence, or AI workloads, why not centralize the data and let many different systems access it?

This saves money: "Storage is about $280m-300m overall for Snowflake. As a reminder, about 10% to 11% of our overall revenue is associated with storage."

But it also simplifies architectures. And it ushers in an epoch where query engines compete for different workloads on price & performance. Snowflake may be better for large-scale BI; Databricks' Spark for AI data pipelines; MotherDuck for interactive analytics.

Data warehouse vendors have marketed the separation of storage & compute in the past. But that message was about scaling the system to handle bigger data within their own product. Customers demand a deeper separation: a world in which databases don't charge for storage.


BTW, there is no difference between 01 and 02 in the pic. Snowflake has always used a blob store (S3) as storage and passed the cost through to the customer. What's changed is the demand for open table formats like Iceberg, which is rapidly becoming the native table format for Snowflake. Be careful though: for effective query performance you're going to need a lot of those tables to be 'managed', i.e. to include query-engine-specific metadata. I suspect this is what Adam Szymański is poking at.

Roy Hasson

Product @ Microsoft | Data engineer | Advocate for better data

1y

Tomasz Tunguz this is what I call Shared Storage and it's the future. There is a new area of innovation around the services required to maintain and optimize this shared storage - something we take for granted with warehouses. Upsolver is hyper focused on solving this problem so that shared storage can indeed be effective and efficient as more analytics and AI tools move to this model.

When you have Master Data that can be used by different systems, it is mandatory to separate storage and compute, so that you are not locked in to a data platform vendor and retain control of your data. As with every rule, there are a few caveats and limits, for example when the data model is too closely tied to one data platform, but those should be limited to specific use cases.

Alessio Gastaldo

Software, Data & ML | Engineer and Architect

1y

I wonder if this could also play nicely in a data mesh context, where we have a high degree of data shuffling and cloning (yes, to be transformed into different formats), which adds to storage costs. Curious to see whether this concept will be explored more by those working in a mesh setup.

Sriram Ganarajan

Driving Next-Gen Data Solutions | Principal Architect focused on Accelerated Computing, HW/SW Co-design, and Composable Data Architecture.

1y

Tomasz Tunguz an open data architecture with plug-and-play services for managed storage, catalog, and compute may be the data strategy most enterprises lean towards. Ease of adoption and TCO are two big KPIs vendors have to go after if they want to stay relevant in this tight, competitive market space...
