Blueprint Technologies - Data information specialists

Databricks vs Snowflake – 2024 take

By Blueprint Team

Introduction

As technology advisors, we take great care to recommend best-fit solutions to our clients. We're often asked to compare Databricks vs. Snowflake, but these two platforms were born to serve different functions and long coexisted as a complementary pairing addressing different needs. Over time, we've seen their features overlap to the extent that they now often compete to be the center of gravity for your data universe.

Before we begin, you need to understand two things:

  1. Data warehouses, data lakes, and lakehouses have evolved, are built for different purposes, and have their own advantages and disadvantages. We assume you have a general understanding of this.
  2. Keep in mind your purpose in evaluating a data platform. What do you need your data to do for your business? Who are the primary data producers, consumers, and beneficiaries?

Every use case and every persona has a unique need that should be considered when making an architectural decision. To get the conversation started, we take a broad view of the two platforms (an apples-to-oranges comparison in many respects), and you should weigh the tradeoffs that matter most for your needs. Follow along with us as we compare and share our take on the latest.

The comparison below covers Databricks, the Snowflake Data Cloud data warehouse platform, and Blueprint's take on each category.

Year founded

Databricks: 2013, with foundations dating to 2009, when Apache Spark was created.

Snowflake: 2012.

Service model

Databricks: Platform as a Service (PaaS)

Snowflake: Software as a Service (SaaS)

Blueprint's take: The SaaS model that Snowflake employs allows for simplicity of use, while the PaaS model allows finer control over your data. Databricks' approach allows for flexibility and scalability; this can be achieved with Snowflake, but often at higher cost.

Who's it for primarily?

Databricks: Analysts, data scientists, and data engineers. People with a background in Python will find it easier to use.

Snowflake: Data analysts.

Blueprint's take: Snowflake is primarily for data analysts and is simpler for those with SQL skills. While Databricks started off primarily for data scientists and engineers, there's now plenty there for analysts, especially those who want to get closer to the data.

Core competency

Databricks: Built on Apache Spark's distributed computing framework, which makes infrastructure management easier. Databricks is a data lake rather than a data warehouse, with more emphasis on use cases such as streaming, machine learning, and data science-based analytics. It can handle large volumes of raw, unprocessed data, including visuals and documents, and runs on the AWS, Azure, and Google clouds. Data in Databricks can be accessed in real time, at any time, from a variety of platforms.

Snowflake: Uses a SQL engine to manage information stored in the database. It processes queries against virtual warehouses, each in its own independent cluster node. On top of that sit cloud services for authentication, infrastructure management, queries, and access controls. Snowflake enables users to store and analyze data using Amazon S3 or Azure resources.

Blueprint's take: For those wanting a top-class data warehouse, Snowflake may be sufficient.

For those needing more robust ETL, data science, and machine learning features, Databricks is the winner. Databricks pioneered the cloud lakehouse, combining the best of data warehouses and data lakes in an open, unified platform for data and AI at massive scale. If you want to future-proof your investment with advanced capabilities to accommodate future use cases, Databricks may be the way to go.

Data engineering setup

Databricks: Offers auto-scaling of clusters but is not as user friendly. The more advanced UI has a steeper learning curve because it is designed for a technical audience, and it allows more advanced control and fine-tuning of Spark. The release of Delta Live Tables (DLT) in April 2022 simplifies ETL development and management with declarative pipeline development, automatic data testing, and detailed logging for real-time monitoring and recovery.

Snowflake: The data warehouse has a user-friendly, intuitive SQL interface that makes it easy to get set up and running, plus automation features that aid ease of use. For example, auto-scaling and auto-suspend stop and start clusters during idle or peak periods, and clusters can be resized easily.

Blueprint's take: Snowflake wins on ease of setup, but Databricks was designed for more advanced users and AI/ML use cases, which require more robust ETL, data science, and machine learning features. That complexity cuts costs in the long run, as the platform can be scaled up without upgrades.
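To make the declarative-pipeline idea behind DLT concrete, here is a toy sketch in plain Python. This is not the real DLT API (in DLT you would use `@dlt.table` decorators and the Databricks runtime); the `table` registry and `run_pipeline` helper below are illustrative stand-ins showing how you declare tables and quality rules and let an engine materialize them.

```python
# Toy sketch of declarative pipelines, NOT the actual Delta Live Tables API.
# You declare each table as a function; the "engine" resolves and runs them.

_registry = {}

def table(fn):
    """Register a function as a declared table (stand-in for @dlt.table)."""
    _registry[fn.__name__] = fn
    return fn

@table
def raw_orders():
    # In a real pipeline this would read from cloud storage or a stream.
    return [{"id": 1, "amount": 40}, {"id": 2, "amount": -5}, {"id": 3, "amount": 60}]

@table
def clean_orders():
    # Declarative quality rule (an "expectation"): drop rows with bad amounts.
    return [r for r in _registry["raw_orders"]() if r["amount"] > 0]

def run_pipeline():
    """Materialize every declared table (the engine's job in real DLT)."""
    return {name: fn() for name, fn in _registry.items()}

results = run_pipeline()
print(len(results["clean_orders"]))  # 2 rows survive the quality check
```

The point is the shape of the workflow: you state what each table should contain, and the platform handles ordering, testing, and monitoring.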

Data ownership

Databricks: Focuses primarily on the data application and data processing layers. Your data can live anywhere, even on-premises, in any format. Databricks runs on top of Amazon S3, Azure Blob Storage, and Google Cloud Storage. Databricks has also invested heavily in data governance, which can be added easily to your data estate with Unity Catalog.

Snowflake: Decouples the processing and storage layers so each can be scaled independently; you typically process less data than you store. However, Snowflake provides the storage layer (AWS or Azure, through Snowflake) and does not decouple data ownership, retaining ownership of both the data processing and data storage layers.

Blueprint's take: Databricks fully decouples ownership of the data processing and storage layers. You can use Databricks to process data in any format, anywhere.

What kind of data does it store and process?

Databricks: Works with all data types in their original format (unstructured, semi-structured, structured).

Snowflake: Lets you save and upload both semi-structured and structured files without using an ETL tool to organize the data before loading it into the EDW; the data is then transformed into Snowflake's internal structured format. Unstructured data currently stays external (AWS S3, Azure Blob Storage, etc.). The Snowpark API (launched in 2022) helps with processing.

Blueprint's take: Databricks natively handles huge amounts of unstructured data. This is the "data lake" part of the lakehouse, specifically Delta Lake. Snowflake is playing catch-up when it comes to unstructured data.

You can use Databricks as an ETL tool to add structure to unstructured data so that other tools (like Snowflake) can work with it, putting Databricks ahead on data structure.
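The "add structure" ETL step described above can be illustrated in plain Python (no Spark required): semi-structured JSON records are flattened into uniform rows that a warehouse-style tool could load. The record layout and field names here are made up for the example.

```python
# Flattening semi-structured JSON into structured rows, the kind of ETL step
# the text describes running on Databricks before handing data to a warehouse.
import json

raw = '''[
  {"user": {"id": 7, "name": "Ada"}, "events": ["login", "query"]},
  {"user": {"id": 9, "name": "Lin"}, "events": ["login"]}
]'''

def flatten(records):
    """Turn nested user/event records into one flat row per event."""
    rows = []
    for rec in records:
        for event in rec["events"]:
            rows.append({"user_id": rec["user"]["id"],
                         "user_name": rec["user"]["name"],
                         "event": event})
    return rows

rows = flatten(json.loads(raw))
print(rows[0])  # {'user_id': 7, 'user_name': 'Ada', 'event': 'login'}
```

On Databricks the same transformation would typically be expressed in Spark so it scales across a cluster, but the shape of the work — nested input, flat tabular output — is the same.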

Performance (query engine)

Databricks: Has shown 2-4x acceleration of Spark SQL for deployments and claims up to 60x performance improvements for specific queries. Delta Engine (launched June 2020), layered on top of Delta Lake, boosts the performance of SQL queries. Adjacent features like Photon (a C++ execution engine) can speed up performance further for large jobs. (Source: Photon - Databricks)

Snowflake: A query processing layer consists of multiple independent compute clusters, with nodes processing queries in parallel. Snowflake calls these clusters virtual warehouses. Each warehouse is packed with the compute resources (CPU, memory, and temporary storage) required to perform SQL and DML (Data Manipulation Language) operations. (Source: Overview of Warehouses - Snowflake Documentation)

Blueprint's take: Both vendors have released a series of blog posts as they battle for dominance in performance benchmarks. Today, it looks like Databricks has the cost/performance advantage.

Here's one take from ZDNet on the TPC-DS benchmark wars:

“What the TPC and BSC results do show is that the lakehouse architecture can take these BI workloads on. This is significant because most Spark-based systems, including Databricks, had previously been best for data engineering, machine learning, and intermittent queries in the analytics realm. Getting such a system to service ongoing analytics workloads, or ad hoc analysis involving multiple queries that build on each other, was harder to come by.”

Andrew Brust, Jan. 24, 2022
Databricks' TPC-DS benchmarks fuel analytics platform wars | ZDNET

Query performance summary (for laypeople)

Databricks: According to Gartner, users have run Databricks successfully on extremely challenging workloads, up to petabytes of storage in their systems.

Snowflake: Better at interactive queries, since Snowflake optimizes storage at the time of ingestion.

Blueprint's take: Snowflake is the go-to for (smaller) BI workloads and report and dashboard production. For big data and/or intense computing, Databricks is not just faster; it scales better in both performance and cost.

Integration platforms & dev tools

Both platforms: Fivetran, Rivery, Data Factory, Informatica Cloud, and others.

Blueprint's take: For integrations, both platforms now enjoy compatibility with most major data acquisition vendors. This wasn't always the case; with the advent of the Databricks SQL data warehouse engine, all vendors now have the necessary methods in place to integrate data into either platform from nearly all sources.

For tooling, Snowflake has enjoyed a longer run and market dominance and, until recently, claimed a wider set of data design and ETL tools. However, this gap has effectively closed: popular ETL and data modeling tools support both platforms, as do a wealth of CI/CD tools and repositories for managing coded artifacts.

Data sharing

Databricks: Delta Sharing (launched 2021) is an open protocol for real-time collaboration, based on an open-source project by Databricks. Organizations can easily collaborate with customers and partners on any cloud and run complex computations and workloads using SQL, Python, R, and Scala with consistent data privacy controls. Databricks Marketplace (launched 2022) lets data providers securely package and monetize digital assets like data tables, files, machine learning models, notebooks, and dashboards.

Snowflake: Snowflake Marketplace (a data marketplace and sharing platform) is one of Snowflake's most powerful features. It can securely share data, without replication, in a GDPR-compliant and scalable environment. Snowflake data sharing enables sharing of selected objects with other Snowflake accounts. Users can be granted read-only access (a reader account) to query and view data, but cannot perform any of the DML tasks allowed in full accounts (data loading, insert, update, etc.).

Blueprint's take: Snowflake-to-Snowflake sharing is supported, but its walled-garden approach means that Databricks wins with Delta Sharing, the industry's first open protocol for secure data sharing, making it simple to share data with other organizations regardless of which computing platforms they use.
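For a sense of what consuming a Delta Share looks like, here is a minimal sketch. The profile file, share, schema, and table names are hypothetical; the helper simply builds the `<profile>#<share>.<schema>.<table>` coordinate string that the open-source `delta-sharing` Python client expects, and the actual read call is shown in a comment because it requires a provider-issued profile file.

```python
# Building the table coordinate used by the open-source delta-sharing client.
# All names below (config.share, retail_share, sales, orders) are hypothetical.

def table_coordinate(profile_path, share, schema, table):
    """Return the '<profile>#<share>.<schema>.<table>' string the client expects."""
    return f"{profile_path}#{share}.{schema}.{table}"

url = table_coordinate("config.share", "retail_share", "sales", "orders")
print(url)  # config.share#retail_share.sales.orders

# With the `delta-sharing` package installed and a real profile file from a
# data provider, loading the shared table would look roughly like:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
```

Because the protocol is open, any client that speaks it — pandas, Spark, or a third-party tool — can read the share without being a Databricks customer, which is the crux of the walled-garden contrast drawn above.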

Data science and machine learning capabilities

Databricks: Spark provides the tools and environment for running ML workloads across huge, distributed data repositories. In addition to horsepower, Databricks provides mature, unified ML capability to manage the ML lifecycle from start to finish. MLflow, an open-source package developed at Databricks, is the most widely used tool for MLOps. AutoML functionality means low-code, faster deployment of models. Databricks provides built-in ML libraries such as MLlib, supports TensorFlow, and includes the ability to build and deploy LLMs, with access to Dolly.

Snowflake: ML capability is only available via additional tools, such as its Snowpark API, which has Python integration (to build and optimize complex data pipelines), and third-party integrations, though those are plentiful.

Blueprint's take: Databricks is the clear winner in this category. Since day one, the platform has been geared toward data science use cases like recommendation engines and predictive analytics.
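The experiment-tracking idea that MLflow popularized can be sketched in a few lines of plain Python. This is not the MLflow API (that would be `mlflow.start_run`, `mlflow.log_param`, `mlflow.log_metric`); the `Run` class below is a toy stand-in showing why logging parameters and metrics per run makes models comparable and reproducible.

```python
# Toy stand-in for MLflow-style experiment tracking, NOT the MLflow API.
# Each run records its parameters and metrics so runs can be compared later.

class Run:
    def __init__(self, name):
        self.name, self.params, self.metrics = name, {}, {}

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics[key] = value

runs = []
for lr in (0.1, 0.01):
    run = Run(f"lr={lr}")
    run.log_param("learning_rate", lr)
    # A real workload would train a model here; we fake a score for the sketch.
    run.log_metric("accuracy", 0.9 if lr == 0.01 else 0.8)
    runs.append(run)

# Pick the best run by its logged metric, as an MLOps tool would.
best = max(runs, key=lambda r: r.metrics["accuracy"])
print(best.params["learning_rate"])  # 0.01
```

MLflow adds to this skeleton the pieces that matter in production: persistent storage, a UI, model packaging, and a registry for promoting models through stages.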


Key Takeaways

Overall, Snowflake and Databricks are both good data platforms for BI and analysis purposes. Selecting the best platform for your business depends on your data strategy, usage patterns, data needs and volumes, and workloads. Snowflake is a solid choice for standard data transformation and analysis, particularly for SQL users. However, our clients have consistently chosen Databricks for its advanced capabilities in streaming, ML, AI, and data science workloads, especially because of its support for raw unstructured data and Spark's support for multiple languages.

As businesses advance in their data maturity and data needs, we’re more and more in favor of the Databricks Lakehouse Platform as the best choice for unifying the best of data warehouses and data lakes into one simple platform for handling all your data, analytics, and AI use cases at massive scale.

NOTE: You’ll notice that a pricing comparison is suspiciously missing here. Pricing depends on many variables related to your specific processing and storage configurations, and it should be evaluated on a total cost of ownership basis. Thus, we couldn’t adequately cover it here. Contact us if you’d like a deeper analysis and comparison.

What's next?

Blueprint has a solution for all of your Databricks needs.

Have questions or need some advice? Wherever you are in your data journey, we can be an extension of your team. Our data engineering and operations teams are best-in-class. Let’s talk.

Learn about our Databricks accelerator

Lakehouse Optimizer

Optimize your lakehouse costs, minimize your total cost of ownership, and drive more value from your cloud workspaces with the Lakehouse Optimizer by Blueprint.

Learn more

Sources

“Databricks CTO: Making our bet on the lake house.” Tiernan Ray. The Technology Letter.

“Gartner Magic Quadrant for Cloud Database Management Systems.” Henry Cook and Merv Adrian, Dec 14, 2021. Gartner.

“The Good and the Bad of Snowflake Data Warehouse.” Apr 26, 2022. AltexSoft.

“Snowflake vs Databricks vs Firebolt.” Robert Meyer, Jun 15, 2022. Firebolt.

“Snowflake vs. Databricks: A Practical Comparison.” Upsolver.

“What is Databricks? Components, Pricing, and Reviews.” Eran Levy, Oct 14, 2022. Upsolver.

“Deep Dive: Databricks vs Snowflake.” Francis Odum, Sept 15, 2022. Contrary.

“Databricks vs Snowflake: A Side By Side Comparison.” March 15, 2022. Macrometa.

“Snowflake Co-Founder Reveals His Multi-Billion Dollar Secrets.” Gabrielle Olya, Dec 20, 2018. Yahoo Finance.

“Complicated rivalry between Snowflake and Databricks spotlights key trends in enterprise computing.” Mark Albertson, Aug 08, 2022. SiliconANGLE.

“What Does Databricks Do and Why Should Investors Care?” Sep 6, 2021. Nanalyze.

“Databricks’ TPC-DS Benchmarks Fuel Analytics Platform Wars.” Andrew Brust, Jan 24, 2022. ZDNet.

“Comparison of Data Lake Table Formats (Apache Iceberg, Apache Hudi and Delta Lake).” Dremio.

“Snowflake Data Governance — Data Discovery, Security & Access Policies.” Atlan.

“Introduction to Unstructured Data Support.” Snowflake Documentation.

“Snowflake Launches Unstructured Data Support in Public Preview.” Saurin Shah and Scott Teal. Snowflake.

© 2024 Blueprint Technologies, LLC.