
Databricks open sourcing Delta Lake is good news for AI

  • ajitjaokar 

Last week, Databricks open sourced all of Delta Lake (Delta Lake 2.0), contributing it to the Linux Foundation. There is also a new release of MLflow (MLflow 2.0), a machine learning operations platform for managing ML pipelines.

In Databricks parlance, a Delta Lake represents a data architecture that has both storage and analytics capabilities. A data lake stores data in its native format, while a data warehouse stores data in a structured format (typically queried with SQL). Hence, a Delta Lake is expected to be ‘one system, one copy’, encapsulating both analytics and data in a single system.
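The single-copy idea above can be illustrated with a toy sketch. This is not the Delta Lake API: it uses only the Python standard library, and the file name and function names (`sales.jsonl`, `write_events`, `revenue_by_region`) are hypothetical, chosen purely for illustration. It shows raw records being stored once in their native form, with the analytics query running directly over that same copy instead of over a separate warehouse extract.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def write_events(path, events):
    """Storage side: append raw events to one file in their native (JSON) form."""
    with path.open("a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")


def revenue_by_region(path):
    """Analytics side: aggregate directly over the same file, with no second copy."""
    totals = defaultdict(float)
    with path.open(encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            totals[event["region"]] += event["amount"]
    return dict(totals)


def demo():
    # Everything lives in a temporary directory so the sketch leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "sales.jsonl"  # hypothetical file name
        write_events(store, [
            {"region": "EU", "amount": 120.0},
            {"region": "US", "amount": 80.0},
            {"region": "EU", "amount": 30.0},
        ])
        return revenue_by_region(store)


if __name__ == "__main__":
    print(demo())
```

A real platform adds much more on top of this (transactions, schema handling, scale), but the design choice being illustrated is the same: one stored copy serves both storage and analysis.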

Analysis

  • Future revenues for Databricks are expected to be from packages for verticals such as retail, financial services and healthcare. 
  • The MLflow announcement leans toward low-code systems, which is a general trend.
  • This is now a crowded space, with competitors such as Snowflake; existing open source projects (Apache Iceberg); cloud providers (Google Cloud, AWS, Azure); ERP vendors (Oracle and SAP); and companies like Cloudera and HPE.
  • On the analytics side, competitors include Amazon SageMaker, Azure Machine Learning, Google Cloud AI, DataRobot, H2O.ai, Domino Data Lab, Dataiku and others.
  • Financial markets are down and IPOs are not feasible right now, so this announcement adds to the overall buzz and traction.
  • Coming from a data warehouse background in a previous life, I see the idea of a data warehouse / data lake / Delta Lake as a single source of truth as a consultant’s dream; for a company, however, it risks many failures.
  • Hence, this is good news for AI because companies can use the open source technology to build their own platforms.

Image source and announcement – https://databricks.com/blog/2022/06/30/open-sourcing-all-of-delta-lake.html