Last week, Databricks open sourced all of Delta Lake (as part of Delta Lake 2.0) and contributed it to the Linux Foundation. Databricks also announced a new release of MLflow (MLflow 2.0), its open source machine learning operations (MLOps) platform for managing ML pipelines.
In Databricks parlance, Delta Lake underpins a data architecture (which Databricks calls the "lakehouse") that combines storage and analytics capabilities. A data lake stores data in its native format, while a data warehouse stores data in a structured format (typically queried with SQL). Hence, a Delta Lake is meant to be "one system, one copy": data and analytics encapsulated in a single system.
Analysis
- Future revenues for Databricks are expected to come from packages for verticals such as retail, financial services and healthcare.
- The MLflow announcement leans toward low-code systems, which is a general industry trend.
- This is now a crowded space, with competitors including Snowflake; existing open source projects (Apache Iceberg); cloud providers (Google Cloud, AWS, Azure); ERP vendors (Oracle and SAP); and companies such as Cloudera and HPE.
- On the analytics side, competitors include Amazon SageMaker, Azure Machine Learning, Google Cloud AI, DataRobot, H2O.ai, Domino Data Lab, Dataiku and others.
- Financial markets are down and IPOs are effectively on hold, so this announcement adds to Databricks' overall buzz and traction.
- Coming from a data warehouse background in a previous life, I see the idea of a data warehouse / data lake / delta lake as a single source of truth as a consultant's dream; for a company, however, it risks many failures.
- Even so, this is good news for AI, because companies can use the open source technology to build their own platforms.
Image source and announcement: https://databricks.com/blog/2022/06/30/open-sourcing-all-of-delta-lake.html