How VC-Backed Data Startups Are Using Open Source to Build Their Success

Here's why everyone is talking about how open source is revolutionizing VC-backed data startups...


My FOMO over crazy valuations drove me nuts.

So I spent some time studying these startups' strategies and business models.

There has been an insane amount of VC excitement and hype around data platform and tooling startups. We have been inundated with funding announcements every few weeks.

The common pattern:

"To build a startup in Data/ML, start with open source and sell it as SaaS/PaaS."

I curated 4 startup stories to learn from. 👇

Databricks: SaaS platform to work with Apache Spark + MLflow + Delta Lake.

They are one of the largest startups ($3.6B funding) built on an OSS core.

They provide an easy-to-use interface and hide the hard parts of setting up and managing a data platform on any cloud. Teams can try the offering through the open-source libraries. Once a team depends on Spark for mission-critical jobs, it buys performance and support through the enterprise edition.

The core components are all open-sourced.

Tecton: Hosted SaaS feature store for machine learning.

Founded in 2019, they have raised $60M to operationalize ML.

The concept was born out of Uber's ML team. They started with a commercial platform for building reusable stores of features that can be used to train models. Later they joined the core contributing team of Feast, an OSS feature store.

They have aligned their vision with Feast to increase their market share.

Preset: Managed cloud services for Apache Superset.

Growing quickly as a data exploration and visualization tool; raised $48.4M.

Superset was born out of Airbnb. Preset keeps it as the open-source core and adds education, training, and fully hosted, hassle-free deployment services. Superset was created by the creator of Airflow, another very successful OSS data project.

Preset is fighting an uphill battle against competitors with deep pockets, such as Looker and Mode.

RudderStack: Freemium end-to-end customer data platform.

Their core business value is "software should be open and developer-focused".

In a very fragmented market, where many tools each try to do one thing well, they are trying to be the one platform that lets companies easily capture and activate customer data.

They are still pretty early in the game: born in 2020, raised $21M in 2021.

Lesson: If you are building a platform for engineers, use OSS in your strategy to grow organically.

If you are motivated to turn your open-source side hustle into the next big unicorn, I hope these are helpful starting points.

There are more (Confluent, Prefect, dbt) I want to dive into, but maybe later.