The 4 Data Startups That Prove Open Source is the Key to Success

Discover how these 4 innovative startups harness open source to redefine data solutions and achieve success.


My FOMO over the crazy valuations of startups in the data world drove me nuts.

There has been an insane amount of VC excitement and hype around data platform and tooling startups. We have been inundated with funding announcements every few weeks.

So I spent time studying their strategy and business models.

The common pattern:

"To build a startup in Data/ML start with open source and sell as SaaS/PaaS."

I curated 4 startup stories to learn from. 👇

Confluent: Infra as a Service for leveraging real-time data

Kafka was born inside LinkedIn and later donated to the Apache Software Foundation; Confluent, the company built around it, finally IPO'd at $30 billion.

Real-time data processing has been a hot topic since the early days of Big Data. Kafka is the king of the real-time data analytics stack: Apache Kafka + ksqlDB + ClickHouse (or Druid, etc.) + Superset (or Looker, Mode, etc.).

This is a classic case of smart engineers at a big tech company confronting technical problems that the rest of the world has not experienced yet. They solved them by building Kafka and, later, Confluent.

The core technology is open source but managing it at scale is the hard part.

dbt: SaaS on top of an open-core, SQL-based data transformation tool

Founded in 2016; raised $190M+ to help orgs unlock value in their data.

They built the product in the open and grew rapidly to 5k practitioners at 1.7k companies. That's when they raised money on the strength of the community and the OSS product they had built. Their model is an open dbt CLI + compiler, plus a SaaS offering that makes it easy to get started with a web IDE, a simple orchestration tool, and more.
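To make the "CLI + compiler" idea concrete, here is a minimal sketch of what compiling SQL models can look like. This is not dbt's actual implementation: the model names, the `analytics` schema, and the `compile_models` helper are all made up for illustration. It only shows the general idea of resolving `ref(...)` placeholders into table names and working out a build order.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Matches placeholders such as {{ ref('stg_orders') }}
REF_PATTERN = re.compile(r"\{\{\s*ref\('(\w+)'\)\s*\}\}")

# Hypothetical models: name -> templated SQL
MODELS = {
    "order_totals": "select sum(amount) as total from {{ ref('stg_orders') }}",
    "stg_orders": "select id, amount from raw.orders",
}


def compile_models(models, schema="analytics"):
    """Return (build_order, compiled_sql) for a dict of templated models."""
    # Each model depends on the models it references.
    deps = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}

    unknown = {d for ds in deps.values() for d in ds} - set(models)
    if unknown:
        raise ValueError(f"unknown models referenced: {sorted(unknown)}")

    # Upstream models come first in the build order.
    order = list(TopologicalSorter(deps).static_order())

    # Swap each placeholder for a schema-qualified table name.
    compiled = {
        name: REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", models[name])
        for name in order
    }
    return order, compiled


order, compiled = compile_models(MODELS)
```

The point of the sketch is the split: the compile step can live in a free CLI, while scheduling and a hosted IDE sit in the paid layer.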

dbt was born out of the need to scale the number of projects at Fishtown Analytics.

Prefect: Freemium cloud hosting for OSS Orchestration Engine

Growing quickly as a new workflow management standard; Raised $57.6M.

Airflow has historically been the go-to tool for workflow orchestration. Prefect was born out of the lack of some key features in Airflow, like dynamic DAGs and fast-moving tasks. They have an open core that can be used to write complex DAGs, but they kept orchestration and the interactive UI as premium features.
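If "dynamic DAGs" sounds abstract, the rough idea is that the shape of the workflow is decided while it runs, not when it is defined. Below is a plain-Python sketch of that idea. It does not use Prefect's API; `run_flow`, `list_partitions`, and `process` are hypothetical names chosen for this example.

```python
def run_flow(list_partitions, process):
    """Run one downstream task per item returned by the upstream task.

    The number of `process` task runs is unknown until `list_partitions`
    has actually executed, which is what makes the graph dynamic.
    """
    task_runs = []

    # Upstream task: its output determines the fan-out.
    partitions = list_partitions()
    task_runs.append("list_partitions")

    # Fan out: one task run per partition discovered at run time.
    results = []
    for partition in partitions:
        results.append(process(partition))
        task_runs.append(f"process[{partition}]")

    return results, task_runs


results, task_runs = run_flow(
    list_partitions=lambda: ["2021-01", "2021-02"],
    process=lambda p: f"processed {p}",
)
```

A real orchestrator adds retries, scheduling, state tracking, and a UI on top of this loop.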

Preset is fighting an uphill battle against deep-pocketed competitors such as Looker and Mode.

Airbyte: Freemium cloud hosting for OSS data integration tool

They are still pretty early in the game, born in 2020; raised a whopping $180M so far.

The founders organically stumbled on the problem of data integration while ideating for their next startup idea. They have an open-source core to write connectors to move data from source to destination. They provide a fully managed solution for teams that don't want to maintain their own infra.
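For a feel of what "connectors that move data from source to destination" means, here is a minimal sketch. It is not Airbyte's connector interface: the `Source` and `Destination` protocols, the two toy classes, and the `sync` function are assumptions made up for this example.

```python
from typing import Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, object]


class Source(Protocol):
    def read(self) -> Iterator[Record]: ...


class Destination(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...


class ListSource:
    """Toy source that yields records from an in-memory list."""

    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Record]:
        yield from self.rows


class MemoryDestination:
    """Toy destination that keeps records in memory."""

    def __init__(self) -> None:
        self.stored: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.stored)
        self.stored.extend(records)
        return len(self.stored) - before  # number of records written


def sync(source: Source, destination: Destination) -> int:
    """Stream every record from the source into the destination."""
    return destination.write(source.read())


destination = MemoryDestination()
written = sync(ListSource([{"id": 1}, {"id": 2}]), destination)
```

Because every connector implements the same small interface, the community can add new sources and destinations without touching the sync logic.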

Lesson: If you are building a platform for engineers, use OSS in your strategy to grow organically with strong customer retention.

If you are motivated to turn your open source side hustle into the next big unicorn, I hope these are helpful starting points.