Good Data Engineer vs. Bad Data Engineer: What Sets Them Apart?

Here's why everyone is talking about the critical traits that separate good data engineers from bad ones...

This post is inspired by Good PM/Bad PM on the a16z blog.

I have been building advanced analytics solutions in data teams for over 5 years. I built a data-driven culture at SnapTravel, where I started as the first data person and grew the team to 20. I have also seen many flavors of org structure and business problems while consulting with McKinsey.

Here is my take on what makes a good data engineer (DE) vs. a bad one:

  • Good DE is a team-oriented individual who uses their knowledge of software and data analytics to be the bridge between various disciplines. Bad DE is a side node that the rest of the team treats as a support function for fetching data and otherwise ignores.

  • Good DE creates tools that enable the whole company to use its data easily and efficiently. Bad DE focuses only on getting the data warehouse or data lake ready.

  • Good DE uses code versioning and peer review for their work. Bad DE relies on UI-based tooling and thinks of themselves as the specialist who is always right.

  • Good DE knows when to write ETL and when to stay away from manual plumbing work. They are obsessed with finding an efficient solution to the business problem. Bad DE is heavily biased towards writing code instead of solving the business problem effectively.

  • Good DEs have a product mindset about data assets. They treat their data models as an API for downstream users. They actively seek feedback from analysts and scientists. They document their data models and test the data they deliver. Bad DEs focus on closing tickets by getting data into the requested tables and views.

  • Good DEs understand the strengths and limitations of Python, SQL, and Java. They understand time complexity (big-O notation) and use vectorized operations wherever possible. Bad DEs hold strong opinions about their tools and don't build solutions from the first principles of computer science.

  • Good DEs always invest in continual learning and in becoming generalists. Bad DEs are territorial about their work and stick to what they know best.

  • Good DEs think about idempotency, stateful vs. stateless processing, and batch vs. streaming solutions. Bad DEs do not think any of that is necessary.
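
To make the idempotency point concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table daily_bookings, its columns, and the function load_daily_bookings are hypothetical names I made up for illustration; a real warehouse would have its own way of overwriting a partition. The idea is only that a load replaces the slice of data it owns instead of appending to it, so a retried or re-run job leaves the table in the same state.

```python
import sqlite3


def load_daily_bookings(conn, run_date, rows):
    """Load one day's rows by replacing that day's partition (hypothetical table)."""
    # Using the connection as a context manager wraps both statements in one
    # transaction: they commit together, or roll back together on an error.
    with conn:
        conn.execute(
            "DELETE FROM daily_bookings WHERE booking_date = ?", (run_date,)
        )
        conn.executemany(
            "INSERT INTO daily_bookings (booking_date, hotel_id, bookings) "
            "VALUES (?, ?, ?)",
            [(run_date, hotel_id, bookings) for hotel_id, bookings in rows],
        )


# In-memory database so the sketch runs without any setup.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_bookings "
    "(booking_date TEXT, hotel_id INTEGER, bookings INTEGER)"
)

rows = [(1, 10), (2, 7)]  # (hotel_id, bookings) pairs, made-up sample data
load_daily_bookings(conn, "2024-01-01", rows)
load_daily_bookings(conn, "2024-01-01", rows)  # simulated retry of the same run

row_count = conn.execute("SELECT COUNT(*) FROM daily_bookings").fetchone()[0]
total = conn.execute("SELECT SUM(bookings) FROM daily_bookings").fetchone()[0]
print(row_count, total)
```

Running the load twice for the same date still leaves two rows in the table. A plain append would have doubled them, which is exactly the kind of silent error that shows up later as inflated numbers on a dashboard.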