Why Databricks should acquire Hex

Stas Sajin
Aug 24, 2022

Databricks has amazing backend engineering talent, and since its founding it has built a platform that puts data practitioners first. All of the founders come from strong technical backgrounds and were involved in the creation of technologies like Apache Spark and Apache Mesos. Databricks uses open-source technologies (Delta Lake, Spark, MLflow) to power its platform, and its conferences have more deep dives, folks in t-shirts, and engineer speakers than most other conferences I’ve seen. Yet despite all the technological and algorithmic breakthroughs, one thing I find they don’t do well is build great experiences. Notebooks are a prime example of that.

I judge a book by its cover

I’ve used Databricks since 2016 across two companies, so I have had plenty of exposure to the product. Whenever I use a new framework, library, or platform, I end up judging it by its cover, and I have to say that the notebooks within Databricks need improvement. Here are some issues I’ve experienced:

  1. Interactive visualizations built with Plotly or Altair never render exactly as you would like. Databricks doesn’t make it easy to customize the rendering of HTML blocks.
  2. Databricks supports KaTeX for displaying mathematical formulas and equations. I always struggle with KaTeX typesetting and find that I can’t just copy my LaTeX code from regular Jupyter notebooks into Databricks without making changes.
  3. Installing libraries in a notebook with the `!pip` command always feels weird to me. Yes, I know you can attach cluster libraries, but I would much rather have an experience similar to SaturnCloud, where I can select the cluster size and the Docker container environment in two separate steps. They desperately need to add a container registry feature to the product.
  4. The product evolves slowly, and even simple features that you would expect to just exist get introduced late. Consider that tab autocompletion for Python was added in December 2020, many years after Databricks released notebooks. If you work in R and you’re used to RStudio markdown notebooks, the UX is so lacking that it probably pushes you away from Databricks altogether.
  5. From a design perspective, I also find that there is just too much whitespace and that the overall experience does not gel. For example, when tab indentation was introduced into the product, I noticed that the default was set to two spaces, yet the convention for languages like Python is four.

I could probably list a dozen other issues. Databricks has by far the least pleasant notebook experience on the market when compared to open-source Jupyter Notebooks, RStudio Cloud, Deepnote, SageMaker, Kaggle, Colab, and Notable. Unfortunately, when the front-facing products are not designed well, consumers can’t enjoy the backend engineering marvels Databricks has created. This is where I think Hex comes in.

Edit: Since I wrote this post, Databricks has reached out to me about their roadmap, and they plan to execute aggressively to improve the Notebooks experience.

Hex

Hex is a platform for super-powered notebooks. Although I could list all the features they have, I think the product introduction video does an excellent job on its own.

As you can see, Hex does an excellent job of elevating notebooks and collaboration. It has extensive integrations and app-building capabilities and provides you with a workflow lineage that makes it easy to understand how cells tie together.

Reasons to support an acquisition

There are a lot of reasons why Databricks, Hex, and consumers would benefit from this bundling. Companies benefit from product moats where the sum of their parts provides a greater competitive advantage than the individual product offerings. Benn Stancil hits the nail on the head with this one. Microsoft is not known as the top technology leader in any single area, yet because it offers such a broad set of products, customers find the bundle very convenient. Acquiring Hex would allow Databricks to build a bigger moat and become more resilient.

Folks talk about the Modern Data Stack, but I think it is very hard for companies to stomach adding another vendor to their ecosystem. There are just a lot of risks and overhead involved. As a latecomer, Hex might find it difficult to enter the market and take share from its competitors, even if it has a superior product. For example, I would probably continue using Databricks, SageMaker, Colab, or local notebooks. As inconvenient as some of those offerings are, it’s better to consolidate technologies and platforms than to manage the overhead of a new vendor.

Hex has raised more than $70M and seems to have roughly 50 people on its payroll. The best-case scenario is that they have 3–5 years of runway, and the average case is 1.5–2 years. They could probably extend that only by executing an aggressive sales strategy. In the current recessionary environment, they will have a harder time raising later-stage money without significant dilution. If Snowflake was willing to pay $800M+ for Streamlit, I can see Databricks making a similarly strong offer for a product that is more feature-rich than Streamlit. Moreover, Hex is not the only product on the market. Deepnote, Notable, SageMaker, and Google’s offerings are all very competitive. For both Hex and Databricks, the acquisition can be very attractive.
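For anyone who wants to sanity-check runway ranges like the ones above, here is a rough back-of-the-envelope sketch. It assumes flat headcount, no revenue, and that all of the roughly $70M raised is still in the bank. The per-employee cost figures are purely illustrative assumptions, not numbers from Hex, and `runway_years` is just a hypothetical helper name.

```python
def runway_years(cash_usd: float, headcount: int, annual_cost_per_head_usd: float) -> float:
    """Years of runway, assuming flat headcount and no offsetting revenue."""
    annual_burn = headcount * annual_cost_per_head_usd
    return cash_usd / annual_burn


CASH_USD = 70_000_000  # approximate total raised, per the estimate above
HEADCOUNT = 50         # approximate payroll size, per the estimate above

# Hypothetical fully loaded annual cost per employee (salary, benefits,
# cloud spend, office, and so on). These values are assumptions.
for cost_per_head in (300_000, 500_000, 800_000):
    years = runway_years(CASH_USD, HEADCOUNT, cost_per_head)
    print(f"${cost_per_head:,} per head -> {years:.2f} years of runway")
```

The takeaway is only that the estimate is very sensitive to burn per head and to hiring plans: a team that keeps growing moves quickly from the optimistic end of the range toward the pessimistic one.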

We can build our own

Databricks has great engineers, and it would not surprise me if internal discussions lean towards building a completely new notebook platform. Unfortunately, the curse of being an engineer in the top quartile is that you sometimes lack the perspective or empathy to design frameworks that are simple for average users or non-engineers to use and understand. The UX critique I have of notebooks applies to other parts of the platform too, including Spark, MLflow, Workflows, and even Delta Lake. For example, in the early days, Databricks made a bet that it would mostly be data engineers writing data pipelines. In practice, we found that SQL interfaces are more flexible and more accessible, and that having a team of engineers writing Scala Spark or PySpark code is not sustainable. For Databricks to gain a perspective on building great experiences, they would have to acquire not just the Hex product but the whole Hex team, and make sure that Design and Product have more leverage at the table.
