Snowflake goes to great lengths to promote its closed-source software and may be trying to convince an audience that isn’t really interested in it. […]
Disclaimer: Matt Asay works for AWS, but the views expressed here are his own and do not reflect those of his employer.
In two recent blog posts (“Striking a balance with ‘open’ at Snowflake” and “Where open helps and where it hurts”), Snowflake has spent 6,064 words arguing a very simple concept: Not all software needs to be open, whether that means open source, open standards, or open APIs. It is not a particularly controversial argument, and it reflects the reality that while virtually all software contains open source code, most software is not licensed as open source. In other words, Snowflake is well within its rights to keep its software closed.
And yet the company clearly felt the need to justify its decision (twice), which reflects the strong appeal of open source, open standards, and open APIs, even when customers don’t seem to be asking for them.
Open sourcing of data
Nearly a decade ago, Cloudera co-founder Mike Olson made a bold statement: “No dominant platform-level software infrastructure has emerged in closed-source and proprietary form in the last 10 years.” Olson was mostly right. Splunk had emerged during that time, along with perhaps a few other examples, but on the whole his claim held.
Fast-forward to 2021, and Olson’s pronouncement has remained pretty accurate, with a few exceptions. Snowflake is one of them. The company, which describes itself as a data cloud company, has managed to build a big business with a proprietary SaaS offering in an industry awash in exceptional open source data infrastructure such as Apache Hadoop, Apache Arrow, Apache Spark, and others.
This may reflect a more nuanced reality: companies may intuitively want “open,” but they place a greater emphasis on “working.” This has been evident for years as companies have introduced managed services to facilitate the use of open source software, or, in the case of companies such as Fauna and Snowflake, offered managed services that are not based on open source at all.
Combining both “open source” and “easy to use” into one service is the holy grail, but if companies have to choose one, they will choose the solution that is easiest for them. After all, a customer can turn to Apache Spark, Dremio, or any number of tools to build data warehouses or data lakes, yet thousands of customers have spent around half a billion dollars on Snowflake in the last year.
So why does Snowflake defend a position that seems to please its customers?
That’s a lot of words
Between the two posts, Snowflake put in a lot of effort (3,798 words in the Snowflake blog and 2,266 in the InfoWorld post) to say, “We don’t think everything should be open.” That’s a lot of digital ink spent trying to dress up a clear and perfectly acceptable message that pretty much every vendor on the planet agrees with. In the InfoWorld post, for example, the company extols the excellent contributions of its employees to the open source database FoundationDB, which the company uses in its infrastructure. Great!
But then this statement is followed by an awkward addition: “However, we do not deduce from this that open source software has an inherent advantage.” The authors then double down on the argument that “open is not a panacea. We strive to avoid misguided applications of open that create costly complexity instead of cost-effective usability.”
The company simply means (and ultimately says) that open source is a means, not the end. That’s true. But along the way, it also makes erroneous claims about open source by suggesting that it would somehow diminish the company’s ability to secure its software, which is simply not true. “At Snowflake, we believe in the value of open standards and open source, but also the value of data governance and security,” the company’s co-founder says in the InfoWorld post. This “but” is completely unnecessary and implies that open standards and open source undermine data governance and security. Neither is true.
There is also the false premise that source code must be useful to everyone in order to be useful at all. On the company blog, the authors say: “The query processor of a sophisticated data platform is typically built by dozens of graduates of a PhD program, developed, refined and optimized over years. The availability of source code does not necessarily increase the ability to understand its inner workings.”
Michael Fischer, a container expert at AWS, takes issue with this: “Open source was not about allowing users to understand and improve the software. It’s about enabling the world to do this. Just because relatively few people are able to understand or patch the code of the Linux kernel does not mean that its openness has had little impact. It is a little complacent and insulting to suggest that you should not share, because only graduate students would understand. In fact, science is making progress through sharing and publishing. This is the purpose of scientific journals and conferences. Art advances through disclosure.”
Fischer is right, but of course there is no law that says Snowflake must, or even should, open up its code, file formats, or anything else. Dave McCrory, VP of growth and global head of insights and analytics at Digital Realty and a longtime cloud and open source observer, points out, “Not all software must or should be open source. Open source is a suitable license / model for a lot of software, but not for everyone.”
Whether Snowflake should do that is ultimately a decision for its customers, and based on sales, it seems that Snowflake’s customers don’t care. So once again, why write the posts?
Selling closed
Most of Snowflake’s major competitors also offer their own data clouds / platform services. (Disclosure: I work for AWS, a partner and competitor of Snowflake, but I am not involved in this part of the business.) For example, it is highly unlikely that Oracle’s salespeople will knock Snowflake for offering proprietary software. Maybe the pressure comes from Databricks or other open source providers?
Databricks recently launched its Delta Sharing project, an open protocol for the secure exchange of large data sets in real time. This was just one of the announcements made by Databricks at the Data + AI Summit, which carried the slogan “The future is open.” Databricks is also not alone in positioning its data cloud as an open alternative to solutions such as Snowflake. Journalist Sean Kerner told me, “You should see my inbox … Every other pitch is ‘X is an open alternative to Snowflake.’”
Snowflake, for its part, insists that open is not the right answer when it comes to file formats, source code, and more. At least not always. Maybe it’s right. But writing thousands of words to argue against open instead of simply demonstrating value to customers through its offerings is bad marketing. As I wrote about the Snowflake IPO in 2020:
“Developers have never been overly convinced of open source. The reason for [Olson’s remark about a] “mind-boggling” trend is simply that open source has made it easier for developers to get their work done, thanks to a high-quality, easily accessible, open-source data infrastructure. There are, of course, other benefits, such as the communities that often come with open source projects, coupled with the desire to have more granular control over your own software stack. But in the end, open source prevailed because it allows developers to “get it done.” That’s also why there are developers who like to use open source software like Apache Airflow to load data into their proprietary Snowflake data platform. This is not cognitive dissonance. It’s pragmatism.”
By rationalizing its decisions rather than simply delivering value to customers, Snowflake ends up creating more confusion than clarity. Companies clearly appreciate what it sells. There is no reason to apologize for not being open enough.
* Matt Asay writes for InfoWorld.com.