
Open source at HASH

How our approach to open source informs what we do

May 15th, 2022

David Wilkinson, CEO, HASH

Working in public

At HASH, we publish as much of our code as possible in public.

When it comes to the HASH workspace and simulation products, once code has been written, we generally aim to release it in one of four different ways. Each of these carries a different set of terms governing what can and can't be done with it, and is outlined below:

  1. Open-source and permissive licenses: in essence these licenses give people the freedom to do pretty much whatever they like, including extending the code and keeping it private (making this the most flexible category of license). The entirety of the Block Protocol is available under these terms (by way of the MIT License), alongside most of our blocks and connectors. Some of our packages, like error-stack and deer, are released under multiple permissive licenses (i.e. usable under either the MIT or the Apache 2.0 license, at your option).
  2. Open-source and copyleft licenses: these again give people the freedom to use the code however they like, provided that if they extend or modify the code they release those changes as open source under the same terms. Our HASH application is primarily available under the AGPL, which guarantees this freedom.
  3. Fair-source licenses: fair-source licenses give people access to and freedom to use code directly, and the ability to self-host products without reliance on their creators (i.e. us). However, fair-source licenses limit the ability of others to create competing commercial offerings based on this code. HASH Engine is released under the Elastic License (ELv2), which allows anyone to use, copy, distribute, make available, and prepare derivative works. However, the Elastic License prevents third parties from using software licensed under it to provide a managed service to others (e.g. to provide a cloud simulation SaaS offering).
  4. Public-source: in a limited number of circumstances, we envisage in the future writing code that is not (at least initially) published under a fair or open-source license. However, wherever possible, we'll still publish this code publicly. Following a pattern established by GitLab, public-only source will reside in a special, segregated folder (in their case named ee, which stands for "Enterprise Edition"). In our case, too, we are likely to limit code released in this manner to that which is only really useful to large enterprises (e.g. a solution for federated authentication and authorization at scale, such as SAML). At the time of writing, we haven't yet released any code like this, and will be intentionally strict in the event we do (updating this post with a linked example then).

From time to time, we may keep code private. In connection with our workspace and simulation products, there are currently only a few instances of this:

  1. The first comprises our deprecated and legacy code, which we have already replaced, or are currently replacing, with equivalent public code. It's unlikely we'll make this public, except perhaps as a curio in the distant future.
  2. The second relates to products within our platform which we are now preparing to release under open-source or fair-source licenses. By this we primarily mean the infrastructure and code behind our cloud compute service for simulations which will be made available to run alongside our open-source HASH application, as well as our HASH Core IDE, which will eventually be published under the Elastic License (as HASH Engine already is).
  3. The third concerns certain code that relates to specific aspects of account management, security, billing, and other sensitive internal processes.
  4. The final category of private code refers to that which we have assisted individual users in creating, or have been contracted to write, but which we do not have permission to share independently. In some cases customers have published the outputs publicly on HASH themselves, but this may not always be the case.

Non-HASH branded products and services may be released under slightly different terms, but in general we will endeavor to hew as closely as possible to the principles set out in this post, as we have with both HASH and the Block Protocol.

How we think about open source

First, we contribute original open-source software (OSS) into the ecosystem through our HASH platform, its connectors, and our blocks. You can find these on GitHub.

Second, we leverage existing OSS within our platform to accelerate development of our products and to provide users with best-in-class experiences. Under the hood we rely on great technologies such as OpenSearch (Apache), Snowplow (Apache), React (MIT), Apollo Server (MIT), and GitLab (MIT). In addition, we use dozens of other open-source packages within our extended codebase.

Third, we publish a lot of the source code for the non-open-source components of our platform (such as HASH Engine) publicly under fair-source licenses like the Elastic License. This allows any HASH user to utilize our simulation technology within their organization for internal use, free from restriction, provided they don’t offer this technology to others as a managed service themselves (as that would be competitive with our core business). In addition, the Elastic License forbids users from circumventing any license key functionality, or removing or obscuring any feature protected by those keys, as well as any included licenses, copyright and other legal notices.

Fourth, we default to openness. Unless there is a real reason to release something under a more restrictive license, we start from a position of assuming all new code written will be open-source and permissively licensed.

Similarly, and finally, we regularly evaluate the competitive landscape to see if we can move existing private code into public repositories, and/or relicense existing public fair source as open source. When the environment changes, so do we. In keeping with our mission to continuously reduce and eliminate information failures, our ultimate goal is to have a fully sustainable business with as large an open-source footprint as possible. We know we can’t even come close to conceiving of all the ways our technology might eventually be used by others, and it’s important to us to provide future innovators with as much freedom and flexibility as possible — without compromising our own ability to sustainably pursue HASH’s mission over the long-term.

Our open-source strategy stands in contrast to the approach of others within the structured-knowledge and simulation space who argue that open-sourcing their software would reduce its quality (see: Why Wolfram Tech Isn’t Open Source—A Dozen Reasons). We respectfully believe the opposite to be true.

What Wolfram gets wrong

Wolfram are one of the oldest players in the structured knowledge space. Their mission, "to make it possible to compute whatever can be computed, whenever and wherever it is needed, and to make accessible the full frontiers of the computational universe" is highly aligned with our own.

However, all twelve of Wolfram's stated criticisms of open source (linked above) are either straw men or demonstrably false.

  1. Strong leadership and design are compatible with open source. Wolfram's first argument against open source is that "a coherent vision requires centralized design". This claim disregards the very many OSS projects with benevolent dictators who impose design standards, constraints and their own ideas upon the evolution of their projects, much as any company's management might, as well as the many corporate stewards of OSS projects. It follows that their second claim, that "high-level languages need more design than low-level languages", is also not an argument against OSS. "Crowd-sourced decisions can be bad for you"; "Unified computation" and "Unified representation" requiring "unified design"; and "Bad design is expensive" (arguments 5, 7, 8 and 12 respectively) can be dismissed for the same reason. None of these are arguments against open source so much as they are arguments for strong leadership, which remains eminently possible within an open-source environment.
  2. Open-source communities are more diverse, not less. Wolfram argue that "you need multidisciplinary teams to unify disparate fields" (3), a statement that is undoubtedly true. Yet claiming this as an argument against open source turns a blind eye to the diversity and breadth of expertise found within many OSS projects, including HASH, whose contributors come from multiple continents, have varied backgrounds, and in fact represent a much wider set of life-experiences and disciplines than a single company alone might hope to capture within its organizational structure.
  3. Commercial entities can help open-source projects be more effective. Wolfram argue that "hard cases and boring stuff need to get done too" (4), essentially implying that open-source maintainers and coders can't be trusted to knuckle down and focus on important maintenance, or on the boring background work required to make new technologies impactful and successful. Not only is this patronizing to the thousands of open-source developers who care passionately about the products they build, but it ignores the possibility of commercial entities advancing OSS projects and acting with similar motivations to any other private company. "Our developers work for you, not just themselves" (6) relies upon the same misrepresentation, while "Paid software offers an open quid pro quo" (10) mischaracterizes and ignores the various business models that allow OSS to be both free at the point of access, and profitable (e.g. through the provision of paid complements, such as the famous support services provided by Red Hat, which justified their $34 billion sale to IBM in 2019).
  4. Huge technological advances occur in the open. Wolfram's ninth claim, that "Open source doesn’t bring major tech innovation to market", can be disproven by a thousand major open projects, and the entirety of the modern web is built atop OSS. To pick one such project, Linux powers 100% of the world’s top 500 supercomputers, 96.3% of the top 1 million web servers, and 23 of the top 25 websites in the world – built on open Git repositories now containing some ~28 million lines of code.
  5. There are many long-term financing options and sustainable business models available to open-source projects. According to Wolfram, "it takes steady income to sustain long-term R&D", which is generally true (unless you happen to be sitting on a large nest egg already). However, a great number of OSS projects are able to command a stable income, and many of them are far more profitable than Wolfram. Since the advent of debt in approximately 3500 BC it's been possible to operate without monetizing upfront, in promise or expectation of future cash flows, and modern financing in the form of venture capital is just one form of this that has led to an explosion in OSS companies.

Companies often find themselves constrained by path-dependency, struggling to justify historical choices. It can be hard to switch to an open business model when you already have paying customers in a closed business model, so we've set out from day one at HASH to be completely open.

Join us... we're hiring!
