AI Decision Series | Part 1: Open-Source versus Closed-Source Models

Dan Leszkowicz

This series explores the decisions that an organization will need to make as it invests in AI and LLMs. Each entry delves into a key decision along that journey, and the tradeoffs and implications of each option. Where appropriate, we’ll offer Pienso’s perspective and recommendation.

Large Language Models (LLMs) are immensely powerful, and their potential to be harnessed by enterprises is exciting. But this potential comes with a slew of decisions and trade-offs. Key among those decisions is the choice of whether to use a closed-source model or an open-source model. In this article, we’ll explore the relevant considerations. But first, a few definitions.

Open-source LLMs are language models that are accessible and usable by the general public. In other words, their use is free and available to anyone with the appropriate skill sets. These models can be deployed straight out of the box, or users can modify them. Think of the model like a cooking recipe: you could follow it exactly, or substitute in your favorite herbs and spices to suit your tastes.

Closed-source LLMs are language models that are private and proprietary. These models are typically produced by tech companies or organizations for commercial purposes, generally accessed via API for a fee. These proprietary LLMs can only be edited or refined by the authoring organization, not the downstream customers. Typically (more on this later) that means that closed-source models are less of a recipe and more of an already-cooked entree served at a restaurant.

So which one is better?

Or, more pragmatically – what tradeoffs should you consider when picking an approach?

Customization

The clearest difference between open- and closed-source LLMs is the varying level of customizability offered between the two. As noted, open models’ code — including their weights, biases, and other parameters — is all accessible to the general public. Some examples include Meta’s Llama 2, BigScience’s BLOOM and Google’s BERT. And those components, particularly weights, can be manipulated via a process called fine-tuning, in which you would expose the model to a new set of data — generally data that is relevant to your organization. During that process, the model makes predictions based on snippets of your data, measures its performance against the ground truths present in your data, and then adjusts its weights until its predictions match those expected ground truths. By the end of this process, you have a modified version of the LLM that better speaks the language of your data.
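To make the predict, measure, adjust loop described above more concrete, here is a minimal sketch in Python. It is a toy, not how any particular LLM is actually fine-tuned: a single number stands in for the billions of weights in a real model, and the starting weight, the data, and the learning rate are all hypothetical values chosen for illustration.

```python
# Toy illustration of the fine-tuning loop: predict, compare, adjust.
# A real LLM has billions of weights; here one weight stands in for all of them.

def fine_tune(weight, examples, learning_rate=0.01, epochs=200):
    """Nudge `weight` so that weight * x better matches each ground-truth y."""
    for _ in range(epochs):
        for x, y_true in examples:
            y_pred = weight * x                  # 1. the model makes a prediction
            error = y_pred - y_true              # 2. measure it against the ground truth
            gradient = 2 * error * x             # slope of the squared error w.r.t. the weight
            weight -= learning_rate * gradient   # 3. adjust the weight to shrink the error
    return weight

pretrained_weight = 1.0                              # hypothetical "out-of-the-box" value
your_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]     # hypothetical data following y = 3x

tuned_weight = fine_tune(pretrained_weight, your_data)
print(f"before: {pretrained_weight}, after: {tuned_weight:.4f}")
```

The weight starts at a general-purpose value and ends up close to the value that fits the new data. With an open model, you can inspect every such adjustment. With a closed fine-tuning service, the same kind of loop runs where you cannot see it.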

Closed models’ code is, as the name suggests, closed off from public view. The weights and parameters are not made public, and consequently, users typically cannot adjust them via fine-tuning. So there is limited ability to take an out-of-the-box model and align it with your data and business needs. There are exceptions to this, however. For example, OpenAI and Cohere do allow for fine-tuning of some of their models, though the adjustments of weights occur behind a curtain. That is, you input your data into a process that remains opaque; adjustments are made to the weights inside this system, and the specifics of those changes are not transparent. The end result is a new closed-source API to which you can point new data, though you won’t be able to observe exactly how the API has changed. Whether that opacity is acceptable depends on your business needs and on the degree to which you may be asked to explain the model’s results.

All else being equal, a fine-tuned model will almost always outperform a general model. It is akin to seeing a medical specialist rather than a general practitioner. Depending upon your use case, a fine-tuned closed-source model may suffice, but it requires being comfortable with the accompanying lack of transparency.

Winner: Open Source

Usability

Assuming you’re working with the “raw” form of a model (i.e., one that hasn’t been fine-tuned), open models do require more know-how and resources. As a basic example, you need the compute resources (virtual or physical) to host and run the LLM, and you need the skill set to manage and eventually optimize that setup. A closed API, by contrast, handles the bulk of those considerations for the user. To work with a closed API, you’ll just need technical folks who are comfortable with API requests and responses.

Much of this topic also hinges on how the LLM is being used. Rudimentary tasks (e.g., classifying a single document) likely wouldn’t require ML resources. But when you eventually graduate to sophisticated pipelines consisting of multiple tasks (e.g., classifying millions of documents, then summarizing each category, then identifying hidden trends within each category and the degree to which each is present, etc.), then ML proficiency starts to become necessary.

That all said, another approach is to avoid using “raw” models, and instead use a platform that taps into those models, but manages many of those technical considerations and tasks on your behalf. These platforms run the spectrum from no-code to low-code to pro-code. But if your strategy is to use such a platform, and we hold factors such as performance (more on that below) constant, then usability starts to look very similar between open and closed options.

Winner: It Depends

Size, Power & Accuracy

The past year has seen an arms race in the LLM ecosystem, with increasingly large and powerful models being released in quick succession. This pursuit of larger models — measured both by the number of parameters and the amount of training data — has been driven primarily by technology companies that are producing closed models. As a result, closed raw LLMs tend to outperform open raw LLMs in terms of accuracy and robustness.

That said, there is a point of diminishing returns. You could ask GPT-4 to write a marketing email, create a recipe from the ingredients in your fridge, or compose a Chaucerian poem about your dog. But unless you have the most interesting job in the world, you’re likely to only regularly use one such capability in the course of your work. Is incurring the significant costs of accessing a closed LLM worth it, if you’re really only interested in a model for specific business tasks? (More detail on these costs below.) That question is particularly salient if you have the opportunity to fine-tune a model for that specific task, and achieve giant-LLM-like results, at a lower cost.

Closed models tend to be the most powerful and versatile. It’s worth asking, however, whether that versatility is really necessary — or whether a fine-tuned, task-specific model might be more effective and cost-efficient.

Winner: Closed Source


Cost Efficiency

On the surface, the cost question should be an easy one: open-source models are free to use, while closed-source models incur a fee. But there are hidden costs that make each option more expensive than it first appears.

While you can access an open-source model without paying a fee, you’ll need to provide and pay for the ancillary costs that come with hosting an LLM yourself. Chief among these is the advanced computing infrastructure (such as GPUs or IPUs) required to run these massive models. Some level of platform engineering will be required to tune these resources for performance, with standards varying from provider to provider. And last but not least are the support, maintenance, and security costs required to maintain any enterprise application.

In contrast, all of these costs are baked into the fee you pay to access a closed model via API. But this “packaging,” while helpful, may not provide as turnkey a solution as you may expect – and therein lies the rub. A rough initial estimate of closed model costs may be calculated by taking the cost per token (the basic unit of text that an LLM processes), then multiplying it by the number of tokens in the data set you want to analyze. But that assumes a few things: that your prompt is perfectly constructed; that your dataset is of a manageable size and won’t time out the API; that you’ve set the parameters optimally, and so on. Should any one of these assumptions prove to be untrue, then you’ll need to try again. And again. Each of those experiments incurs a fee, and when dealing with enterprise-sized datasets, those fees add up quickly.
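The rough estimate described above, and the way repeated attempts inflate it, can be sketched in a few lines of Python. The dataset size and the price below are hypothetical placeholders, not any provider’s actual rates, and the function name is our own.

```python
def estimate_api_cost(total_tokens, price_per_1k_tokens, attempts=1):
    """Rough cost: tokens processed x price per token x number of full passes."""
    return (total_tokens / 1000) * price_per_1k_tokens * attempts

dataset_tokens = 50_000_000   # hypothetical enterprise dataset size, in tokens
price = 0.03                  # hypothetical price in dollars per 1,000 tokens

single_pass = estimate_api_cost(dataset_tokens, price)
five_passes = estimate_api_cost(dataset_tokens, price, attempts=5)

print(f"one clean pass:   ${single_pass:,.0f}")
print(f"five experiments: ${five_passes:,.0f}")
```

Under these assumed numbers, a single clean pass costs roughly $1,500, while five rounds of prompt and parameter experimentation cost roughly $7,500. The estimate ignores output tokens and partial reruns, so treat it as a floor rather than a forecast.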

Iteration and experimentation are critical to any LLM effort, particularly as the technology develops at such a rapid pace. It’s important to select a model and platform that will allow and encourage that experimentation, rather than make it cost-prohibitive.

Winner: Open Source

Performance

Our analysis to this point relies on the assumption that both options are viable and performant. But each organization’s use case for LLMs is different, with vastly varied amounts of data being analyzed, as well as different speed requirements for results.

Generally, the larger an LLM, the longer inference will take. But even holding LLM size constant, you’re still subject to throughput constraints. With an open model, those constraints are under your control. With a closed model, those constraints (calls per minute, tokens per call, response time, etc.) are controlled by the LLM provider, and they may or may not meet the needs of your given tasks.
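To see how limits such as calls per minute and tokens per call translate into turnaround time, consider this back-of-the-envelope sketch in Python. The limits and dataset size are hypothetical, and the calculation is a best case: it assumes every call is filled to the token limit and none fails or is retried.

```python
import math

def minutes_to_process(total_tokens, tokens_per_call, calls_per_minute):
    """Best-case wall-clock minutes under the given rate limits."""
    calls_needed = math.ceil(total_tokens / tokens_per_call)
    return calls_needed / calls_per_minute

# Hypothetical workload and provider limits, for illustration only.
minutes = minutes_to_process(
    total_tokens=50_000_000,
    tokens_per_call=4_000,
    calls_per_minute=60,
)
print(f"best case: {minutes / 60:.1f} hours")
```

With these assumed figures the job needs 12,500 calls and takes about three and a half hours at best. If your task needs results in minutes, the only levers are the limits themselves, and with a closed model those belong to the provider.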

Winner: Open Source

Data Privacy

To use an LLM is to send your data to it – wherever it is hosted. If you’re hosting an open model yourself, that typically means that your data isn’t going very far. Most likely, you’re hosting the model in the same tenant or data center where you host your enterprise data. That means you’re able to maintain your tried-and-true data security posture. If you’re using a closed model, by contrast, you’re almost certainly transmitting your data outside of your preferred environment, with all the increased vulnerability that entails.

Additionally, each closed model provider has its own set of terms, conditions, and privacy policies that dictate how it treats your data. Some of those policies may allow the provider to use your data to further train its models. In that case, you’re effectively donating your private and proprietary data to enrich a model that will then be used by other organizations, even your competitors. Given the preciousness of your enterprise data (it’s a representation of your business!), that’s probably too high a cost.

Winner: Open Source

Model Autonomy & Control

Once you’ve deployed an LLM to some inference task – let’s say analyzing customer service conversations as they roll into your contact center – it’s important that you be able to rely on that LLM. So far, we’ve largely focused on the ability to rely on the accuracy of a model. But we also need to assess the consistency of a model’s results over time. And that consistency depends on building from a foundation LLM that is stable, rather than in constant flux. You can imagine the difficulty of tracking customer satisfaction over time if the underlying model is changing every few weeks.

If using an open model, you have complete control over said model. No updates are made unless you explicitly choose to accept them. Whatever version of an LLM you build from isn’t going anywhere without your say.

If using a closed model via an API managed by a third party, then you’re at the mercy of their changes. And those changes are a bit more frequent and unpredictable than you might expect. Take GPT-4, for instance, which was released in March 2023. Since its name hasn’t changed since then, you might assume that the model hasn’t either. But take a look at the OpenAI community forums and you’ll find plenty of frustration from developers who regularly use and test the model and have experienced serious variance in results. A large part of their frustration stems from the fact that these minor releases, or “checkpoints” as OpenAI calls them, are not communicated in advance or documented in detail, and users are unable to opt out or even schedule their adoption (the way one typically expects enterprise software upgrades to be managed).
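One way to notice this kind of silent change is to re-run a fixed set of examples on a schedule and compare the new labels with a stored baseline. The Python sketch below illustrates that idea only. The ticket IDs, the labels, and the 95% threshold are hypothetical, and in practice the labels would come from calls to whichever model you are monitoring.

```python
def agreement_rate(baseline, current):
    """Fraction of evaluation items whose label is unchanged between two runs."""
    if baseline.keys() != current.keys():
        raise ValueError("both runs must cover the same evaluation items")
    if not baseline:
        raise ValueError("evaluation set is empty")
    unchanged = sum(1 for item_id in baseline if baseline[item_id] == current[item_id])
    return unchanged / len(baseline)

# Hypothetical labels for the same four conversations, recorded weeks apart.
baseline_labels = {"ticket-1": "satisfied", "ticket-2": "unsatisfied",
                   "ticket-3": "satisfied", "ticket-4": "neutral"}
current_labels = {"ticket-1": "satisfied", "ticket-2": "unsatisfied",
                  "ticket-3": "neutral", "ticket-4": "neutral"}

rate = agreement_rate(baseline_labels, current_labels)
drift_detected = rate < 0.95   # hypothetical tolerance; tune to your use case
print(f"agreement: {rate:.0%}, drift detected: {drift_detected}")
```

A check like this does not prevent a provider from changing the model, but it tells you when a shift in your metrics may reflect the model rather than your customers.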

Winner: Open Source


Conclusion

We at Pienso fervently believe that open-source models will offer advantages over closed-source models, particularly as use cases become more complex and as users become more sophisticated. For that reason, we suggest that organizations invest in the tools, infrastructure, and knowledge whose value will outlast this initial LLM hype cycle. That said, the most important thing right now is to learn and experiment. If your organization is presently able to leverage LLMs only via the packaging provided by closed-source models, then that’s a fine way to start. But be wary of building too much on top of a model whose updates you can’t control. And keep assessing how you could take advantage of the strengths and efficiencies of the open-source options, including exploring platforms that handle some of the heavy lifting for you. Your enterprise’s data and any resulting fine-tuned LLMs you generate from it will quickly become lasting and differentiating assets for your organization. Having your own LLM regime is fundamental to that competitive advantage.