Gen AI Apps: Moving from experiment to production

James Hendergart
Published July 18, 2024

Promises, promises. Generative AI-enabled applications promise to increase human productivity, and therefore business profits. Yet for many use cases, the path from experiment to production remains frantic and perilous, just like Frogger, the 1981 video game by Konami. The frog must get home, first by crossing a busy street, then by crossing a fast-flowing river. The frog can’t sit still, but it can’t always hop forward because of the inherent danger.

Companies are the frog. The street and the river are dizzying floods of AI technology passing by. The safe spaces (open street and floating logs) are homegrown projects and vendor offerings. Cars, trucks, turtles, and open water are the ethical, legal, privacy, and accuracy dangers of generative AI. Home is the opposite side, where business gains skyrocket after putting generative AI applications into production.

The speed at which generative AI has taken off creates a frantic urgency inside companies. But the speed at which models advance can paralyze some organizations: the ground is moving out from under them so fast that they cannot orient themselves in the face of constant, rapid change. The secret to adopting generative AI is just like starting a game of Frogger: jump in!

Once companies decide which of the basic generative AI use cases seem desirable, they can begin experimenting. Whether they build an AI-enabled application, try out third-party offerings, or both, the initial trials will clarify exactly what the business needs. The key to success in this early stage is committing to a minimum version of the application that meets production requirements for safety and efficacy. This becomes the organization’s foundation. From there, the organization can push forward, deciding which model advancements matter and whether building or buying is more prudent.

There are two “gotchas” that will inevitably slow down an organization trying to adopt generative AI: training models in-house and private data repositories. The pace of foundational model advancement is now measured in weeks and months, and downstream versions are updated daily. Just take a look at the Hugging Face model tracker if you need proof. The listing I pulled while writing this article fills nearly one entire page with models updated less than 1 minute ago. One must scroll past the fold to find a model more than 2 minutes old!

If the data needed to answer a question put to AI is publicly available, chances are there is already a good variety of foundational models available for use via an API. On the other hand, if the data is not public, then a decision needs to be made about whether to send that data during inference or to build or obtain a model and deploy it privately. Paying to access a model is definitely faster and very likely cheaper than building or licensing, hosting, and maintaining your own model in-house. What’s your corporate superpower? If it is anything other than building or maintaining large language models (LLMs), then chances are you should buy, not build and host.
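For readers who think in code, the buy-versus-build reasoning above can be sketched as a small decision function. This is only an illustration of the heuristic, not a formal framework: the `UseCase` fields, the `Approach` names, and the rule that an LLM-focused company may reasonably build are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Approach(Enum):
    BUY_ACCESS = "pay for access to a hosted model via an API"
    BUILD_OR_HOST = "build or obtain a model and deploy it privately"


@dataclass
class UseCase:
    # All three fields are hypothetical flags made up for this sketch.
    data_is_public: bool
    may_send_data_to_provider: bool  # private data may be sent at inference time
    llms_are_core_competency: bool   # the "corporate superpower" question


def recommend(use_case: UseCase) -> Approach:
    """Return a rough starting recommendation, not a final answer."""
    # Private data that cannot leave the company rules out a hosted API.
    if not use_case.data_is_public and not use_case.may_send_data_to_provider:
        return Approach.BUILD_OR_HOST
    # Assumption: only companies whose superpower is LLMs should consider building.
    if use_case.llms_are_core_competency:
        return Approach.BUILD_OR_HOST
    # Everyone else: buying is faster and very likely cheaper.
    return Approach.BUY_ACCESS


if __name__ == "__main__":
    typical = UseCase(
        data_is_public=False,
        may_send_data_to_provider=True,
        llms_are_core_competency=False,
    )
    print(recommend(typical).value)
```

In practice the inputs are rarely this binary, so the output is best treated as a conversation starter for the first round of experimentation.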

And for private data, the context windows and custom options for LLM use have matured so quickly that enterprise-grade security and compliance, similar to what has already been accepted for cloud computing, check the boxes for most companies. In fact, the demand for “no training on your data” was heard loud and clear by model providers in 2023. ChatGPT Enterprise, for example, boasts SOC 2 Type 2 compliance, SAML SSO, and encryption of data at rest and in transit, along with dedicated workspaces that support data retention and domain verification. Not bad, huh? Well, for 92% of the Fortune 500 it seems plenty good enough.

Organizations that believe in the promises of generative AI do not need to be paralyzed by the inherent dangers or the daily LLM improvements. They can succeed by jumping in with eyes open. The initial cycle of experimentation brings confidence and the ability to prioritize which tasks benefit most under safe and ethical use. These should be rolled out across the organization in production mode, giving evaluators and implementers a breather and giving the organization time to measure productivity gains, prioritize the next set of tasks, and determine whether building or buying access to LLMs makes more sense.