When ChatGPT was released in November 2022, the questions we asked and the prompts we entered were simple: "Tell me a story about X" and "Write a narrative between person A and person B on topic Z." Through these early interactions with GPT-3.5, we were trying to identify how this new, trending technology would impact our day-to-day lives. Now, in late 2024, AI complements our lives: helping us debug and write code, compile and summarize data, and even driving autonomous vehicles, to name just a few examples. These are the outputs of a modern-day AI factory, and we are only at the beginning.
This article, the first in a series on AI factories, explores the components of an AI factory and how the different elements work together to generate AI-driven solutions at scale.
Amidst the AI evolution, the concept of an AI factory has emerged as an analogy for how AI models and services are created, refined, and deployed. Much like a traditional manufacturing plant that takes materials and transforms them into finished goods, an AI factory is a massive storage, networking, and computing investment serving high-volume, high-performance training and inference requirements.
Within these factories, networks of servers, graphics processing units (GPUs), data processing units (DPUs), and specialized hardware work in tandem to process vast amounts of data, executing complex algorithms that train AI models to achieve high levels of accuracy and efficiency. These infrastructures are meticulously designed to handle the immense computational power required for training large-scale models and deploying them for real-time inference. They incorporate advanced storage solutions to manage and retrieve massive datasets, ensuring seamless data flow.
Load balancing and network optimization maximize performance and resource utilization, preventing bottlenecks and ensuring scalability. This orchestration of hardware and software components allows AI factories to produce cutting-edge AI models and continuously refine them, adapting to new data and evolving requirements. Ultimately, an AI factory embodies the industrialization of AI development, providing the robust infrastructure needed to support the next generation of intelligent applications.
As NVIDIA CEO Jensen Huang said at Salesforce Dreamforce 2024, “In no time in history has computer technology moved faster than Moore’s law,” continuing, “We’re moving way faster than Moore’s law and are arguably easily Moore’s law squared.”
Deploying AI at scale is becoming increasingly essential as AI investments serve as crucial market differentiators and drivers of operational efficiency. To achieve this, organizations need to continuously build and refine models and integrate knowledge repositories and real-time data. The AI factory concept highlights that AI should be an ongoing investment rather than a one-time effort. It provides a framework for organizations to operationalize their AI initiatives, making them more adaptable to changing business and market demands.
Drawing on our expertise helping customers deploy high-performing, secure modern application fleets at scale, F5 has developed an AI Reference Architecture Framework. Given that AI apps are the most modern of modern apps, heavily connected via APIs and highly distributed, this framework addresses the critical performance, security, and operational challenges essential for delivering cutting-edge AI applications.
F5's AI Reference Architecture Diagram
Within our reference architecture, we have defined seven AI building blocks needed to build out a comprehensive AI factory:
Inference: Outlines the interaction between a front-end application and an inference service API; centers on sending a request to an AI model and receiving a response. This sets the groundwork for more intricate interactions (a minimal sketch appears after this list).
Inference with retrieval-augmented generation (RAG): Enhances basic Inference by adding large language model (LLM) orchestration and retrieval augmentation services. It retrieves additional context from vector databases and content repositories, which is then used to generate a context-enriched response (sketched below).
RAG corpus management: Focuses on the data ingest processes required for Inference with RAG. It includes data normalization, embedding, and populating vector databases, preparing content for RAG calls (sketched below).
Fine-tuning: Aims to enhance an existing model's performance through interaction with the model. It adjusts the model without rebuilding it from scratch and emphasizes collecting data from Inference and Inference with RAG for fine-tuning workflows (data collection sketched below).
Training: Involves constructing a new model from the ground up, although it may start from previous checkpoints (re-training). It covers data collection, preprocessing, model selection, training method selection, training, and validation/testing. This iterative process aims to create robust models tailored to specific tasks (a toy sketch follows this list).
External services integration: Connects the LLM orchestration layer to external sources such as databases and websites. It integrates external data into inference requests but does not include document preprocessing tasks such as chunking and embedding (sketched below).
Development: Encompasses workflows for developing, maintaining, configuring, testing, and deploying AI application components. It includes front-end applications, LLM orchestration, source control management, and CI/CD pipelines.
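To make the Inference building block concrete, the following minimal Python sketch shows a front-end sending a prompt to an inference service API and reading back the response. The endpoint URL, model name, and OpenAI-style request/response schema are illustrative assumptions, not part of F5's reference architecture.

```python
# Minimal sketch of the Inference building block: a front-end sends a request
# to an inference service API and receives a response. The endpoint URL and
# model name are hypothetical placeholders.
import os
import requests

INFERENCE_URL = "https://inference.example.com/v1/chat/completions"  # hypothetical endpoint

def ask_model(prompt: str) -> str:
    response = requests.post(
        INFERENCE_URL,
        headers={"Authorization": f"Bearer {os.environ['INFERENCE_API_KEY']}"},
        json={
            "model": "example-llm",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_model("Summarize what an AI factory is in one sentence."))
```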
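Inference with RAG can be sketched by adding a retrieval step in front of that same inference call. In this illustrative example, embed() is a stand-in for a real embedding service, the in-memory corpus replaces a production vector database, and ask_model() refers to the basic inference sketch above.

```python
# Sketch of Inference with RAG: retrieve relevant context, then send a
# context-enriched prompt to the model.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call a real embedding model/service here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

# A tiny stand-in for a vector database: pre-embedded document chunks.
corpus = ["AI factories pair GPUs with high-throughput storage.",
          "RAG enriches prompts with retrieved enterprise context."]
corpus_vectors = np.stack([embed(chunk) for chunk in corpus])

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    scores = corpus_vectors @ q / (np.linalg.norm(corpus_vectors, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

def rag_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
    return ask_model(prompt)  # basic inference call from the previous sketch
```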
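RAG corpus management can be illustrated with a small ingest pipeline: normalize raw text, split it into chunks, embed each chunk, and persist the vectors for later retrieval. The embed() placeholder and the JSONL "vector store" below are simplifying assumptions; production AI factories use dedicated vector databases and far richer preprocessing.

```python
# Sketch of RAG corpus management: normalize, chunk, embed, and store content
# so it is ready for RAG calls at inference time.
import json
import numpy as np

def normalize(text: str) -> str:
    return " ".join(text.split())  # collapse whitespace; real pipelines do much more

def chunk(text: str, size: int = 200) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> list[float]:
    """Placeholder: call a real embedding model/service here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384).tolist()

def ingest(documents: list[str], path: str = "vector_store.jsonl") -> None:
    with open(path, "w") as store:
        for doc in documents:
            for piece in chunk(normalize(doc)):
                store.write(json.dumps({"text": piece, "vector": embed(piece)}) + "\n")

ingest(["Example enterprise document about AI factory operations."])
```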
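The data-collection side of fine-tuning can be sketched as capturing prompt/response pairs from live inference traffic into a JSONL dataset. The chat-style record format and file path are illustrative assumptions rather than any specific vendor's fine-tuning schema.

```python
# Sketch of collecting fine-tuning data from Inference and Inference with RAG
# traffic: each interaction is appended to a JSONL dataset for later training runs.
import json

def log_interaction(prompt: str, completion: str, path: str = "finetune_data.jsonl") -> None:
    record = {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": completion},
    ]}
    with open(path, "a") as dataset:
        dataset.write(json.dumps(record) + "\n")

# Example: record one interaction captured from the Inference building block.
log_interaction("What is an AI factory?",
                "An AI factory is infrastructure for building and serving AI at scale.")
```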
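Training follows the classic loop of data preparation, model selection, training, and validation. The toy PyTorch loop below walks through those steps on synthetic data; a real AI factory runs this workflow at vastly larger scale across GPU clusters, but the shape is the same.

```python
# Toy sketch of the Training building block: prepare data, select a model,
# train it, and validate the result.
import torch
from torch import nn

# Data collection and preprocessing (synthetic stand-in data).
X = torch.randn(1000, 16)
y = (X.sum(dim=1) > 0).float().unsqueeze(1)
train_X, val_X, train_y, val_y = X[:800], X[800:], y[:800], y[800:]

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))  # model selection
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)              # training method selection
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(5):                                                  # training
    optimizer.zero_grad()
    loss = loss_fn(model(train_X), train_y)
    loss.backward()
    optimizer.step()

with torch.no_grad():                                                   # validation/testing
    val_acc = ((model(val_X) > 0).float() == val_y).float().mean()
    print(f"validation accuracy: {val_acc.item():.2f}")
```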
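Finally, connecting the LLM orchestration layer to an external service can be as simple as pulling live data at request time and folding it into the prompt. The REST endpoint below is hypothetical, and ask_model() again refers to the basic inference sketch; note that, unlike RAG corpus management, no chunking or embedding is involved.

```python
# Sketch of external services integration: fetch live data from an external
# source and inject it directly into an inference request.
import requests

def answer_with_live_data(query: str) -> str:
    # Hypothetical external source; could equally be a database query.
    inventory = requests.get("https://erp.example.com/api/inventory", timeout=10).json()
    prompt = f"Using this live inventory data:\n{inventory}\n\nQuestion: {query}"
    return ask_model(prompt)  # basic inference call from the earlier sketch
```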
Together, these building blocks form the backbone of an AI factory. Each plays a crucial role in the creation, deployment, and refinement of AI outputs. In addition, AI factory initiatives tend to favor owning the implementation strategy (rather than leasing or outsourcing it) for most of the building blocks, which typically results in selecting the self-hosted option from the deployment models listed below.
For each of these building blocks, customers must select an appropriate deployment model and implementation strategy (own, lease, or outsource), defining the optimal reference architecture for achieving the business objectives of their AI initiatives. Here are the top four:
The F5 capabilities you rely on day to day for application delivery and security are the same capabilities that are critical to a well-designed AI factory. F5 BIG-IP Local Traffic Manager, paired with F5 rSeries and VELOS purpose-built hardware, enables high-performance data ingest for AI training. F5 Distributed Cloud Network Connect provides secure multicloud networking that connects disparate data locations, creating a secure conduit from proprietary data to AI models for RAG.
F5’s focus on AI doesn’t stop here—explore how F5 secures and delivers AI apps everywhere.
Interested in learning more about AI factories? Explore the other AI factory series blog posts F5 has published to date: