BLOG | OFFICE OF THE CTO

Add Simplicity, Security, and Performance to AI Inference with F5, Intel, and Dell

Published May 21, 2024

Organizations seek to build new apps and workflows powered by AI. But operating them successfully can be tricky. Multiple AI frameworks and app environments create complexity for developers and security teams. They need a solution that makes rapid inference easier to build, run, and secure.

Simplify AI development and security

Intel’s OpenVINO™ toolkit is open source software that accelerates AI inference while offering a smaller footprint and a write-once, deploy-anywhere approach. It helps developers create scalable and efficient AI solutions with relatively few lines of code. Developers can start from models trained in popular frameworks and formats such as TensorFlow, PyTorch, and ONNX. With OpenVINO, developers first convert a model and can then further optimize and compress it for faster responses. The model is then ready to be deployed by embedding the OpenVINO runtime into the application to make it AI capable. Developers can deploy their AI-infused application via a lightweight container in a data center, in the cloud, or at the edge on a variety of hardware architectures.
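As an illustration, here is a minimal sketch of that convert-compile-infer workflow using the OpenVINO Python API. It assumes a recent OpenVINO release and an existing ONNX model file; the file name and input shape are placeholders, not part of the F5, Intel, and Dell solution itself.

    import numpy as np
    import openvino as ov

    core = ov.Core()

    # Convert a model trained in another framework; an ONNX file is assumed
    # here, but PyTorch and TensorFlow models can be converted the same way.
    ov_model = ov.convert_model("model.onnx")

    # Optionally save the converted model for reuse, then compile it for a
    # target device such as "CPU" or "GPU".
    ov.save_model(ov_model, "model.xml")
    compiled = core.compile_model(ov_model, "CPU")

    # Run inference; the input shape is a placeholder for the real model's input.
    dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
    result = compiled(dummy_input)[0]
    print(result.shape)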

A developer may not want to host the model alongside, or embedded in, the application. The model may need to be updated from time to time, and an application may need to run multiple models to deliver its features. OpenVINO has a solution: the OpenVINO model server, a software-defined, high-performance system for serving models in a client-server architecture (a minimal client sketch follows the list below). Benefits of the OpenVINO model server include:

  1. Ease of Deployment: With its containerized architecture using Docker, deploying models with OpenVINO model server becomes more straightforward and scalable. It abstracts away the complexities of hardware configuration and dependencies.
  2. Scalability: OpenVINO model server can be deployed in a clustered environment to handle high inference loads and scale horizontally as needed. This scalability ensures that inference performance remains consistent even under heavy workloads.
  3. Remote Inference: OpenVINO model server supports remote inference, enabling clients to perform inference on models deployed on remote servers. This feature is useful for distributed applications or scenarios where inference needs to be performed on powerful servers while the client device has limited resources.
  4. Monitoring and Management: OpenVINO model server provides monitoring and management capabilities, allowing administrators to track inference performance and resource utilization and to manage deployed models effectively.
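To illustrate the remote inference pattern, here is a minimal client sketch. It assumes a model server container is already running (for example, started from the openvino/model_server Docker image) with its REST interface reachable on port 8000 and a model deployed under the name my_model; the port, model name, and input shape are assumptions for illustration only. Because the model server exposes a TensorFlow Serving-compatible REST API, a plain HTTP POST is enough on the client side.

    import numpy as np
    import requests

    # Placeholder input; the shape must match the deployed model's expected input.
    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)

    # OpenVINO model server implements the TensorFlow Serving REST API, so
    # inference is a single POST to the model's :predict endpoint.
    response = requests.post(
        "http://localhost:8000/v1/models/my_model:predict",
        json={"instances": batch.tolist()},
        timeout=10,
    )
    response.raise_for_status()
    predictions = response.json()["predictions"]
    print(len(predictions))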

OpenVINO simplifies the optimization, deployment, and scaling of AI models, but models running in production also need security. F5 NGINX Plus works as a reverse proxy, offering traffic management and protection for AI model servers. With high-availability configurations and active health checks, NGINX Plus can ensure requests from apps, workflows, or users reach an operational OpenVINO model server. It also enables the use of HTTPS and mTLS certificates to encrypt communications between the user application and the model server without slowing performance.
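The sketch below shows what such a front end can look like in NGINX Plus configuration terms: an upstream group of model server instances, active health checks, and TLS with client certificate verification for mTLS. The ports, addresses, and certificate paths are placeholders for illustration, not values prescribed by the solution.

    upstream ovms_backend {
        zone ovms_backend 64k;      # shared memory zone required for active health checks
        server 127.0.0.1:8000;      # OpenVINO model server instances
        server 127.0.0.1:8001;
    }

    server {
        listen 443 ssl;

        # TLS toward clients, plus mTLS: only callers presenting a certificate
        # signed by the trusted CA can reach the model servers.
        ssl_certificate        /etc/nginx/certs/server.crt;
        ssl_certificate_key    /etc/nginx/certs/server.key;
        ssl_client_certificate /etc/nginx/certs/ca.crt;
        ssl_verify_client      on;

        location / {
            proxy_pass http://ovms_backend;

            # NGINX Plus active health checks take unhealthy model server
            # instances out of rotation until they recover.
            health_check interval=5s fails=3 passes=2;
        }
    }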

When deployed on the same host server or virtual machine, NGINX Plus filters incoming traffic and monitors the health of the upstream containers. It also offers content caching to speed performance and reduce work for the model server. This combination provides efficient security, but NGINX Plus and the OpenVINO model servers may compete for resources when deployed on a single CPU, which can result in slowdowns or performance degradation.

Accelerate AI model performance

Because infrastructure services such as virtual switching, security, and storage can consume a significant number of CPU cycles, Intel developed the Intel® Infrastructure Processing Unit (Intel® IPU), which frees up CPU cores for improved application performance. Intel IPUs are programmable network devices that intelligently manage system-level resources by securely accelerating networking and storage infrastructure functions in a data center. They are compatible with the Dell PowerEdge R760 server, which uses Intel® Xeon® processors to deliver performance and versatility for compute-intensive workloads. Integration with the Dell iDRAC integrated management controller provides closed-loop thermal control of the IPU.

Using an Intel IPU with a Dell PowerEdge R760 rack server can increase performance for both OpenVINO model servers and F5 NGINX Plus. Running NGINX Plus on the Intel IPU provides performance and scalability thanks to the Intel IPU’s hardware accelerators. This combination also leaves CPU resources available for the AI model servers.

Integrating an Intel IPU with NGINX Plus creates a security air gap between NGINX Plus and the OpenVINO model servers. This extra layer of security protects against potential shared vulnerabilities to help safeguard sensitive data in the AI model.

Power AI at the edge

The combined solution from F5, Intel, and Dell makes it easier to support AI inference at the edge. With NGINX Plus on the Intel IPU, responses are faster and more reliable in supporting edge applications such as video analytics and IoT.

The solution also suits content delivery networks, with optimized caching and delivery, and supports distributed microservices deployments that need reliability across environments.

Accelerate AI Security and Performance with F5, Intel, and Dell

Power high-performance AI inference anywhere securely and consistently with a combined hardware and software solution. Easily deploy AI inference to data centers, clouds, or edge sites while maintaining availability and performance to support users and AI-powered apps.

Learn more about the F5 and Intel partnership at f5.com/intel.