Application delivery services are critical for successful applications. Whether such services add scalability, reliability, or security, most applications rely on one or more. Application Delivery Controllers (ADCs), therefore, occupy a critical place in most application, cloud, and data center designs.
For many environments, including private cloud installations, dedicated ADC hardware is still the preferred platform for application delivery services, because a dedicated platform provides controlled, stable, and consistent resources. A dedicated, purpose-built appliance can consistently deliver the performance and reliability that even the most demanding application workload requires: there is no variation in hypervisor, software, or underlying compute platform, and it has the added advantage of specialized hardware components that offload tasks from the CPU.
But what does "performance" actually mean? In general, ADC vendors publish four main types of metrics to demonstrate performance: throughput, connections per second (CPS), requests per second (RPS), and SSL transactions per second (TPS).
Manufacturers provide comprehensive data sheets listing these and other platform characteristics, including tables of throughput, SSL transaction rates, and concurrent connections. Interpreting these numbers, judging their relevance to a given application workload, and understanding the likely limits and bottlenecks of a system are essential to selecting the correct platform. Learning to read vendor data sheets and to identify the metrics relevant to your applications will make you more successful in choosing the right platforms for your business.
An Application Delivery Controller is an infrastructure component that acts as an application proxy, providing application delivery services such as traffic management, load balancing, SSL decryption, application layer security, and access control for applications. Client devices and services connect to the ADC, and the ADC creates a separate (or reuses an existing) connection to the application. In this logical gap, the ADC inserts application delivery services.
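To make the "logical gap" concrete, the sketch below shows a minimal full-proxy in Python's asyncio: the client connection is terminated on one socket, the proxy opens a separate connection to the application, and delivery services would be applied to the traffic passing between the two. The addresses and ports are hypothetical, and a production ADC would also pool and reuse server-side connections; this is only an illustration of the proxy pattern, not any vendor's implementation.

```python
import asyncio

BACKEND_HOST, BACKEND_PORT = "10.0.0.10", 9000   # hypothetical application server

async def handle_client(client_reader, client_writer):
    # Terminate the client-side connection here, then open a separate
    # server-side connection (a real ADC might instead reuse a pooled one).
    server_reader, server_writer = await asyncio.open_connection(BACKEND_HOST, BACKEND_PORT)

    async def pump(reader, writer):
        # The "logical gap": each chunk passing through here is where
        # delivery services (inspection, rewriting, access control) would run.
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        writer.close()

    # Relay traffic in both directions concurrently.
    await asyncio.gather(pump(client_reader, server_writer),
                         pump(server_reader, client_writer))

async def main():
    server = await asyncio.start_server(handle_client, "0.0.0.0", 8080)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```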
To understand the workload of an ADC, it is helpful to look at a TCP connection and application-layer request. The ADC must perform tasks at multiple layers of the TCP stack and accomplish a number of activities to deliver application services to the application traffic. (See Figure 1.) This can make interpreting performance metrics from ADC vendors difficult. A key prerequisite for understanding which metrics are relevant is to identify the workload type.
There are many types of applications and therefore many different ADC workloads. While most production deployments contain a mix of workloads, the impact and needs of each workload influence which components of an ADC will be most utilized. Even within components, workload needs vary: some workloads are more sensitive to latency, others to jitter; some are constrained by throughput limits, while others depend most heavily on availability.
Here are the most common workloads, along with the key metrics that support them:
| Workload Type | Examples | Important Key Metrics |
| --- | --- | --- |
| Transactional HTTP web applications | Websites, many mobile applications | SSL RPS, TPS, throughput, layer 7 RPS, CPS |
| DNS | Any web application | Layer 3 throughput, layer 4 throughput |
| REST API | Applications based on force.com | SSL RPS, TPS, throughput, layer 7 RPS, CPS |
| MQTT | IoT, Facebook Messenger | Layer 4 CPS, throughput |
| Diameter | Mobile phone networks | Layer 4 CPS, connections |
| Financial trading | FIX / SAIL / OUCH | Layer 4 CPS, layer 7 RPS, CPS |
| WebSockets | MQTT over HTTP(S) | Layer 4 CPS, layer 7 RPS, CPS, connections |
| Logging and alerting | Syslog or SNMP traffic | Layer 4 CPS, layer 7 RPS, CPS (REST) |

Workload mixes continue to evolve and new workloads are constantly being introduced, each with its own critical ADC metrics based on the workload's primary operations.
The throughput of any network device is a function of latency—the delay introduced by processing the network traffic. At the very least, the speed of light imposes a minimum latency on electrical signals traveling across copper wires or optical signals traversing optical fiber. Beyond simply moving data across wire or fiber, however, an ADC performs several operations on the traffic. The maximum capability of an ADC is constrained by the sum of the latencies of its serial operations, that is, the operations that cannot be performed in parallel. Fortunately, many operations are performed in parallel, but the time required for the serial operations remains a constraint on total throughput. One goal of ADC designers is to minimize those latencies in order to maximize throughput.
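As a rough illustration of how serial latencies cap throughput, the sketch below sums hypothetical per-stage processing times for a single request and converts the total into a requests-per-second ceiling. Every stage name and microsecond figure is invented for illustration; the point is only that non-overlapping work adds up, while parallel work does not.

```python
# Hypothetical serial (non-overlapping) processing stages for one request,
# in microseconds. Work that can overlap with other requests is excluded
# because it does not lengthen the serial path.
serial_stages_us = {
    "packet receive":      2.0,
    "L4 connection setup": 8.0,
    "TLS record decrypt":  20.0,
    "L7 policy decision":  15.0,
    "packet transmit":     2.0,
}

serial_latency_us = sum(serial_stages_us.values())
requests_per_second_per_path = 1_000_000 / serial_latency_us

print(f"Serial latency: {serial_latency_us:.0f} us per request")
print(f"Ceiling: ~{requests_per_second_per_path:,.0f} requests/s per processing path")
# Parallel paths (CPU cores, ASIC pipelines) multiply this ceiling, but each
# individual request still experiences the full serial latency.
```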
Evaluating the performance of a particular ADC and matching it to a particular deployment can be daunting. Networking vendors publish metrics based on tests designed to maximize one metric at the expense of others. The intent is to publish a usable number that can guide architecture decisions, with the understanding that not all published numbers can be achieved simultaneously. Using a car analogy: Toyota advertises its 2017 Camry base model as producing 178 horsepower and achieving 33 miles per gallon on the highway, yet it would be unreasonable to expect the car to produce the full 178 horsepower, with the accelerator pushed to the floor at 6,000 RPM, while simultaneously delivering 33 miles per gallon. In the same way, networking vendors report each performance metric under a best-case scenario for that metric. As a rule, the published performance numbers cannot all be reproduced simultaneously.
Some of the published metrics apply to different levels of processing. For example, throughput could be reported for OSI layer 2 (L2) or OSI layer 7 (L7) processing. SSL TPS, on the other hand, often refers only to SSL key negotiation, while layer 7 requests per second refers to subsequent requests made over an already established SSL session. A particular workload will exercise the various ADC processing components differently than another workload will. Another key takeaway regarding published ADC metrics is that different workloads will encounter different performance limits.
| Layer | Common Metrics |
| --- | --- |
| 2 | Packets / Throughput |
| 3 | Packets / Throughput |
| 4 | Connections / Throughput |
| 5/6 (SSL/Compression) | Transactions / Requests / Throughput |
| 7 | Connections / Requests |
For each OSI network layer, certain metrics are most commonly published. For example, layer 4 performance is typically quoted in connections per second, alongside a separate figure for layer 4 throughput. OSI layers 5 and 6 do not map cleanly onto the IP stack, but services such as SSL/TLS and compression can be thought of as operating at layers 5 and 6. As noted above, tests for each of these metrics are often performed with traffic designed to stress that particular metric and do not represent real-world traffic loads. For example, maximum SSL TPS results from a test with a zero-length SSL payload, so that no ADC processing time is spent decrypting SSL data. This is reasonable for isolating the performance of the SSL hardware, but no production application sends zero-length payloads. Similarly, layer 2 throughput is tested without SSL enabled and with no layer 7 processing, since those features would slow the ADC, even though many production deployments use them. In short, most ADC metrics are tested in isolation, while most production environments use a combination of ADC features, with each metric stressing different components or combinations of components. The components involved may include the CPU, network interface card (NIC), application-specific integrated circuits (ASICs), and field-programmable gate arrays (FPGAs), among others. Where an ADC lacks a particular specialized component, the CPU is used instead.
| Layer | Metric | Component Stressed |
| --- | --- | --- |
| 2 | Packets | NIC |
| 2 | Throughput | NIC / FPGA |
| 3 | Packets | FPGA |
| 3 | Throughput | FPGA |
| 4 | Connections | FPGA |
| 4 | Throughput | FPGA |
| 5/6 | Transactions (SSL) | SSL ASIC / CPU / Memory |
| 5/6 | Requests (SSL) | CPU |
| 5/6 | Throughput (SSL) | Crypto ASIC |
| 5/6 | Throughput (Compression) | Compression ASIC |
| 7 | Requests | CPU / Memory |
| 7 | Throughput | CPU / Memory |
The packet metric states how many packets per second the ADC can process, and the components most stressed by this processing vary by layer. For example, processing layer 2 packets primarily stresses the Network Interface Card (NIC), while processing layer 4 packets primarily stresses the FPGA. The throughput metric on all layers refers to total available throughput in gigabits per second, while the connections metric measures how many connections per second can be made at that particular layer. For example, the layer 4 connection metric measures how many TCP connections can be established per second.
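Packet and throughput metrics are two views of the same traffic, related by packet size, which is why a throughput figure measured with large frames implies a very different small-packet capacity. The quick conversion below uses a hypothetical 40 Gbit/s datasheet figure and assumed packet sizes for illustration.

```python
def gbps_from_pps(packets_per_second, packet_bytes):
    """Convert a packet rate and packet size into throughput in Gbit/s."""
    return packets_per_second * packet_bytes * 8 / 1e9

# Hypothetical datasheet figure: 40 Gbit/s of layer 2 throughput.
datasheet_gbps = 40

for packet_bytes in (64, 512, 1500):
    # Packet rate required to sustain the advertised throughput at this size.
    pps = datasheet_gbps * 1e9 / (packet_bytes * 8)
    print(f"{packet_bytes:>5}-byte packets -> {pps/1e6:6.1f} Mpps "
          f"to sustain {datasheet_gbps} Gbit/s")
```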
SSL processing is unique in that sessions are established and managed across connections. One connection can establish an SSL session (called a transaction), while subsequent connections can reuse that session (called a request). Transaction and request metrics are therefore listed separately. SSL transactions take far longer than subsequent SSL requests, so the transactions metric is often the limiting factor in SSL performance. Once an SSL session is established and a subsequent request is made, the data payload must still be encrypted or decrypted. The crypto ASIC handles this bulk encryption and decryption, and its performance is measured by the throughput metric.
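Because a full handshake (transaction) costs far more than a request on an established session, the achievable SSL rate depends heavily on the session-reuse ratio. The sketch below combines two hypothetical datasheet limits under the simplifying assumption that the two budgets are consumed proportionally; all figures are invented for illustration.

```python
# Hypothetical datasheet limits.
max_full_handshakes_per_s = 10_000     # new sessions (SSL TPS: key negotiation)
max_resumed_requests_per_s = 200_000   # requests reusing established sessions

def achievable_rate(new_session_fraction):
    """Total SSL operations/s for a given share of full handshakes,
    assuming the two limits are consumed proportionally."""
    cost = (new_session_fraction / max_full_handshakes_per_s
            + (1 - new_session_fraction) / max_resumed_requests_per_s)
    return 1 / cost

for frac in (1.0, 0.5, 0.1, 0.01):
    print(f"{frac:4.0%} new sessions -> ~{achievable_rate(frac):,.0f} SSL ops/s")
```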
Often it is useful to compress the data payload. Compression is performed by the compression ASIC and its metric is also throughput.
Layer 7 is unique in that, given the complex and varied traffic management options available, all layer 7 processing is performed by the CPU. Cookie persistence, a common layer 7 capability that ties each user session to a particular server in the pool, is one example of work performed by the CPU. The layer 7 requests metric refers to the number of layer 7 requests per second the ADC can process, and the layer 7 throughput metric refers to the total throughput possible at layer 7.
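As a simplified view of the work the CPU does for cookie persistence, the sketch below pins a client to a pool member by setting a persistence cookie on first contact and honoring it afterward. The cookie name and pool members are hypothetical, and real ADCs typically encode or encrypt the cookie value rather than storing the server address in the clear.

```python
import random

POOL = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # hypothetical pool members
COOKIE = "ADC_PERSIST"                           # hypothetical cookie name

def pick_server(request_cookies, response_cookies):
    """Return the pool member for this request, honoring an existing
    persistence cookie or creating one on first contact."""
    server = request_cookies.get(COOKIE)
    if server not in POOL:                 # no cookie yet, or member removed
        server = random.choice(POOL)       # stand-in for the real LB decision
        response_cookies[COOKIE] = server  # tie the session to this member
    return server

# First request: no cookie, a member is chosen and the cookie is set.
resp = {}
first = pick_server({}, resp)
# A later request from the same client is routed to the same member.
assert pick_server({COOKIE: first}, {}) == first
```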
Finally, any layer that needs to preserve the connection state will require the ADC to maintain a connection table. Connection tables are common for layer 4 TCP connections, SSL sessions, and layer 7 HTTP sessions. Protocols with long-lived connections can exhaust or stress an ADC connection table.
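The pressure that long-lived protocols put on a connection table can be estimated with Little's Law: concurrent connections are roughly the connection rate multiplied by the average connection lifetime. The rates and lifetimes below are assumptions chosen only to show how modest connection rates with long lifetimes still produce very large tables.

```python
# concurrent_connections ~= connections_per_second * average_lifetime_seconds
scenarios = {
    "Short HTTP requests":      (50_000, 0.2),      # 50k CPS, ~200 ms each
    "WebSocket / MQTT clients": (2_000, 3_600.0),   # 2k CPS, held ~1 hour
}

for name, (cps, lifetime_s) in scenarios.items():
    concurrent = cps * lifetime_s
    print(f"{name:26s}: ~{concurrent:,.0f} table entries")
```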
Each metric can help determine a specific aspect of performance. Understanding the different operations performed at each layer, along with which components are affected by those operations, assists when assessing ADC performance metrics for a particular deployment.
Every aspect of ADC functionality can be provided by a CPU. In fact, many ADC hardware vendors offer a software-only version. This is possible because a CPU is the most flexible type of hardware available, capable of performing almost any data-centric task.
Entrusting every aspect of ADC functionality to a CPU has its limits, however. Of the three primary types of hardware for processing network traffic—CPU, ASIC, and FPGA—the CPU is the slowest and arguably the most expensive, requiring the support of memory and memory controllers. An ASIC is designed to perform a specific task, hence the name. No other hardware type is faster than an ASIC at the task it was designed for, but its capabilities are limited to what is built into the chip. If an application needs capabilities the ASIC does not provide, those tasks must fall back to the CPU and be performed in software.
If a CPU is flexible and slow, and an ASIC is inflexible and fast, a third technology operates in the middle: the FPGA. An FPGA is slower than an ASIC but much faster than a CPU, and it can be programmed to perform tasks not envisioned by the FPGA designers.
A well-designed ADC will use the capabilities of each hardware type to its fullest extent: relegating the common and simple tasks to ASIC components, performing more complex tasks in FPGA components, and handling the most complex and least common tasks in the CPU. Much of the engineering magic in an ADC is the result of coordinating the different component types to most efficiently handle the various workloads.
The most frequent task performed in an ADC, by far, is packet processing at the network interfaces. A standard, off-the-shelf NIC is tuned either for high-volume inbound traffic (for example, in a desktop or other user device) or for high-volume outbound traffic (for example, in a server). An ADC is unique in that it requires maximum throughput in both directions, something no off-the-shelf NIC is designed to deliver. Since ADC packet processing is frequent and relatively simple, it is well suited to an ASIC. In clustered ADC environments, a disaggregator (DAG) ASIC acts as a front-end load balancer, ensuring that the same client and server sessions always flow through the same cluster node. Use of a DAG in a clustered environment facilitates horizontal scaling of ADC appliances to meet traffic demands. In a properly designed ADC, all layer 2 packet processing and switching can be performed in specialized ASIC hardware.
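A disaggregator's core job can be sketched as a stable hash over a flow's identifiers, so that every packet of a given connection lands on the same cluster node. The sketch below is a deliberate simplification: real DAGs run in silicon, use their own hashing schemes, and must also keep the matching server-side flow on the same node.

```python
import hashlib

NODES = ["node-1", "node-2", "node-3", "node-4"]   # hypothetical cluster members

def dag_pick_node(src_ip, src_port, dst_ip, dst_port, proto="TCP"):
    """Deterministically map a flow's 5-tuple to one cluster node."""
    key = f"{proto}:{src_ip}:{src_port}:{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

# Every packet of the same connection hashes to the same node.
assert (dag_pick_node("198.51.100.7", 51313, "203.0.113.10", 443)
        == dag_pick_node("198.51.100.7", 51313, "203.0.113.10", 443))
```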
Layer 3 and 4 processing tasks are more complex, making them well suited for an FPGA. An FPGA can provide routing capabilities as well as firewall and distributed denial-of-service (DDoS) protections, including TCP SYN cookies. Use of an FPGA at this level allows for layer 4 processing and protections, ensuring that traffic considered for further processing has been properly assembled and filtered.
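To illustrate the kind of stateless protection this layer can provide, here is a deliberately simplified SYN cookie: a keyed hash of the connection 4-tuple and a coarse timestamp is encoded into the sequence number returned to the client, and connection state is allocated only if the client's final ACK echoes a valid value. The field widths, secret handling, and hash construction below are assumptions for illustration, not the production algorithm; real SYN cookies also encode parameters such as the MSS.

```python
import hashlib
import hmac
import time

SECRET = b"rotate-me"   # hypothetical per-device secret

def _mac(src_ip, src_port, dst_ip, dst_port, minute):
    # 24-bit keyed hash over the 4-tuple and a coarse timestamp.
    msg = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{minute}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")

def make_syn_cookie(src_ip, src_port, dst_ip, dst_port):
    """Stateless 32-bit ISN: 8-bit coarse timestamp + 24-bit keyed hash."""
    minute = int(time.time() // 60) & 0xFF
    return (minute << 24) | _mac(src_ip, src_port, dst_ip, dst_port, minute)

def check_syn_cookie(cookie, src_ip, src_port, dst_ip, dst_port, max_age_min=2):
    """Accept the ACK only if the echoed cookie is recent and authentic."""
    minute = cookie >> 24
    now = int(time.time() // 60) & 0xFF
    if (now - minute) % 256 > max_age_min:
        return False
    return (cookie & 0xFFFFFF) == _mac(src_ip, src_port, dst_ip, dst_port, minute)

cookie = make_syn_cookie("198.51.100.7", 51313, "203.0.113.10", 443)
assert check_syn_cookie(cookie, "198.51.100.7", 51313, "203.0.113.10", 443)
```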
Encryption and compression processing, such as SSL or TLS and the new capabilities introduced by HTTP/2, are another use for dedicated hardware. Often a specialized ASIC is used for cryptographic processing, including the computationally intensive key negotiation of modern cipher suites such as elliptic curve Diffie-Hellman ephemeral (ECDHE) key exchange. Once SSL key negotiation has occurred, bulk encryption and decryption of subsequent requests can also be handled by ASIC hardware. Similarly, compression and decompression using common algorithms can be performed by ASIC hardware. Using dedicated ASIC components enables fast processing of both encryption and compression.
Any remaining processing of network traffic not handled by ASICs and FPGAs must be performed by the CPU. While slow relative to ASICs and FPGAs, the CPU is the most flexible component in the ADC. The CPU also handles duties unrelated to network processing, such as serving the GUI and other configuration tasks, handling I/O interrupts, and even processing disk requests. Because CPU load, and therefore the latency it introduces, can vary, and because the CPU is the last hardware type to process network traffic, ADCs are intentionally designed to direct as little traffic processing as possible to the CPU, handling as much as possible on the faster ASICs and FPGAs instead.
It can be difficult to translate vendors' published metrics into real-world performance. Understanding the interplay between the different types of workloads, the resources they consume, and the capabilities of a hardware platform, though complex, can help you make the best purchase decision for your organization.
Published data sheets and other resources help make specifying the right platform easier, but it is also important to rely directly on the experience and expertise of the vendor whenever possible. Leading vendors will have deep expertise in matching workloads to platforms—an expertise that should be readily available to customers.
Combining a good understanding of your application traffic characteristics and platform capabilities with the expertise of your vendor will reduce the risk, and potential expense, of under- or over-provisioning your platform.