We released version 3.0 of NGINX Ingress Controller in January 2023 with a host of significant new features and enhanced functionality. One new feature we believe you’ll find particularly valuable is Deep Service Insight, available with the NGINX Plus edition of NGINX Ingress Controller.
Deep Service Insight addresses a limitation that hinders optimal functioning when a routing decision system such as a load balancer sits in front of one or more Kubernetes clusters – namely, that the system has no access to information about the health of individual services running in the clusters behind the Ingress controller. This prevents it from routing traffic only to clusters with healthy services, which potentially exposes your users to outages and errors like 404 and 500.
Deep Service Insight eliminates that problem by exposing the health status of backend service pods (as collected by NGINX Ingress Controller) at a dedicated endpoint where your systems can access and use it for better routing decisions.
In this post we take an in‑depth look at the problem solved by Deep Service Insight, explain how it works in some common use cases, and show how to configure it.
The standard Kubernetes liveness, readiness, and startup probes give you some information about the backend services running in your clusters, but not enough for the kind of insight you need to make better routing decisions all the way up your stack. Lacking the right information becomes even more problematic as your Kubernetes deployments grow in complexity and your business requirements for uninterrupted uptime become more pressing.
A common approach to improving uptime as you scale your Kubernetes environment is to deploy load balancers, DNS managers, and other automated decision systems in front of your clusters. However, because of how Ingress controllers work, a load balancer sitting in front of a Kubernetes cluster normally has no access to status information about the services behind the Ingress controller in the cluster – it can verify only that the Ingress controller pods themselves are healthy and accepting traffic.
NGINX Ingress Controller, on the other hand, does have information about service health. It already monitors the health of the upstream pods in a cluster by performing passive health checks for HTTP, TCP, UDP, and gRPC services, monitoring request responsiveness, and tracking successful response codes and other metrics. It uses this information to decide how to distribute traffic across your services’ pods to provide a consistent and predictable user experience. Normally, NGINX Ingress Controller is performing all this magic silently in the background, and you might never think twice about what’s happening under the hood. Deep Service Insight “surfaces” this valuable information so you can use it more effectively at other layers of your stack.
Deep Service Insight is available for services you deploy using the NGINX VirtualServer and TransportServer custom resources (for HTTP and TCP/UDP respectively). Deep Service Insight uses the NGINX Plus API to share NGINX Ingress Controller’s view of the individual pods in a backend service at a dedicated endpoint unique to Deep Service Insight:

http://<NIC_IP_address>:9114/probe/<hostname_or_service_name>

where <hostname_or_service_name> corresponds to the spec.host field of the VirtualServer resource or the spec.upstreams.service field in the TransportServer resource.

The output includes two types of information:
- An HTTP status code for the hostname or service name:
  - 200 OK – At least one pod is healthy
  - 418 I’m a teapot – No pods are healthy
  - 404 Not Found – There are no pods matching the specified hostname or service name
- Three counters for the specified hostname or service name:
  - Total number of service instances (pods)
  - Number of pods in the Up (healthy) state
  - Number of pods in the Unhealthy state

Here’s an example where all three pods for a service are healthy:
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Date: Day, DD Mon YYYY hh:mm:ss TZ
Content-Length: 32
{"Total":3,"Up":3,"Unhealthy":0}
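If you are scripting against the endpoint, the counters are easy to pull out of the response body with standard tools. Here is a minimal sketch; the sample body is hard‑coded in place of a live response:

```shell
# Extract the "Up" counter from a Deep Service Insight response body.
# In practice the body would come from something like:
#   curl -s http://<NIC_IP_address>:9114/probe/cafe.example.com
body='{"Total":3,"Up":3,"Unhealthy":0}'
up=$(echo "$body" | sed 's/.*"Up":\([0-9]*\).*/\1/')
echo "healthy pods: $up"
```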
For more details, see the NGINX Ingress Controller documentation.
You can further customize the criteria that NGINX Ingress Controller uses to decide a pod is healthy by configuring active health checks. You can configure the path and port to which the health check is sent, the number of failed checks that must occur within a specified time period for a pod to be considered unhealthy, the expected status code, timeouts for connecting or receiving a response, and more. To configure active health checks, include the healthCheck field in the upstreams section of the VirtualServer or TransportServer resource.
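As a sketch, an active health check on a VirtualServer upstream might look like the following. The field names follow the upstreams healthCheck schema in the NGINX Ingress Controller documentation; the application, service, namespace, and probe path are hypothetical:

```shell
# Hypothetical VirtualServer with an active health check on the "coffee"
# upstream; adjust names, paths, and thresholds for your environment.
kubectl apply -f - <<'EOF'
apiVersion: k8s.nginx.org/v1
kind: VirtualServer
metadata:
  name: cafe
spec:
  host: cafe.example.com
  upstreams:
  - name: coffee
    service: coffee-svc
    port: 80
    healthCheck:
      enable: true
      path: /healthz        # endpoint the health check requests
      interval: 10s         # time between checks
      fails: 3              # failed checks before a pod is marked unhealthy
      passes: 2             # successful checks before it is marked healthy again
      connect-timeout: 5s   # timeout for establishing the connection
      statusMatch: "200"    # expected status code
  routes:
  - path: /coffee
    action:
      pass: coffee
EOF
```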
One use case where Deep Service Insight is particularly valuable is when a load balancer is routing traffic to a service that’s running in two clusters, say for high availability. Within each cluster, NGINX Ingress Controller tracks the health of upstream pods as described above. When you enable Deep Service Insight, information about the number of healthy and unhealthy upstream pods is also exposed on a dedicated endpoint. Your routing decision system can access the endpoint and use the information to divert application traffic away from unhealthy pods in favor of healthy ones.
The diagram illustrates how Deep Service Insight works in this scenario.
You can also take advantage of Deep Service Insight when performing maintenance on a cluster in a high‑availability scenario. Simply scale the number of pods for a service down to zero in the cluster where you’re doing maintenance. The lack of healthy pods shows up automatically at the Deep Service Insight endpoint and your routing decision system uses that information to send traffic to the healthy pods in the other cluster. You effectively get automatic failover without having to change configuration on either NGINX Ingress Controller or the system, and your customers never experience a service interruption.
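To make the failover logic concrete, here is a minimal sketch of how a decision system might map the Deep Service Insight status codes described above to routing actions. The helper name and messages are our own, not part of any product API:

```shell
#!/bin/sh
# Map a Deep Service Insight HTTP status code to a routing decision.
# A real decision system would obtain the code with something like:
#   curl -s -o /dev/null -w '%{http_code}' http://<NIC_IP_address>:9114/probe/cafe.example.com
interpret_probe() {
  case "$1" in
    200) echo "healthy: route traffic to this cluster" ;;
    418) echo "unhealthy: fail over to another cluster" ;;
    404) echo "unknown: no pods match this hostname or service" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

interpret_probe 200
interpret_probe 418
```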
To enable Deep Service Insight, include the -enable-service-insight command‑line argument in the Kubernetes manifest, or set the serviceInsight.create parameter to true if using Helm.

There are two optional arguments which you can include to tune the endpoint for your environment:

- -service-insight-listen-port <port> – Change the Deep Service Insight port number from the default, 9114 (<port> is an integer in the range 1024–65535). The Helm equivalent is the serviceInsight.port parameter.
- -service-insight-tls-string <secret> – A Kubernetes secret (TLS certificate and key) for TLS termination of the Deep Service Insight endpoint (<secret> is a character string with format <namespace>/<secret_name>). The Helm equivalent is the serviceInsight.secret parameter.

To see Deep Service Insight in action, you can enable it for the Cafe application often used as an example in the NGINX Ingress Controller documentation.
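For instance, the Helm parameters above can be set at install time. This is an illustrative sketch; the release name, chart reference, and namespace are placeholders for your own values:

```shell
# Illustrative Helm install enabling Deep Service Insight on the default port;
# substitute your own release name, chart source, and namespace.
helm install ingress-plus nginx-stable/nginx-ingress \
  --namespace nginx-ingress \
  --set controller.nginxplus=true \
  --set serviceInsight.create=true \
  --set serviceInsight.port=9114
```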
Install the NGINX Plus edition of NGINX Ingress Controller with support for NGINX custom resources and enabling Deep Service Insight:

- If using Helm, set the serviceInsight.create parameter to true.
- If using a Kubernetes manifest, include the -enable-service-insight argument in the manifest file.

Verify that NGINX Ingress Controller is running:
$ kubectl get pods -n nginx-ingress
NAME                                          READY   STATUS    RESTARTS   AGE
ingress-plus-nginx-ingress-6db8dc5c6d-cb5hp   1/1     Running   0          9d
Verify that the NGINX VirtualServer custom resource is deployed for the Cafe application (the IP address is omitted for legibility):
$ kubectl get vs
NAME   STATE   HOST               IP    PORTS      AGE
cafe   Valid   cafe.example.com   ...   [80,443]   7h1m
Verify that there are three upstream pods for the Cafe service running at cafe.example.com:
$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
coffee-87cf76b96-5b85h   1/1     Running   0          7h39m
coffee-87cf76b96-lqjrp   1/1     Running   0          7h39m
tea-55bc9d5586-9z26v     1/1     Running   0          111m
Access the Deep Service Insight endpoint:
$ curl -i <NIC_IP_address>:9114/probe/cafe.example.com
The 200 OK response code indicates that the service is ready to accept traffic (at least one pod is healthy). In this case all three pods are in the Up state.
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Date: Day, DD Mon YYYY hh:mm:ss TZ
Content-Length: 32
{"Total":3,"Up":3,"Unhealthy":0}
The 418 I’m a teapot status code indicates that the service is unavailable (all pods are unhealthy).
HTTP/1.1 418 I'm a teapot
Content-Type: application/json; charset=utf-8
Date: Day, DD Mon YYYY hh:mm:ss TZ
Content-Length: 32
{"Total":3,"Up":0,"Unhealthy":3}
The 404 Not Found status code indicates that there is no service running at the specified hostname.
HTTP/1.1 404 Not Found
Date: Day, DD Mon YYYY hh:mm:ss TZ
Content-Length: 0
For the complete changelog for NGINX Ingress Controller release 3.0.0, see the Release Notes.
To try NGINX Ingress Controller with NGINX Plus and NGINX App Protect, start your 30-day free trial today or contact us to discuss your use cases.