Microservices are great for development velocity, but the complexity in these architectures lies in the service-to-service communication they depend on.
There are, right now, at least three different architectural options for scaling containerized microservices. Each is based, as is all scale, on a proxy-based load balancer, and each has its own set of challenges. Several of those challenges stem from the simple fact that scale inside container environments often relies on iptables, which has limited fluency with anything above the traditional network layers (that’s IP and TCP).
All of these proxies provide the same core functionality: scaling the services distributed throughout the container environment. The crazy thing is that services are ephemeral constructs; they don’t actually exist except in the resource (configuration) files that define them. The problem for iptables-based scaling solutions is that these services are layer 7 (HTTP) constructs, often serving as the “backend” for a single API call rather than an entire application.
Applications, as we know them, may appear from the client side to be a single, holistic construct. In reality, they are composed of many different (and distributed) microservices. Some of those services are purely internal, designed to be used by other services, which means a lot of service-to-service communication inside the containerized environment.
You need L7 (HTTP) routing in these environments because everything is APIs over HTTP and HTTP/2. You also need a consistent security stance, authentication, and policy enforcement. None of that is going to happen with an iptables-based approach.
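To make the distinction concrete, here is a minimal sketch of the kind of routing and policy decisions that require L7 awareness. It is not any particular mesh’s implementation; the service names and the Authorization check are illustrative assumptions. The point is that every decision below depends on HTTP semantics that iptables, operating on addresses and ports, simply cannot see.

```go
// Minimal sketch: L7 (HTTP) routing and policy that an iptables rule,
// which only sees IPs and ports, cannot express.
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical upstream services; in a real mesh these would be
	// discovered dynamically, not hard-coded.
	orders, _ := url.Parse("http://orders.internal:8080")
	users, _ := url.Parse("http://users.internal:8080")

	mux := http.NewServeMux()
	mux.Handle("/api/orders/", httputil.NewSingleHostReverseProxy(orders)) // route on URL path
	mux.Handle("/api/users/", httputil.NewSingleHostReverseProxy(users))

	// L7 policy enforcement: reject requests without credentials before
	// they ever reach a service.
	authn := func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if r.Header.Get("Authorization") == "" {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			next.ServeHTTP(w, r)
		})
	}

	http.ListenAndServe(":8000", authn(mux))
}
```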
As is usually the case, open source comes to the rescue. Several open source service meshes have emerged to address those challenges, and, as with many open source projects, these service meshes (like Istio) are being expanded by projects like Aspen Mesh with capabilities (and support) that provide enterprise-grade solutions.
These expanded efforts focus on solving the eight challenges organizations encounter when they deploy microservices in containers.
Here are those eight challenges, and how a service mesh can overcome them:
- Build – This is one of the challenges for which a service mesh has little to offer beyond integrating policy with CI/CD toolchains and ensuring a declarative configuration model, so the service mesh can be treated as infrastructure as code.
- Test & Integration – A service mesh can help here by ensuring consistent policy across dev, test, and production environments. Some organizations are looking to eliminate staged deployments entirely; staging worked well in the past, but it is one of the steps that inserts latency into the deployment process. These folks are looking for a way to deploy services directly to production and employ traffic steering and rollback mechanisms to deal with failure.
- Versioning – A service mesh can act as a basic API gateway, routing traffic based on variables like API version and even translating between versions to help during transition periods. Client upgrades – especially for apps in the consumer space – can’t always be forced, which means requests keep arriving for multiple versions. A service mesh can translate requests for older API versions to the latest, reducing the cost and burden of maintaining multiple versions of the same API (see the version-translation sketch after this list).
- Deploy – With its ability to fluently speak HTTP, a service mesh is a great place to enable blue/green deployments, canary testing, and traffic steering (see the canary sketch after this list).
- Logging – Distributed logging is always an issue, and it’s even more troubling in environments where instances live for highly variable periods of time. A service mesh offers a common, centralized location to implement logging, as well as the ability to perform functions like request tracing (see the trace-propagation sketch after this list).
- Monitoring – At the heart of scale lies monitoring. While applications can implement certain functions (retries, circuit breaking, etc.) to deal with the inevitable failure of a service, this puts a burden on the application it shouldn’t need to shoulder. A service mesh takes on the burden of service-to-service communication and provides a natural place for monitoring (see the telemetry sketch after this list). The goal is to focus on mean time to detection (MTTD) and mean time to resolution (MTTR) in production, because running in production is hard and failure is inevitable.
- Debugging – The more complex a system is, the harder it is to debug. A service mesh can aid in root cause analysis, provide statistics and pre-fault notifications using analytics and telemetry, and quarantine containers instead of killing them so they can be examined thoroughly. This is particularly helpful in cases where failure is due to slow memory leaks.
- Networking – Networking remains critical to containers, perhaps more so than in less complex environments. The desire to abstract services from that networking means there are many moving parts you do not want to implement in every service: service discovery, SSL and certificate management, circuit breakers, retries, health monitoring, and so on. The goal of microservices was to “code local, code small.” Forcing each service to include networking-related functions bloats microservices and introduces additional architectural and technical debt. A service mesh takes on those functions (see the retry and circuit-breaker sketch after this list) and delivers the scale and security desired without bogging down development.
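To make the versioning item concrete, here is a hedged sketch of version translation at the proxy layer. The /v1 and /v2 paths, the catalog.internal service, and the renamed id/itemId parameter are all invented for illustration; in a real mesh the translation rules would come from configuration rather than hard-coded Go, but the mechanics are the same: rewrite the old request shape into the new one before it reaches the service.

```go
// Sketch: translating requests for an old API version into the current
// contract at the proxy, so only one backend version is maintained.
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	backend, _ := url.Parse("http://catalog.internal:8080") // hypothetical service
	proxy := httputil.NewSingleHostReverseProxy(backend)

	http.ListenAndServe(":8000", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.URL.Path, "/v1/") {
			// Old clients still call /v1; rewrite to the /v2 contract.
			r.URL.Path = "/v2/" + strings.TrimPrefix(r.URL.Path, "/v1/")
			// Example of a parameter renamed between versions (invented).
			if id := r.URL.Query().Get("id"); id != "" {
				qs := r.URL.Query()
				qs.Del("id")
				qs.Set("itemId", id)
				r.URL.RawQuery = qs.Encode()
			}
		}
		proxy.ServeHTTP(w, r)
	}))
}
```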
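For the deploy item, here is a sketch of canary traffic steering. The 5% weight and the service addresses are assumptions; a mesh applies the same idea from declarative routing rules, which is what makes a rollback a configuration change rather than a redeploy.

```go
// Sketch: weighted traffic split between a stable version and a canary.
package main

import (
	"math/rand"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	stable, _ := url.Parse("http://app-v1.internal:8080") // hypothetical addresses
	canary, _ := url.Parse("http://app-v2.internal:8080")
	stableProxy := httputil.NewSingleHostReverseProxy(stable)
	canaryProxy := httputil.NewSingleHostReverseProxy(canary)

	http.ListenAndServe(":8000", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Send roughly 5% of requests to the new version; shift the weight
		// upward as confidence grows, or back to zero to roll back.
		if rand.Float64() < 0.05 {
			canaryProxy.ServeHTTP(w, r)
			return
		}
		stableProxy.ServeHTTP(w, r)
	}))
}
```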
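For the logging item, note that request tracing only works end to end if every hop forwards the trace headers it received. This sketch propagates the commonly used x-request-id and B3 (Zipkin-style) headers onto a downstream call; the inventory.internal service is hypothetical, and exact header names vary by tracing system.

```go
// Sketch: a service forwarding trace headers so its outbound call joins
// the same distributed trace as the inbound request.
package main

import (
	"io"
	"net/http"
)

var traceHeaders = []string{
	"x-request-id", "x-b3-traceid", "x-b3-spanid",
	"x-b3-parentspanid", "x-b3-sampled",
}

func handler(w http.ResponseWriter, r *http.Request) {
	// Hypothetical downstream call made while serving this request.
	out, _ := http.NewRequest("GET", "http://inventory.internal:8080/stock", nil)
	for _, h := range traceHeaders {
		if v := r.Header.Get(h); v != "" {
			out.Header.Set(h, v) // keep the spans stitched into one trace
		}
	}
	resp, err := http.DefaultClient.Do(out)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	io.Copy(w, resp.Body)
}

func main() {
	http.ListenAndServe(":8000", http.HandlerFunc(handler))
}
```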
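For the monitoring item, here is the kind of per-request telemetry a mesh proxy records uniformly for every service, sketched as Go middleware so the mechanics are visible. Logging stands in for a real metrics backend here; latency and error-rate series like these are what drive MTTD and MTTR.

```go
// Sketch: recording latency and response status for every request without
// touching application code.
package main

import (
	"log"
	"net/http"
	"time"
)

// statusRecorder captures the response code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

func withMetrics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		// A real mesh would feed a metrics backend; a log line stands in.
		log.Printf("method=%s path=%s status=%d latency=%s",
			r.Method, r.URL.Path, rec.status, time.Since(start))
	})
}

func main() {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8000", withMetrics(ok))
}
```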
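Finally, for the networking item, here is the sort of retry and circuit-breaker boilerplate every service ends up carrying when the platform doesn’t provide it. The thresholds and timings below are illustrative only; the point is that a mesh lifts exactly this logic out of application code.

```go
// Sketch: the resilience boilerplate a mesh makes unnecessary — a crude
// circuit breaker (trip after 5 straight failures) plus retries with backoff.
package main

import (
	"errors"
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

var consecutiveFailures int64

func callWithResilience(url string) (*http.Response, error) {
	if atomic.LoadInt64(&consecutiveFailures) >= 5 {
		// Circuit open: fail fast instead of piling load on a sick service.
		return nil, errors.New("circuit open: not calling " + url)
	}
	client := &http.Client{Timeout: 2 * time.Second}
	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		resp, err := client.Get(url)
		if err == nil && resp.StatusCode < 500 {
			atomic.StoreInt64(&consecutiveFailures, 0) // healthy again
			return resp, nil
		}
		if err == nil {
			resp.Body.Close()
			lastErr = errors.New("server error: " + resp.Status)
		} else {
			lastErr = err
		}
		atomic.AddInt64(&consecutiveFailures, 1)
		time.Sleep(time.Duration(attempt+1) * 100 * time.Millisecond) // backoff
	}
	return nil, lastErr
}

func main() {
	if _, err := callWithResilience("http://inventory.internal:8080/stock"); err != nil {
		log.Println(err)
	}
}
```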
Service mesh is an exciting evolution that combines modern principles of cloud and containers with the solid foundations of scale. Expect service mesh to gain traction through 2018 as container adoption grows and the demand for enterprise-grade scale and support increases.