When Do You Need A Service Mesh?
In our previous post, we explored what a service mesh is and how it can benefit application developers by simplifying service-to-service communication, providing advanced routing and failover capabilities, and improving observability. But "when" you need one was a topic deserving of its own nuanced post.
This is that post.
The General Guidelines
Realistically, your need for a service mesh is directly proportional to the number of network services your application relies on. If you're only talking to one or two internal services, you might still be able to find/replace network calls or manage things via environment variables.
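For a sense of what "managing things via environment variables" looks like at that small scale, here is a minimal sketch. The service names, default URLs, and the `<NAME>_SERVICE_URL` naming convention are all assumptions made up for illustration, not something any particular mesh or framework prescribes.

```python
import os

# Hypothetical defaults for illustration; the service names and URLs are assumptions.
DEFAULT_SERVICE_URLS = {
    "billing": "http://billing.internal:8080",
    "inventory": "http://inventory.internal:8080",
}


def service_url(name: str) -> str:
    """Resolve a service's base URL, letting an environment variable override the default."""
    # e.g. "billing" is overridden by BILLING_SERVICE_URL
    env_key = f"{name.upper()}_SERVICE_URL"
    # Strip any trailing slash so callers can safely append "/path".
    return os.environ.get(env_key, DEFAULT_SERVICE_URLS[name]).rstrip("/")
```

With one or two services this stays readable. Every new dependency adds another variable to set, document, and keep in sync across environments, which is where the approach starts to strain.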
The math changes with external network services, which you usually reach via APIs. When your application depends on an API to deliver product value, the need for monitoring and control escalates quickly. Problems with your provider are suddenly your problems too, and meshes like Envoy and Taskless with outbound call support can help you get a handle on the services outside of your network.
In general though, you can use the following guidelines to figure out if it's time for a service mesh in your stack:
- Microservice Complexity - The more inter-dependent your services are, the more a service mesh can manage that complexity by giving you a single place for managing communication between these services.
- External Service Dependencies - Adding network dependencies outside of your cloud, such as a payment, shipping, AI, or business API, can greatly increase your risk of customer-facing issues. Without a service mesh's support, monitoring, instrumenting, and remediating these calls would require a code change at every invocation.
- URL and Request Shenanigans - Failovers, canaries, traffic splitting, and service address rewriting are just some of the L7 features service meshes provide. Trying to replicate even a single feature like traffic splitting via environment variables is certainly doable, but requires a fair amount of code to instrument. Plus, you'd need to maintain all that custom code.
- Consistent Observability - APM (Application Performance Monitoring) offers high-level observation of your network requests, but lacks the per-service nuance needed to capture the data that answers "why" something is going wrong. Service meshes hook into the full request lifecycle, with observability designed for the problems unique to network connections. This becomes even more important when the network connections are external.
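To illustrate the "fair amount of code" that hand-rolled traffic splitting takes, here is a minimal sketch of weighted splitting driven by an environment variable. The variable name `PAYMENTS_BACKENDS`, its `url=weight,url=weight` format, and the backend URLs are hypothetical choices for this example. It also covers only the selection step, with no health checks, metrics, or live reconfiguration.

```python
import hashlib
import os

# Hypothetical default: send 90% of traffic to v1 and 10% to v2.
DEFAULT_BACKENDS = "http://payments-v1.internal=90,http://payments-v2.internal=10"


def parse_weights(spec: str) -> list[tuple[str, int]]:
    """Parse 'url=weight,url=weight' into (url, weight) pairs."""
    pairs = []
    for entry in spec.split(","):
        # Split on the last '=' so URLs that contain '=' still parse.
        url, _, weight = entry.strip().rpartition("=")
        pairs.append((url, int(weight)))
    return pairs


def pick_backend(request_id: str, weights: list[tuple[str, int]]) -> str:
    """Deterministically map a request ID onto a backend in proportion to its weight."""
    total = sum(weight for _, weight in weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Hash the request ID so the same request always lands on the same backend.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for url, weight in weights:
        if bucket < weight:
            return url
        bucket -= weight
    raise ValueError("weights must be non-negative")


def backend_for(request_id: str) -> str:
    """Choose a backend using the split configured in the environment."""
    spec = os.environ.get("PAYMENTS_BACKENDS", DEFAULT_BACKENDS)
    return pick_backend(request_id, parse_weights(spec))
```

Even this small version needs parsing, validation, and a stable hashing scheme, and changing the split means redeploying or restarting with new environment values. That is the maintenance burden a mesh takes on for you.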
Considerations for Adopting
It isn't all roses though. Just because you've got a half dozen APIs you depend on doesn't mean it's time to break out the service mesh. You are adding complexity and that cost is worth considering.
First, and most importantly, adopting a service mesh is going to add a new control plane to your architecture. Control planes are where the configuration, permissions, and rollouts happen for your mesh. While some meshes can run without control planes, doing so will add significant code overhead. Generally, if the service mesh has a control plane, you'll want to use the control plane.
Then there are the plugin languages themselves. Linkerd uses Kubernetes pods, while Taskless, Envoy, and Consul all use Lua. Chances are, these are not technologies your company is currently using, and that means customizing your service mesh will require some amount of language expertise. Some service meshes like Taskless have a rich marketplace of mesh behaviors, meaning you'll only need to learn the mesh's language for truly esoteric problems.
Finally, a service mesh puts another layer of indirection between your application code and your service. Even though most meshes add only minimal latency (<30 ms), they also add to the call stack, lengthen exception traces, and may generate additional OpenTelemetry (OTel) spans. None of these are necessarily bad, but your tooling should be capable of handling the additional logging data the mesh creates.
Minimizing Adoption Risk with an Application Service Mesh
Depending on what your goals are, you may not need a low-level service mesh like Envoy, Consul, or Linkerd. Application-level concerns are better solved in the application layer itself; that's where Taskless comes in.
Taskless offers a modern, application developer-friendly approach to service mesh technology. With a focus on ease of use and seamless integration, Taskless empowers developers to take control of service-to-service communication at the L7 layer without the steep learning curve often associated with traditional service meshes.
Key features of Taskless include:
- Automatic instrumentation: Taskless automatically instruments your application, providing out-of-the-box observability and control over service-to-service communication.
- Customizable routing rules: With Taskless, developers can easily define and manage routing rules, enabling advanced traffic management scenarios such as canary deployments and traffic splitting.
- Zero-deploy design: Taskless eliminates the need for complex deployment processes. It seamlessly integrates with your existing application code and infrastructure, allowing you to manage service mesh functionality without modifying your deployment pipelines.
- Separation of concerns: Taskless decouples application service mesh logic from the reliability service mesh concerns, empowering SRE teams to manage their service mesh functionality independently of the application. This separation of concerns means both teams can focus on their respective areas of expertise.
By leveraging Taskless, developers can quickly and easily implement application-level service mesh functionality, such as retries, timeouts, and observability, without the need for extensive configuration or infrastructure changes.
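To make concrete what moves out of your codebase, here is a sketch of the kind of retry logic teams typically hand-roll around each network call. This is not Taskless's API. The function name `call_with_retries`, its parameters, and the backoff values are all hypothetical, and the wrapped operation is assumed to enforce its own timeout and raise when it fails.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.2,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on the listed exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error to the caller
            # Wait base_delay, then 2x, then 4x, ... between attempts.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Any zero-argument callable works as `operation`, so in practice every outbound call ends up wrapped this way. Doing that consistently, and tuning the attempts and delays per dependency, is the repetitive work an application-level mesh is meant to absorb.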
So Do You Need a Service Mesh?
Deciding whether to use a service mesh depends on your application architecture, control and flexibility requirements, and the level of complexity you are willing to introduce. If you have a complex microservices architecture with advanced routing and observability needs, or a high reliance on third-party network services, a service mesh like Taskless can provide significant benefits.
If you do decide to use a service mesh, there's a range of options, from L7-optimized meshes like Taskless to lower-level container meshes like Envoy and Linkerd.
Ultimately, the decision to adopt a service mesh should be based on your application's needs, performance requirements, and development team's capabilities. Whatever your level of comfort is, there's a service mesh waiting when you're ready.