Gulcan Topcu

Posted on Jun 7, 2024

eBPF, sidecars, and the future of the service mesh

#kubernetes #servicemesh #ebpf

Kubernetes and service meshes may seem complex, but not for William Morgan, an engineer-turned-CEO who excels at simplifying the intricacies. In this enlightening podcast, he shares his journey from AI to the cloud-native world with Bart Farrell.

Discover William's cost-saving strategies for service meshes, gain insights into the ongoing debate between sidecars, Ambient Mesh, and Cilium Cluster Mesh, his surprising connection to Twitter's early days and unique perspective on balancing tech expertise with the humility of being a piano student.

You can watch (or listen) to this interview here.

Bart: Imagine you've just set up a fresh Kubernetes cluster. What's your go-to trio for the first tools to install?

William: My first pick would be Linkerd. It's a must-have for any Kubernetes cluster. I then lean towards tools that complement Linkerd, like Argo and cert-manager. You're off to a solid start with these three.

Bart: Cert Manager and Argo are popular choices, especially in the GitOps domain. What about Flux?

William: Flux would work just fine. I don't have a strong preference between the two. Flux and Argo are great options, especially for tasks like progressive delivery. When paired with Linkerd, they provide a robust safety net for rolling out new code.

Bart: As the CEO, who are you accountable to? Could you elaborate on your role and responsibilities?

William: Being a CEO is an exciting shift from my previous role as an engineer. I work for myself, and I must say, I’m a demanding boss. As a CEO, I focus on the big picture and align everyone toward a common goal. These are the two skills I’ve had to develop rapidly since transitioning from an engineer, where my primary concern was writing and maintaining code.

Bart: From a technical perspective, how did you transition into the cloud-native space? What were you doing before it became mainstream?

William: My early career was primarily focused on AI, NLP, and machine learning long before they became trendy. I thought I’d enter academia but realized I enjoyed coding more than research.

I worked at several Bay Area startups, mainly in NLP and machine learning roles. I was part of a company called PowerSet, which was building a natural language processing engine and was acquired by Microsoft. I then joined Twitter in its early days, around 2010, when it had about 200 employees. I started on the AI side but transitioned to infrastructure because I found it more satisfying and challenging. We were doing what I now describe at Twitter as cloud-native, even though the terminology differed. We didn’t have Kubernetes or Docker, but we had Mesos, the JVM for isolation, and cgroups for a basic form of containerization. We transitioned from a monolithic Ruby on Rails service to a massive microservices deployment. When I left Twitter, we tried to apply those same ideas to the emerging world of Kubernetes and Docker.

Bart: How do you keep up with the rapid changes in the Kubernetes and cloud-native ecosystems, especially transitioning from infrastructure and AI/NLP?

William: My current role primarily shapes my strategy. I learn a lot from the engineers and users of Linkerd, who are at the forefront of these technologies. I also keep myself updated by reading discussions on Reddit platforms like r/kubernetes and r/Linkerd. Occasionally, I contribute to or follow discussions on Hacker News. Overall, my primary source of knowledge comes from the experts I work with daily, giving me valuable insights into the latest developments.

Bart: If you could return to your time at Twitter or even before that, what one tip would you give yourself?

William: I'd tell myself to prioritize impact. As an engineer, I was obsessed with building and exploring new technologies, which was rewarding. However, I later understood the value of stepping back to see where I could make a real difference in the company. Transitioning my focus to high-impact areas, such as infrastructure at Twitter, was a turning point. Despite my passion for NLP, I realized that infrastructure was where I could truly shine. Always look for opportunities where you can make the most significant impact.

Bart: Let’s focus on "Sidecarless eBPF Service Mesh Sparks Debate," which follows up on your previous article “eBPF, sidecars, and the future of the service mesh.” You're one of the creators of Linkerd. For those unfamiliar, what exactly is a service mesh? Why would someone need it, and what value does it add?

William: There are two ways to describe service mesh: what it does and how it works. Service mesh is an additional layer for Kubernetes that enhances key areas Kubernetes doesn't fully address.

The first area is security. It ensures all connections in your cluster are encrypted, authorized, and authenticated. You can set policies based on services, gRPC methods, or HTTP routes, like allowing Service A to talk to /foo but not /bar.

The second area is reliability. It enables graceful failovers, transparent traffic shifting between clusters, and progressive delivery. For example, deploying new code and gradually increasing traffic to it to avoid immediate production traffic. It also includes mechanisms like load balancing, circuit breaking, retries, and timeouts.

The last area is observability. It provides uniform metrics for all workloads across all services, such as success rates, latency distribution, and traffic volume. Importantly, it does this without requiring changes to your application code.

The most prevalent method today involves using many proxies. This approach has become feasible thanks to technological advancements like Kubernetes and containers, which simplify the deployment and management of many proxies as a unified fleet. A decade ago, deploying 10,000 proxies would have been absurd, but it is feasible and practical today. The specifics of deploying these proxies, their locations, programming languages, and practices are subject to debate. However, at a high level, service meshes work by running these layer seven proxies that understand HTTP, HTTP2, and gRPC traffic and enable various functionalities without requiring changes to your application code.

Bart: Can you briefly explain how the data and control planes work in service meshes, especially compared to the older sidecar model with an extra container?

William: A service mesh architecture consists of two main components: a control plane and a data plane. The control plane allows you to manage and configure the data plane, which directs network traffic within the service mesh. In Kubernetes, the control plane operates as a collection of standard Kubernetes services, typically running within a dedicated namespace or across the entire cluster.

The data plane is the operational core of a service mesh, where proxies manage network traffic. The sidecar model, employed by service meshes like Linkerd, deploys a dedicated proxy alongside each application pod. Therefore, a service mesh with 20 pods would have 20 corresponding proxies. The overall efficiency and scalability of the service mesh rely heavily on the size and performance of these individual proxies.

In the sidecar model, service A and service B communication flows through service A's and service B's proxy. Service A sends its message to its sidecar proxy, and then the service A proxy forwards it to service B's sidecar proxy. Finally, service B's proxy delivers the message to service B itself. This indirect communication path adds extra hops, leading to a slight increase in latency. You must carefully consider the potential performance impacts to ensure that service mesh benefits outweigh the trade-offs.

Bart: We've been discussing the benefits of service meshes, but running an extra container for each pod sounds expensive. Does cost become a significant issue?

William: Service meshes have a compute cost, just like adding any component to a system. You pay for CPU and memory, but memory tends to be the more significant concern, as it can force you to scale up instances or nodes.

However, Linkerd has minimized this issue with a "micro proxy" written in Rust. Rust's strict memory management allows fast, lightweight proxies and avoids memory vulnerabilities like buffer overflows, which are common in C and C++. Studies from both Google and Microsoft have shown that roughly 70% of security bugs in C and C++ code are due to memory management errors.

Our choice of Rust as the programming language in 2018 was a calculated risk. Rust offers the best of both worlds: the speed and control of languages like C/C++ and the safety and ease of use of languages with runtime environments like Go. Rust and its network library ecosystem were still relatively young at that time. We invested significantly in underlying libraries like Tokio, Tower, and H2 to build the necessary infrastructure.

The critical role of the data plane in handling sensitive application data drove this decision. We ensured its reliability and security. Rust enables us to build small, fast, and secure proxies that scale with traffic, typically using minimal memory, directly translating to the user experience. Instead of facing long response times (like 5-second tail latencies), users experience faster interactions (closer to 30 milliseconds). A service mesh can optimize these tail latencies, improving user experience and customer retention. Choosing Rust has proven to be instrumental in achieving these goals.

While cost is a factor, the actual cost often stems from operational complexity. Do you need dedicated engineers to maintain complex proxies, or does the system primarily work independently? That human cost usually dwarfs the computational one.

Our design choices have made managing Linkerd’s costs relatively straightforward. However, for other service meshes, costs can escalate if the proxies are large and resource-intensive. Even so, the more significant cost is often not the resources but the operational overhead and complexity. This complexity can demand considerable time and expertise, increasing the overall cost.

Bart: You raise a crucial point about the human aspect. While we address technical challenges, the time spent resolving errors detracts from other tasks. The community has developed products and projects to tackle these concerns and costs. One such example is Istio with Ambient Mesh. Another approach is sidecarless service meshes like Cilium Cluster Mesh. Can you explain what Ambient Mesh is and how it enhances the classic sidecar model of service meshes?

William: We've delved deep into both of these options in Linkerd. While there might come a time when adopting these projects makes sense for us, we're not there yet.

Every decision involves trade-offs regarding distributed systems, especially in production environments within companies where the platform is a tool to support applications. At Linkerd, our priority is constantly reducing the operational workload.

Ambient Mesh and eBPF aren't primarily reactions to complexity but responses to the practical annoyances of sidecars. Their key selling point is eliminating the need for sidecars. However, the real question is: What's the cost of this shift? That's where the analysis becomes crucial.

In Ambient Mesh, rather than having sidecar containers, you utilize connective components, such as tunnels, within the namespace. These tunnels communicate with proxies located elsewhere in the cluster. So essentially, you have multiple proxies running outside of the pod, and the pods use these tunnels to communicate with the proxies, which then handle specific tasks.

This setup is indeed intriguing. As mentioned earlier, running sidecars can be challenging due to specific implications. One such implication is the cost factor, which we discussed earlier. In Linkerd’s case, this is a minor concern. However, a more significant implication is the need to restart the pod to upgrade the proxy to the latest version, given the immutability of pods in Kubernetes.

This situation necessitates managing two separate updates: one to keep the applications up-to-date and another to upgrade the service mesh. Therefore, while the setup has advantages, it also requires careful management to ensure smooth operation and optimal performance.

We operate the proxy as the first container for various reasons, which can lead to friction points, such as when using kubectl logs. Typically, when you request logs, you're interested in your application's logs, not the proxy's. This friction, combined with a desire for networking to operate seamlessly in the background, drives the development of solutions like Ambient and eBPF, which aim to eliminate the need for explicit sidecars.

Both Ambient and eBPF solutions, which are closely related, are reactions to this sentiment of not wanting to deal with sidecars directly. The aim is to make sidecars disappear. Take Istio and most service meshes built on Envoy, for instance. Envoy is complex and memory-intensive and requires constant attention and tuning based on traffic specifics.

Challenges with sidecars are more of a cloud-native trend to market solutions, like writing a blog post proclaiming the death of sidecars rather than being specific to Linkerd. They can sometimes be an inaccurate reflection of the reality of engineering.

In Ambient, eliminating sidecars by running the proxy elsewhere and using tunnel components allows for separate proxy maintenance without needing to reboot applications for upgrades. However, in a Kubernetes environment, the idea is that pods should be rebootable anytime. Kubernetes can reschedule pods as needed, which aligns with the principles of building applications as distributed systems. Yet, there are legacy applications or specific scenarios where rebooting could be more convenient, making the sidecar approach less appealing.

Historically, running cron jobs with sidecar proxies in Kubernetes posed a significant challenge. Kubernetes lacked a built-in mechanism to signal the sidecar proxy when the main job was complete, necessitating manual intervention to prevent the proxy from running indefinitely. This manual process went against the core principle of service mesh, which aims to decouple services from their proxies for easier management and scalability.

Thankfully, one significant development is the Sidecar Container Kubernetes Enhancement Proposal. With this enhancement, you can designate your proxy as a sidecar container, leading to several benefits, like jobs terminating the proxy once finished and eliminating unnecessary resource consumption.

For Linkerd, adopting Ambient mesh architecture introduces more complexity than benefits. The additional components, like the tunnel and separate proxies, add unnecessary layers to the system. Unlike Istio, which has encountered issues due to its architecture, Linkerd's existing design hasn't faced similar challenges. Therefore, the trade-offs associated with Ambient aren't justified for Linkerd.

In contrast, the sidecar model offers distinct advantages. It creates clear operational and security boundaries at the pod level. Each pod becomes a self-contained unit, making independent decisions regarding security and operations, aligning with Kubernetes principles, and simplifying management in a cloud-native environment.

This sidecar approach is crucial for implementing zero-trust security. The critical principle of zero trust is to enforce security policies at the most granular level possible. Traditional approaches relying on a perimeter firewall and implicitly trusting internal components are no longer sufficient. Instead, each security decision must be made independently at every system layer. This granular enforcement is achieved by deploying a sidecar proxy within each application pod, acting as a security boundary and enabling fine-grained control over network traffic, authentication, and authorization.

In Linkerd, every request undergoes a rigorous security check within the pod. This check includes verifying the validity of the TLS encryption, confirming the client's identity through cryptographic algorithms, and ensuring the request comes from a trusted source. Additionally, Linkerd checks whether the request can access the specific resource or method it's trying to reach. This multi-layered scrutiny happens directly inside the pod, providing the highest possible level of security within the Kubernetes framework. Maintaining this tight security model is crucial, as any deviation, like separating the proxy and TLS certificate, weakens the model and introduces potential vulnerabilities.

Bart: The next point I'd like to discuss has garnered significant attention in recent years through Cilium Service Mesh and various domains. What is eBPF?

William: eBPF is a kernel technology that enables the execution of specific code within the kernel, offering significant advantages. Firstly, operations within the kernel are high-speed, eliminating the overhead of context switching between kernel and user space. Secondly, the kernel has unrestricted access to all system resources, requiring robust security measures to ensure eBPF programs are safe. This powerful technology empowers developers to create highly efficient and secure solutions for various system tasks, particularly networking, security, and observability.

Traditionally, user-space programs lacked direct access to kernel resources, relying on system calls to communicate with the kernel. While providing security, this syscall boundary introduced cost overhead, especially with frequent requests like network packet processing.

eBPF revolutionized this by enabling user-defined code to run within the kernel with stringent safety measures. The number of instructions an eBPF program can execute is limited, and infinite loops are prohibited to prevent resource monopolization. The bytecode verifier meticulously ensures every possible execution path can be explored to avoid unexpected behavior or malicious activity. The bytecode is also verified for GPL compliance by checking for specific strings in its initial bytes.

These security measures make eBPF a powerful but restrictive mechanism, enabling previously unattainable capabilities. Understanding what eBPF can and cannot do is crucial, despite marketing claims that might blur these lines. While many promote eBPF as a groundbreaking solution that could eliminate the need for sidecars, the reality is more nuanced. It's crucial to understand its limitations and not be swayed by marketing hype.

Bart: There appears to be some confusion regarding the extent of limitations associated with eBPF. If eBPF has limitations, does that imply that these limitations constrain all service meshes using eBPF?

William: The idea of an eBPF-based service mesh can sometimes need clarification. In reality, the Envoy proxy still handles the heavy lifting, even in these eBPF-powered meshes. eBPF has limitations, especially in the network space, and can't fully replace the functionality of a traditional proxy.

While eBPF has many applications, including security and performance monitoring, its most interesting potential lies in instrumenting applications. The kernel can directly measure CPU usage, function calls, and other performance metrics by residing in the kernel.

However, when it comes to networking, eBPF faces significant challenges. Maintaining large amounts of state, essential for many network operations, is difficult, bordering on impossible. This challenge highlights the limitations of eBPF in entirely replacing traditional networking components like proxies.

The role of eBPF in networking, particularly within service meshes, is often overstated. While it excels in certain areas, like efficient TCP packet processing and simple metrics collection, other options exist beyond traditional proxies. Complex tasks like HTTP2 parsing, TLS handshakes, or layer seven routings are challenging, if possible, to implement purely with eBPF.

Some projects attempt complex eBPF implementations for these tasks but often involve convoluted workarounds that sacrifice performance and practicality. In practice, eBPF is typically used for layer 4 (transport layer) tasks, while user-space proxies like Envoy handle more complex layer 7 (application layer) operations.

Service meshes like Cilium, despite their claims of being sidecar-less, often rely on daemonset proxies to handle these complex tasks. While eliminating sidecars, this approach introduces its own set of problems. Security is compromised as TLS certificates are mixed in the proxy's memory, and operational challenges arise when the daemonset goes down, affecting seemingly random pods scheduled on that machine.

Linkerd, having experienced similar issues with its first version (Linkerd1.x) running as a daemonset, opted for the sidecar model in subsequent versions. Sidecars provide clear operational and security boundaries, making management and troubleshooting easier.

Looking ahead, eBPF can still be a valuable tool for service meshes. Linkerd, for instance, could significantly speed up raw TCP proxying by offloading tasks to the kernel. However, for complex layer seven operations, a user-space proxy remains essential.

The decision to use eBPF and the choice between sidecars and daemonsets are distinct considerations, each with advantages and drawbacks. While eBPF offers powerful capabilities, it doesn't inherently dictate a specific proxy architecture. Choosing the most suitable approach requires careful evaluation of the system's requirements and trade-offs.

Bart: Can you share your predictions about conflict or uncertainty concerning service meshes and sidecars for the next few years? Is there a possibility of resolving this? Should we anticipate the emergence of new groups? What are your expectations for the near and distant future?

William: While innovation in this field is valuable, relying solely on marketing over technical analysis needs more appeal, especially for those prioritizing tangible customer benefits.

Regarding the future of service meshes, their value proposition is now well-established. The initial hype has given way to a practical understanding of their necessity, with users selecting and implementing solutions without extensive deliberation. This maturity is a positive development, shifting the focus from explaining the need for a service mesh to optimizing its usage.

Functionally, service meshes converge on core features like MTLS, load balancing, and circuit breaking. However, a significant area of development and our primary focus is mesh expansion, which involves integrating non-Kubernetes components into the mesh. We have a big announcement regarding this in mid-February.

Bart: That sounds intriguing. Please give us a sneak peek into what this announcement is about.

William: It is about Linkerd 2.15! The release of Linkerd 2.15 is a significant step forward. It introduces the ability to run the data plane outside Kubernetes, enabling seamless TLS communication for both VM and pod workloads.

The industry mirrors this direction, as evidenced by developments like the Gateway API, which converges to handle both ingress and service mesh configuration within Kubernetes. This unified approach allows consistent configuration primitives for traffic entering, transiting, and exiting the cluster.

The industry will likely focus on refining details like eBPF integration or the advantages of Ambient Mesh while the fundamental value proposition of service meshes remains consistent. I'm particularly excited about how these advancements can be applied across entire organizations, starting with securing and optimizing Kubernetes environments and extending these benefits to the broader infrastructure.

Bart: Shifting away from the professional side, we heard you have an interesting tattoo. Is it of Linkerd, or what is it about?

William: It’s just a temporary one. We handed them out at KubeCon last year as part of our swag. While everyone else gave out stickers, we thought we'd do something more extraordinary. So, we made temporary tattoos of Linky the Lobster, our Linkerd mascot.

When Linkerd graduated within the CNCF, reaching the top tier of project maturity, we needed a mascot. Most mascots are cute and cuddly, like the Go Gopher. We wanted something different, so we chose a blue lobster—an unusual and rare creature reflecting Linkerd's unique position in the CNCF universe.

The tattoo featured Linky the Lobster crushing some sailboats, which is part of our logo. It was a fun little easter egg. If you were at KubeCon, you might have seen them. That event was in Amsterdam.

Bart: What's next for you? Are there any side projects or new ventures you're excited about?

William: I'm devoting all my energy to Linkerd and Buoyant. That takes up most of my focus. Outside of work, I'm a dad. My kids are learning the piano, so I decided to start learning, too. It's humbling to see how fast they pick it up compared to me. As an adult learner, it's a slow process. It's interesting to be in a role where I'm the student, taking lessons from a teacher who's probably a third my age and incredibly talented. It’s an excellent reminder to stay humble, especially since much of my day involves being the authority on something. It’s a nice change of pace and a bit of a reality check.

Bart: That's a good balance. It's important to remind people that doing something you could be better at is okay. As a kid, you're used to it—no expectations, no judgment.

William: Exactly. However, it can be a struggle as an adult, especially as a CEO. I've taught Linkerd to hundreds of people without any panic, but playing a piano recital in front of 20 people is terrifying. It's the complete opposite.

Bart: If people want to contact you, what's the best way?

William: You can email me at william@buoyant.io, find me on Linkerd Slack at slack.linkerd.io, or DM me at @WM on Twitter. I'd love to hear about your challenges and how I can help.

Wrap up

If you enjoyed this interview and want to hear more Kubernetes stories and opinions, visit KubeFM and subscribe to the podcast.
If you want to keep up-to-date with Kubernetes, subscribe to Learn Kubernetes Weekly.
If you want to become an expert in Kubernetes, look at courses on Learnk8s.
Finally, if you want to keep in touch, follow me on Linkedin.

The Ops Community ⚙️

eBPF, sidecars, and the future of the service mesh

Top comments (0)