
Gulcan Topcu

Clusters Are Cattle Until You Deploy Ingress

Managing repeatable infrastructure is the bedrock of efficient Kubernetes operations. While the ideal is to have easily replaceable clusters, reality often dictates a more nuanced approach. Dan Garfield, Co-founder of Codefresh, succinctly captures this with the analogy: "A Kubernetes cluster is treated as disposable until you deploy ingress, and then it becomes a pet."

Bart Farrell sat down with Dan Garfield to understand how he manages Kubernetes clusters and how they turn from "cattle" into "pets", with fascinating anecdotes about fairy tales, crypto, and snowboarding along the way.

You can watch (or listen to) this interview here.

Bart: What are your top three must-have tools starting with a fresh Kubernetes cluster?

Dan: Argo CD is the first tool I install. For AWS, I will add Karpenter to manage costs. I will also use Longhorn for on-prem storage solutions, though I'd need ingress. Depending on the situation, I will install Argo CD first and then one of those other two.

Bart: Many of our recent podcast guests have highlighted Argo or Flux, emphasizing their significance in the GitOps domain. Why do you think these tools are considered indispensable?

Dan: The entire deployment workflow for Kubernetes revolves around Argo CD. When I set up a cluster, some might default to using kubectl apply, or if they're using Terraform, they might opt for the Helm provider to install various Helm charts. However, with Argo CD, I have precise control over deployment processes.

Typically, the bootstrap pattern involves using Terraform to set up the cluster and Helm provider to install Argo CD and predefined repositories. From there, Argo CD takes care of the rest.

For those who can't see it, I have my Kubernetes cluster displayed on the screen behind me, running Argo CD. I use Argo CD Autopilot, which streamlines repository setup. Last year, when my system was compromised, Argo CD Autopilot swiftly restored everything. It's incredibly convenient. Moreover, when debugging, the ability to quickly toggle sync, reset applications, and access logs through the UI is invaluable. Argo CD is, without a doubt, my go-to tool for Kubernetes. Admittedly, I'm biased as an Argo maintainer, but it's hard to argue with its effectiveness.

Bart: Our numerous podcast discussions with seasoned professionals show that GitOps has been a recurring theme in about 90% of our conversations. Almost every guest we've interviewed has emphasized its importance, often mentioning it as their primary tool alongside other essentials like cert-manager, Kyverno, or OPA, depending on their preferences.

Could you introduce yourself to those unfamiliar with you? Tell us your background, work, and where you're currently employed.

Dan: I'm Dan Garfield, the co-founder and chief open-source officer at Codefresh. As Argo maintainers, we're deeply involved in shaping the GitOps landscape. I've played a key role in creating the GitOps standard, establishing the GitOps working group, and spearheading the OpenGitOps project.

Our journey began seven years ago when we launched Codefresh to enhance software delivery in the cloud-native ecosystem, primarily focusing on Kubernetes. Alongside my responsibilities at Codefresh, I actively contribute to SIG Security within the Kubernetes community and oversee community-driven events like ArgoCon. Outside of work, I reside in Salt Lake City, where I indulge in my passion for snowboarding. Oh, and I'm a proud father of four, eagerly awaiting the arrival of our fifth child.

Bart: It’s a fantastic journey. We'll have to catch up during KubeCon in Salt Lake City later this year. Before delving into your entrepreneurial venture, could you share how you entered Cloud Native?

Dan: My journey into the tech world began early on as a programmer. However, I found myself gravitating more towards the business side, where I discovered my knack for marketing. My pivotal experience was leading enterprise marketing at Atlassian during the release of Data Center, the clustered version of Atlassian's tools. Initially, it didn't garner much attention internally, but it soon became a game-changer, driving significant revenue for the company. Witnessing this transformation, including Atlassian's public offering, was exhilarating, although my direct contribution was modest as I spent less than two years there.

I noticed a significant shift toward containerization, which sparked my interest in taking on a new challenge. Conversations with friends starting container-focused ventures captivated me. Then, Raziel, the founder of Codefresh, reached out, sharing his vision for container-driven software development. His perspective resonated deeply, prompting me to join the venture.

Codefresh initially prioritized building robust CI tools, recognizing that effective CD hinges on solid CI practices and that CI needed improvement in many organizations at the time (and possibly still does). As we expanded, we delved into CD and explored ways to leverage Kubernetes insights.

Kubernetes had yet to emerge as the dominant force when we started this journey. We evaluated competitors like Rancher, OpenShift, Mesosphere, and Docker Swarm. However, after thorough analysis, Kubernetes emerged as the frontrunner, prompting us to bet on its potential.

Our decision proved visionary as other platforms gradually transitioned towards Kubernetes. Amazon's launch of EKS validated our foresight. This strategic alignment with Kubernetes paved the way for our deep dive into GitOps and Argo CD, driving the project's growth within the CNCF and its eventual graduation.

Bart: It's impressive how much you've accomplished in such a short timeframe, especially while balancing family life. With the industry evolving rapidly, how do you keep up with the cloud-native scene as a maintainer and a co-founder?

Dan: Indeed, staying updated involves reading blogs, scrolling through Twitter, and tuning into podcasts. However, I've found that my most insightful learnings come from direct conversations with individuals. For instance, I've assisted the community with Argo implementations, not as a sales pitch but genuinely to help and to gather insights. Interacting with Codefresh users and engaging with the broader community provides invaluable perspectives on adoption challenges and user needs.

Oddly enough, sometimes, the best way to learn is by putting forth incorrect opinions or questions. Recently, while wrestling with AI project complexities, I pondered aloud whether all Docker images with AI models would inevitably be bulky due to PyTorch dependencies. To my surprise, this sparked many helpful responses, offering insights into optimizing image sizes. Being willing to be wrong opens up avenues for rapid learning.

Bart: That vulnerability can indeed produce rich learning experiences. It's a valuable practice. Shifting gears slightly, if you could offer one piece of career advice to your younger self, what would it be?

Dan: Firstly, embrace a mindset of rapid learning and humility. Be more open to being wrong and detach ego from ideas. While standing firm on important matters is essential, recognize that failure and adaptation are part of the journey. Like a stone rolling down a mountain, each collision smooths out the sharp edges, leading to growth.

Secondly, prioritize hiring decisions. The people you bring into your business shape its trajectory more than any other factor. A wrong hire can have far-reaching consequences beyond their salary. Despite some missteps, I've been fortunate to work with exceptional individuals who contribute immensely to our success. When considering a job opportunity, I always emphasize the people's quality, the mission's significance, and fair compensation. Prioritizing in this order ensures fulfillment and satisfaction in your career journey.

Bart: That's insightful advice, especially about hiring. Surrounding yourself with talented individuals can make all the difference in navigating business challenges. Now, shifting gears to your recent tweet about Kubernetes and Ingress, who was the intended audience for that tweet?

Dan: Honestly, it was more of a reflection for myself, perhaps shouted into the void. I was weighing the significance of deploying Ingress within Kubernetes. In engineering, there's a saying that "the problem is always DNS," which suggests that your cluster becomes more tangible once you configure DNS settings. Similarly, setting up Ingress signifies a shift in how you perceive and manage your cluster. Without Ingress, it might be considered disposable, like a development environment. However, once Ingress is in place, your cluster hosts services that require more attention and care.

Bart: For those unfamiliar with the "cattle versus pets" analogy in Kubernetes, could you elaborate on its relevance, particularly in the context of Ingress?

Dan: While potentially controversial, the "cattle versus pets" analogy illustrates a fundamental concept in managing infrastructure. In this analogy, cattle represent interchangeable and disposable resources, much like livestock in a ranching operation. Conversely, pets are unique, loved entities requiring personalized care.

In Kubernetes, deploying resources as "cattle" means treating them as replaceable, identical units. However, Ingress introduces a shift towards a "pet" model, where individual services become distinct and valuable entities. Just as you wouldn't name every cow on a farm, you typically wouldn't concern yourself with the specific details of each interchangeable resource. But once you start deploying services accessible via Ingress, each service becomes unique and worthy of individual attention, akin to caring for a pet.

Bart: It seems the "cattle versus pets" analogy is stirring some controversy among vegans, which is understandable given its context. How does this analogy relate to Kubernetes and Ingress?

Dan: In software, the analogy helps distinguish between disposable, interchangeable components (cattle) and unique, loved entities (pets). For instance, in my Kubernetes cluster, the individual nodes are like cattle—replaceable and without specific significance. If one node malfunctions, I can easily swap it out without concern.

However, once I deploy Ingress and start hosting services, the cluster takes on a different role. While the individual nodes remain disposable, the cluster becomes more akin to a pet. I care about its state, its configuration, and its uptime. Suddenly, I'm monitoring metrics and ensuring its well-being, similar to caring for a pet's health.

So, the analogy underscores the shift in perception and care that occurs when transitioning from managing generic infrastructure to hosting meaningful services accessible via Ingress.

Bart: That's a fascinating perspective. How do Kubernetes and Ingress relate to all of this?

Dan: The ingress in Kubernetes is a central resource for managing incoming traffic to the cluster and routing it to different services. However, unlike other resources in Kubernetes, such as those managed by Argo CD, the ingress is often shared among multiple applications. Each application may have its own deployment rules, allowing for granular control over updates and configurations. For example, one application might only update when manually triggered, while another automatically updates when changes are detected.
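As an illustration of the per-application rules described here (not taken from the interview), the sketch below builds two Argo CD `Application` manifests as plain Python dictionaries. One omits `syncPolicy`, so it only syncs when someone triggers it; the other enables automated sync. The application names, repository URL, and paths are hypothetical.

```python
def argo_application(name, repo_url, path, auto_sync):
    """Build an Argo CD Application manifest as a plain dict."""
    app = {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "path": path, "targetRevision": "HEAD"},
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": name,
            },
        },
    }
    if auto_sync:
        # With an automated policy, Argo CD applies changes when Git changes.
        app["spec"]["syncPolicy"] = {"automated": {"prune": True, "selfHeal": True}}
    # Without a syncPolicy, the application waits for a manual sync.
    return app


# Hypothetical repository and applications, for illustration only.
REPO = "https://example.com/hypothetical/gitops-repo.git"
billing = argo_application("billing", REPO, "apps/billing", auto_sync=False)
storefront = argo_application("storefront", REPO, "apps/storefront", auto_sync=True)

for app in (billing, storefront):
    mode = "automated" if "syncPolicy" in app["spec"] else "manual"
    print(app["metadata"]["name"], "->", mode)
```

Both applications can live in the same cluster with different update rules, which is exactly why a single resource shared between them is awkward to own.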

The challenge arises because updating Ingress impacts multiple applications simultaneously. Through this centralized routing mechanism, you're essentially juggling the needs of various applications. This complexity underscores the importance of managing the cluster effectively, as each change to Ingress affects the entire ecosystem of applications.

The Argo CD community is discussing introducing delegated server-side field permissions. This feature would allow one application to modify components of another, easing the burden of managing shared resources like Ingress. However, it's still under debate, and alternative solutions may emerge. Other tools, like Contour, offer a different approach by treating each route as a separate custom resource, allowing applications to manage their routing independently.
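To make the contrast concrete, here is a minimal sketch in Python that renders the same three routes twice: once as a single shared `networking.k8s.io/v1` Ingress, and once as one Contour `HTTPProxy` object per application. The hostnames, Service names, ports, and the `web` namespace are assumptions made up for the example.

```python
# Hypothetical applications: (hostname, Service name, Service port).
ROUTES = [
    ("shop.example.com", "storefront", 8080),
    ("pay.example.com", "billing", 8443),
    ("docs.example.com", "docs", 8000),
]


def shared_ingress(routes):
    """One Ingress object that carries the rules of every application."""
    rules = []
    for host, service, port in routes:
        backend = {"service": {"name": service, "port": {"number": port}}}
        path = {"path": "/", "pathType": "Prefix", "backend": backend}
        rules.append({"host": host, "http": {"paths": [path]}})
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": "shared", "namespace": "web"},
        "spec": {"rules": rules},
    }


def http_proxy(host, service, port):
    """One Contour HTTPProxy per application, owned by that application."""
    return {
        "apiVersion": "projectcontour.io/v1",
        "kind": "HTTPProxy",
        "metadata": {"name": service, "namespace": "web"},
        "spec": {
            "virtualhost": {"fqdn": host},
            "routes": [{
                "conditions": [{"prefix": "/"}],
                "services": [{"name": service, "port": port}],
            }],
        },
    }


ingress = shared_ingress(ROUTES)
proxies = [http_proxy(*route) for route in ROUTES]
print(len(ingress["spec"]["rules"]), "rules packed into 1 shared object")
print(len(proxies), "independent objects, one per application")
```

In the first shape, any route change edits an object that every application depends on. In the second, each application can ship its own route alongside its Deployment and Service.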

Ultimately, deploying the ingress marks a shift in the cluster's dynamics, requiring considerations such as DNS settings and centralized routing configurations. As a result, the cluster becomes more specialized and less disposable as its configuration becomes bespoke to accommodate the routing needs of various applications.

Bart: Any recommendations for those who aim to keep their infrastructure reproducible while needing Ingress?

Dan: One approach is abstraction and leveraging wildcards. While you can technically deploy an Ingress without anything external pointing at it, I prefer the concept of self-updating components. Tools like Crossplane or Google Cloud's Config Connector allow you to represent non-Kubernetes resources as Kubernetes objects. Incorporating such tools into your cluster bootstrap process ensures the dynamic creation of necessary components.
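For readers wondering what a wildcard buys you, a wildcard host in a Kubernetes Ingress rule covers exactly one leading DNS label, so a single `*.dev.example.com` rule (paired with one wildcard DNS record) can serve any number of short-lived hostnames. The function below is a simplified sketch of that matching rule, not the code an ingress controller actually runs, and the domain names are hypothetical.

```python
def host_matches(rule_host: str, request_host: str) -> bool:
    """Return True if request_host is covered by an Ingress rule host.

    A leading "*" stands for exactly one DNS label; every other label
    must match exactly (case-insensitively).
    """
    rule = rule_host.lower().split(".")
    request = request_host.lower().split(".")
    if rule[0] != "*":
        return rule == request
    return (
        len(request) == len(rule)      # the wildcard covers a single label
        and request[0] != ""           # and that label must not be empty
        and request[1:] == rule[1:]    # the remaining suffix must be identical
    )


WILDCARD = "*.dev.example.com"  # hypothetical wildcard host
for candidate in ("pr-42.dev.example.com", "a.b.dev.example.com", "dev.example.com"):
    print(candidate, host_matches(WILDCARD, candidate))
```

Because new hostnames under the wildcard need no extra DNS or Ingress changes, the cluster's bespoke configuration stays smaller.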

However, there's a caveat. Even when clusters are reproducible, external components like DNS settings may not be. Updating name servers remains a manual task. It's a tricky aspect of operations that still lacks a perfect solution.

Bart: How do GitOps and Argo CD fit into solving this challenge?

Dan: GitOps and Argo CD play a crucial role in managing complex infrastructure, especially with sensitive data. The key lies in representing all infrastructure resources, including secrets and certificates, as Kubernetes objects. This approach enables Argo CD to track and reconcile them, ensuring that the desired state defined in Git reflects accurately in your cluster.

Tools like Crossplane, vCluster (for running virtual clusters), or Cluster API (for provisioning additional clusters) can extend this approach to handle various infrastructure resources beyond Kubernetes. Essentially, Git serves as the single source of truth for your entire infrastructure, with Argo CD functioning as the engine to enforce that truth.

A common issue with Terraform is that its state can get corrupted easily because Terraform does not constantly monitor changes. Crossplane often uses Terraform under the hood. The problem is not with Terraform's primitives but with the data store and its maintenance. Crossplane ensures the data store remains uncorrupted, accurately reflecting the current state. If changes occur, they appear as out of sync in Argo CD.

You can then define policies for reconciliation and updates, guiding the controller on the next steps. This approach is crucial for managing infrastructure effectively. Using etcd as your data store is an excellent pattern and likely the future of infrastructure management.
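The idea of comparing declared state with live state can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how Argo CD or Crossplane compute diffs internally: resources are flattened to key/value pairs, a missing key is treated the same as a `None` value, and the field names are hypothetical. The `Synced` and `OutOfSync` labels mirror the sync statuses Argo CD reports.

```python
def sync_status(desired: dict, live: dict) -> dict:
    """Compare the state declared in Git with the state observed live."""
    drift = {}
    for key in sorted(desired.keys() | live.keys()):
        if desired.get(key) != live.get(key):
            drift[key] = {"desired": desired.get(key), "live": live.get(key)}
    return {"status": "OutOfSync" if drift else "Synced", "drift": drift}


# Hypothetical fields for a DNS zone managed as a Kubernetes object.
declared_in_git = {"zone": "example.com", "ttl": 300, "record_count": 12}
observed_live = {"zone": "example.com", "ttl": 60, "record_count": 12}

result = sync_status(declared_in_git, observed_live)
print(result["status"], result["drift"])
```

A reconciliation policy then decides what to do with the drift: overwrite the live value, alert a human, or leave it alone.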

Bart: What would happen if the challenges of managing Kubernetes infrastructure extended beyond handling ingress traffic to managing sensitive information like state, secrets, and certificates? This added complexity could lead to a "pet" cluster scenario. Do you think backup and recovery tools like Velero would be easier to use without these additional challenges?

Dan: I'm not familiar with Velero. Can you tell me about it?

Bart: Velero is a tool focused on backing up and restoring Kubernetes resources. Since you mentioned Argo CD and custom resources earlier, I'm curious about your approach to backing up persistent volumes. How did you manage disaster recovery in your home lab when everything went haywire?

Dan: I've used Longhorn for volume restoration, and clear protocols were in place. I'm currently exploring Velero, which looks like a promising tool for data migration.

Managing data involves complexities like caring for a pet, requiring careful handling and migration. Many people struggle with managing stateful workloads in Kubernetes. Fortunately, most of my stateful workloads in Kubernetes can rebuild their databases if data is lost. Therefore, data loss is manageable for me. Most of the elements I work with are replicable. Any items needing persistence between sessions are stored in Git or a versioned, immutable secret repository.

Bart: It's worth noting, especially considering what happened with your home lab. Should small startups prioritize treating their clusters like cattle, or is ClickOps sufficient?

Dan: It depends on the use cases. vCluster, a project I'm fond of, is particularly well-suited for creating disposable development clusters, providing developers with isolated sandboxes for testing and experimentation. It allows deploying a virtualized cluster on an existing Kubernetes setup, which saves significantly on ingress costs, especially on platforms like AWS, where you can consolidate ingress into one.

Another example is using Argo CD's ApplicationSets to create full-stack environments for each pull request in a Git repository. These environments, which include a virtual cluster, are unique to each pull request but remain completely disposable and easily recreated, much like cattle.
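A rough sketch of the per-pull-request pattern, written in Python: the first function builds an `ApplicationSet` manifest that uses the pull request generator with its `number` and `head_sha` template parameters, and the second is a simplified stand-in for the controller that shows one environment existing per open pull request. The owner, repository, path, and naming scheme are hypothetical, and the virtual cluster mentioned above is left out to keep the example short.

```python
def pr_preview_appset(owner, repo, repo_url):
    """ApplicationSet manifest that templates one Application per open PR."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": "pr-previews", "namespace": "argocd"},
        "spec": {
            "generators": [
                {"pullRequest": {"github": {"owner": owner, "repo": repo}}}
            ],
            "template": {
                "metadata": {"name": "preview-{{number}}"},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": repo_url,
                        "path": "deploy",  # hypothetical path in the repository
                        "targetRevision": "{{head_sha}}",
                    },
                    "destination": {
                        "server": "https://kubernetes.default.svc",
                        "namespace": "preview-{{number}}",
                    },
                    "syncPolicy": {"automated": {"prune": True}},
                },
            },
        },
    }


def expected_previews(open_pr_numbers):
    """Simplified stand-in for the controller: one Application per open PR."""
    return {f"preview-{number}" for number in open_pr_numbers}


appset = pr_preview_appset(
    "example-org", "shop", "https://example.com/example-org/shop.git"
)
before = expected_previews([12, 15])
after = expected_previews([15])  # PR 12 was merged or closed
print(sorted(before), "->", sorted(after))
```

Closing a pull request removes its entry from the generator's output, so the matching environment is cleaned up rather than maintained.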

However, managing ingress for disposable clusters can be challenging. When deployed and applied to vClusters, ingress needs custom configurations, requiring separate tracking and maintenance. Despite this, it's still beneficial to prioritize treating infrastructure as disposable. For example, while my on-site Kubernetes cluster is a "pet" that requires careful maintenance, its nodes are considered "cattle" that can be replaced or reconfigured without disrupting overall operations. This abstraction is a core principle of Kubernetes and allows for greater flexibility and resilience.

By abstracting clusters away from custom configurations and focusing on reproducibility, you can treat them more like cattle, even if they have some pet-like qualities due to ingress deployment and DNS configurations. This commoditization of clusters simplifies management and enables greater scalability. The more you abstract and standardize your infrastructure, the smoother your operations will become. And to be clear, this analogy has nothing to do with dietary choices.

Bart: If you could rewind time and change anything, what scenario would you create to avoid writing that tweet?

Dan: We've been discussing a feature in Argo CD that allows for delegated field permissions to happen server-side. It addresses a problem inherent in Kubernetes architecture, particularly regarding ingress. The current setup doesn't allow for external delegation of its components, even though many users operate it that way. If I could make changes, I might have split ingress into an additional resource, including routes as a separate definition that users could manage independently.

Exploring other scenarios where delegated field permissions would be helpful is crucial. Ingress is the most obvious example, highlighting an area for potential improvement. Creating separate routes and resources could solve this issue without altering Argo CD. This approach, similar to Contour's, could be a promising solution. Contour's separate resource strategy demonstrates learning from Ingress and making improvements. We should consider adopting tools like Contour or other service mesh ingress providers, as several compelling options are available.

Bart: If you had to build a cluster from scratch today, how would you address these issues whenever possible?

Dan: Sometimes you just have to accept the challenge and not try to work around it. Setting up ingress and configuring DNS for a single cluster might not be a big deal, but it's worth considering a re-architecture if you're doing it on a large scale, like 250,000 times. For instance, with Codefresh, many users opt for our hybrid setup. They deploy our GitOps agent, based on Argo CD, on their cluster, which then connects to our control plane.

One of the perks we offer is a hosted ingress. Instead of setting up ingresses for each of their 5000 Argo CD instances, users can leverage our hosted ingress, saving money and configuration headaches. Consider alternatives like a tunneling system instead of custom ingress setups, depending on your use case. A hosted ingress can be a game-changer for large-scale distributed setups like multiple Argo CD instances, saving costs and simplifying configurations. Ultimately, re-architecting is always an option tailored to what works best for you.

Bart: We're nearing the end of the podcast and want to touch on a closing question, which we are looking at from a few different angles. How do you deal with the anxiety of adopting a new tool or practice, only to find out later that it might be wrong?

Dan: I've seen this dynamic play out. Sometimes, organizations invest heavily in a tool, only to realize a few years later that there are better fits. Take the example of a company transitioning to Argo Workflows for CI/CD and deployment, only to discover that Argo CD would have been a better fit for most of their use cases. However, these transitions are not wasted effort. In their case, the journey through Argo Workflows paved the way for a smoother transition to Argo CD. Sometimes, a detour in the wrong direction is necessary to reach the correct destination faster.

You can't always foresee the ideal solution from where you are, and experimenting with different tools is part of the learning process. It's essential not to dwell on mistakes but to learn from them and move forward. After all, even if a tool ultimately proves to be the wrong choice, it often still brings value. The key is recognizing when a change is needed and adapting accordingly. Mistakes only become fatal if we fail to acknowledge and learn from them.

Bart: We stumbled upon your blog, Today Was Awesome, which hasn't seen an update in a while. You wrote a post about Bitcoin, priced at around $450 in 2015. Are you a crypto millionaire now?

Dan: Not quite! Crypto is a fascinating topic, often sparking wild debates. While there's no shortage of scams in the crypto world, there's also genuine innovation happening. I dabbled in Bitcoin early on and even mined a bit to understand its potential use cases better. One notable experience was mentoring at Hack the North, a massive hackathon where numerous projects leveraged Ethereum. I strategically sold my Bitcoin for Ethereum, which turned out well. However, I'm still waiting on those Lambos—I'm not quite at millionaire status yet!

Bart: Your blog covers many topics, including one post titled "What are we really supposed to learn from fairy tales." How did you decide on such diverse content?

Dan: I can't recall the exact inspiration, but my wife and I often joke about how outdated the moral lessons in fairy tales feel. Their relevance in today's world is an interesting angle to explore.

Bart: What's next for you? More fairy tales, moon-bound Lamborghinis, or snowboarding adventures? Also, let's discuss your recent tweet about making your own bacon. How did that start?

Dan: Ah, yes, making bacon! It's surprisingly simple. First, you get pork belly and cure it in the fridge for seven to ten days. Then, you smoke it for a couple of hours.

My primary motivation was to avoid the nitrates found in store-bought bacon, which are linked to health issues. Homemade bacon tastes better, is of higher quality, and is cheaper. My freezer now overflows with homemade bacon, which makes for a unique and well-received gift. People love the taste; overall, it's been a rewarding and delicious effort!

Bart: Regardless of dietary choices, considering where your food comes from and being involved in the process, whether by growing your own food or making it yourself and turning it into a gift for others, creates a different, enriching experience. What's next for you?

Dan: This year, my focus is on environment management and promotion. In the Kubernetes world, we often think about applications, clusters, and instances of Argo CD to manage everything. We're working on a paradigm shift: we think about products instead of applications. In our context, a product is an application in every environment in which it exists. Hence, if you deploy a development application, move it to stage, and finally to production, you're deploying the same application with variations three times. That's what we call a product. We’re shifting from thinking about where an application lives to considering its entire life cycle. Instead of focusing on clusters, we think about environments because an environment might have many clusters.

For instance, retail companies like Starbucks, Chick-fil-A, and Pizza Hut often have Kubernetes clusters on-site. Deploying to US West might mean deploying to 1,300 different clusters and 1,300 different Argo CD instances. We abstract all that complexity by grouping them into the environments bucket. We focus on helping people scale up and build their workflow using environments and establishing these relationships. The feedback has been incredible; people are amazed by what we’re demonstrating.
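One way to picture the product and environment vocabulary is as a small data model. The Python sketch below is only an illustration of the concepts as described in this answer, not Codefresh's actual data model or API; every class, field, and name in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """A named environment that may span many clusters."""
    name: str
    clusters: list[str] = field(default_factory=list)


@dataclass
class Product:
    """One application viewed across every environment it runs in."""
    name: str
    versions: dict[str, str] = field(default_factory=dict)  # environment -> version

    def promote(self, source: str, target: str) -> None:
        """Copy the version running in one environment to another."""
        self.versions[target] = self.versions[source]


def rollout_targets(product: Product, env: Environment) -> list[tuple[str, str]]:
    """Expand one environment-level decision into per-cluster deployments."""
    version = product.versions[env.name]
    return [(cluster, version) for cluster in env.clusters]


# Hypothetical retail setup: one environment backed by 1,300 store clusters.
us_west = Environment("us-west", [f"store-{i:04d}" for i in range(1300)])
checkout = Product("checkout", {"dev": "1.5.0", "staging": "1.4.2", "us-west": "1.4.1"})

checkout.promote("staging", "us-west")
targets = rollout_targets(checkout, us_west)
print(len(targets), "clusters will receive", checkout.versions["us-west"])
```

The person promoting a release reasons about one product and one environment, while the fan-out to individual clusters stays an implementation detail.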

We're showcasing this at ArgoCon next month in Paris. After that, I plan to do some snowboarding and then make it back in time for the birth of my fifth child.

Bart: That's a big plan. 2024 is packed for you. If people want to contact you, what's the best way to do it?

Dan: Twitter is probably the best. You can find me at @todaywasawesome. If you visit my blog and leave comments, I won't see them, as it's more of an archive now. I keep it around because I worked on it ten years ago and occasionally reference something I wrote.

You can also reach out on LinkedIn, GitHub, or Slack. I respond slower on Slack, but I do get to it eventually.

Wrap up

  • If you enjoyed this interview and want to hear more Kubernetes stories and opinions, visit KubeFM and subscribe to the podcast.
  • If you want to keep up-to-date with Kubernetes, subscribe to Learn Kubernetes Weekly.
  • If you want to become an expert in Kubernetes, look at courses on Learnk8s.
  • Finally, if you want to keep in touch, follow me on LinkedIn.
