<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>The Ops Community ⚙️: Hrittik Roy</title>
    <description>The latest articles on The Ops Community ⚙️ by Hrittik Roy (@hrittikhere).</description>
    <link>https://community.ops.io/hrittikhere</link>
    <image>
      <url>https://community.ops.io/images/bOUa7N5BvjPI5-KxPhHkzkCQwMIlGbQh9wavIEvvuzI/rs:fill:90:90/g:sm/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL3Vz/ZXIvcHJvZmlsZV9p/bWFnZS82OC9iZmFi/MjJiNi03MWUyLTRj/YWYtODM1NC1lOTE3/MjdmMGVkMWYuanBl/Zw</url>
      <title>The Ops Community ⚙️: Hrittik Roy</title>
      <link>https://community.ops.io/hrittikhere</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://community.ops.io/feed/hrittikhere"/>
    <language>en</language>
    <item>
      <title>Knative: Serverless on Kubernetes</title>
      <dc:creator>Hrittik Roy</dc:creator>
      <pubDate>Wed, 25 May 2022 20:26:47 +0000</pubDate>
      <link>https://community.ops.io/hrittikhere/knative-serverless-on-kubernetes-1an</link>
      <guid>https://community.ops.io/hrittikhere/knative-serverless-on-kubernetes-1an</guid>
      <description>&lt;p&gt;When we hear about cloud and cloud native, we often come across many technical terms, but sometimes we don’t pay any attention to them. But, keeping these terms in the back of our mind can come in handy.&lt;/p&gt;

&lt;p&gt;So, right now, we will discuss one such term, which you have probably heard during reading about Kubernetes, called Knative.&lt;/p&gt;

&lt;p&gt;Let’s start!&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Knative?
&lt;/h2&gt;

&lt;p&gt;We all know that Kubernetes is an open-source platform that automates the deployment, configuration and management of containerized workloads and services in the cloud. Well, Knative can be simplified as Kubernetes on steroids.&lt;/p&gt;

&lt;p&gt;The demand for scalable and reliable services is increasing exponentially every day. The market is driven by customers who expect their favorite services to have zero downtime, and by companies that lose millions of dollars for every minute they’re down.&lt;/p&gt;

&lt;p&gt;Knative is a platform installed on top of Kubernetes that provides you with serverless capabilities. These capabilities help you deploy, run and manage serverless and cloud-native applications on Kubernetes.&lt;/p&gt;

&lt;p&gt;Cloud-native applications are scalable applications that run in all types of cloud environments. Now, the question remains: what is serverless? And what is its relationship with Knative? We will have a look at that now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Knative and Serverless
&lt;/h2&gt;

&lt;p&gt;Serverless is the cloud computing execution model in which the cloud provider allocates machine resources and takes care of the infrastructure. Simply put, you only need to worry about your code; everything else is managed for you.&lt;/p&gt;

&lt;p&gt;In recent years, serverless adoption has taken off, with more and more teams depending on serverless technology to meet their organizations’ specific needs. A survey conducted by Serverless Inc in 2018 showed that half of the respondents used serverless in their job, and the numbers are projected to rise further.&lt;/p&gt;

&lt;p&gt;The relation between Knative and serverless cloud computing is very simple: Knative provides a serverless environment. So, with Knative applications, we can be assured that the cloud vendor’s machine resources are managing the application servers, which helps in faster deployment of the application.&lt;/p&gt;

&lt;p&gt;As Knative is a serverless solution that helps in developing modern applications faster, it saves developers time to build more in cloud computing. Now, let’s look at the components that hold Knative together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Components of Knative
&lt;/h2&gt;

&lt;p&gt;The serverless framework of Knative stands on three major components, which are the following.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Framework
&lt;/h2&gt;

&lt;p&gt;The Building Framework helps in extending Kubernetes’ abilities. It utilizes the Kubernetes primitives (the building blocks of Kubernetes) to run on-cluster container builds from source code, meaning container images can be built on the cluster directly from the written code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Eventing Framework
&lt;/h2&gt;

&lt;p&gt;The Eventing Framework is mainly responsible for creating communication between event producers and event consumers that have zero knowledge of each other’s components, enabling an event-driven architecture in which running, deploying and other actions happen based on events.&lt;/p&gt;

&lt;h2&gt;
  
  
  Serving Framework
&lt;/h2&gt;

&lt;p&gt;The Serving Framework is responsible for supporting the deployment of serverless applications and functions on Kubernetes and Istio. It enables the rapid deployment of serverless containers, automatic scaling up and down to zero, routing, networking and more on top of Istio and Kubernetes.&lt;/p&gt;
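&lt;p&gt;As a rough illustration of what Serving manages, a minimal Knative Service manifest can be sketched as below; the service name and sample image here are illustrative placeholders, not taken from this article.&lt;/p&gt;

```shell
# Write a minimal Knative Service manifest line by line (hypothetical
# service name and sample image; assumes a cluster with Knative Serving
# installed before you actually apply it).
{
  echo 'apiVersion: serving.knative.dev/v1'
  echo 'kind: Service'
  echo 'metadata:'
  echo '  name: hello'
  echo 'spec:'
  echo '  template:'
  echo '    spec:'
  echo '      containers:'
  echo '        - image: gcr.io/knative-samples/helloworld-go'
} > service.yaml

# On a Knative-enabled cluster you would then run:
# kubectl apply -f service.yaml
```

&lt;p&gt;From this one resource, Serving derives the revision, route and autoscaler that can scale the container up and down to zero.&lt;/p&gt;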

&lt;h2&gt;
  
  
  Advantages of Knative
&lt;/h2&gt;

&lt;p&gt;The Knative framework and applications have several benefits that help many individuals mitigate some of these challenges. Let’s have a brief look below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fast Deployment&lt;/strong&gt;&lt;br&gt;
Fast iterative development is one of the big advantages that Knative offers: it enables the rapid deployment of applications and cuts down a lot of time during container building, and, as a result, faster rollouts of container versions are possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code Focused&lt;/strong&gt;&lt;br&gt;
Who doesn’t like just focusing on code? Knative applications provide an event-driven architecture, meaning the architecture automatically enables the application to run, deploy and scale, which helps developers focus on writing code and not on the infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Serverless&lt;/strong&gt;&lt;br&gt;
Faster entry into serverless computing is possible with Knative, as it is a serverless framework and helps in the quicker establishment of serverless workflows. In addition, manual configuration is not required, as all the server work is done behind the scenes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Usage of the Kubernetes ecosystem by groups (Source: CNCF)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Disadvantages of Knative
&lt;/h2&gt;

&lt;p&gt;Managing container infrastructure is the biggest, and arguably the only, drawback of Knative. Knative is not aimed at end-users, which is why we have to manually manage the infrastructure of the containers.&lt;/p&gt;

&lt;p&gt;Simply put, customers have to manually manage the container infrastructure because Knative mainly facilitates developers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Value Proposition of Knative
&lt;/h2&gt;

&lt;p&gt;From the discussion above, we can safely say that Knative is highly helpful when it comes to application deployment and creating serverless functionality.&lt;/p&gt;

&lt;p&gt;Knative is fully open-source, which is a big advantage for all the companies and businesses that want to migrate to serverless cloud computing; as we all know, open-source frameworks and platforms are free of cost. This matters a great deal for mid-market businesses that aim to go serverless in cloud computing.&lt;/p&gt;

&lt;p&gt;Companies with high market caps can contribute hugely to the Knative framework as well as benefit from using it: big companies will probably prefer their developers to focus more on coding, which helps product development, and serverless computing automatically lets developers focus more on building code.&lt;/p&gt;

&lt;p&gt;Small-cap businesses get a free pass here to enjoy the advantages of the Knative platform and framework, as it is completely free of cost, and you can actually get the source code from GitHub, which will help a lot of businesses grow with the help of serverless computing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;So, we have seen how Knative is modernizing the cloud and serverless ecosystem and empowering Istio and Kubernetes.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>cloudops</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Kaniko: How Users Can Make The Best Use of Docker</title>
      <dc:creator>Hrittik Roy</dc:creator>
      <pubDate>Wed, 25 May 2022 20:21:51 +0000</pubDate>
      <link>https://community.ops.io/hrittikhere/kaniko-how-users-can-make-the-best-use-of-docker-35fl</link>
      <guid>https://community.ops.io/hrittikhere/kaniko-how-users-can-make-the-best-use-of-docker-35fl</guid>
      <description>&lt;p&gt;Whether you love or hate containers, there are only a handful of ways to work with them properly that ensures proper application use with Docker. While there do exist a handful of solutions on the web and on the cloud to deal with all the needs that come with running Docker, Kaniko has something amazing to offer.&lt;/p&gt;

&lt;p&gt;Kaniko was released as a standalone addition through the Google Cloud interface and has been inducted under the Cloud Native Computing Foundation (CNCF). Kaniko as a tool helps users build container images from their Docker applications through their Dockerfiles. It doesn’t require a standalone Docker daemon to process the build.&lt;/p&gt;

&lt;p&gt;Let’s take a closer look at some of its less-discussed elements, advantages and demerits.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problems People Run Into With Docker
&lt;/h2&gt;

&lt;p&gt;Docker has become an industry standard with a multitude of uses and benefits, but it still has a dependency issue: building from a Dockerfile relies on interactive access to a Docker daemon, which essentially requires root access on your machine to run.&lt;/p&gt;

&lt;p&gt;Users can thus run into problems when building container images in environments that don’t provide support for, or can’t run, the Docker daemon. Kubernetes clusters are a good example of this.&lt;/p&gt;

&lt;p&gt;Kaniko addresses these issues by providing a way to build container images from a Dockerfile even in the absence of any privileged root access. Users can use Kaniko to build an image from a Dockerfile and push it to a registry all in one go. Since it doesn’t require any special privileges or permissions, it can be installed and run on a typical Kubernetes cluster, on Google Kubernetes Engine, or in any environment that can’t grant privileged access.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Docker vs Kaniko (Source: StackShare)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Kaniko is usually run as a container itself, and needs the following information to build a Docker image as per the user’s requirements:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The path to the Dockerfile.&lt;/li&gt;
  &lt;li&gt;The path to the build context (workspace) directory. The build context directory is a repository that contains all the resources required while building an image.&lt;/li&gt;
  &lt;li&gt;The destination/URL of the registry where the image will be pushed after the execution completes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Does Kaniko Work Exactly?
&lt;/h2&gt;

&lt;p&gt;Kaniko runs as a container image that takes in the three arguments described above. The executor image itself contains only a static Go binary plus the configuration files needed for pushing and pulling images.&lt;/p&gt;

&lt;p&gt;The Kaniko executor then extracts the filesystem of the base image (the image in the FROM line of the Dockerfile) to the root. It executes each command in order and takes a snapshot of the filesystem after each command.&lt;/p&gt;

&lt;p&gt;Each snapshot is created in user space by walking the filesystem and comparing it to the prior state stored in memory. Kaniko then appends any changes made to the filesystem as a new layer on top of the original image, and reflects these changes in the image metadata. After executing every command in the Dockerfile, the user is then free to push the newly built image to the output registry.&lt;/p&gt;

&lt;p&gt;Kaniko unpacks the filesystem, executes commands and snapshots the filesystem entirely in user space within the user’s image, which is how it bypasses the need for any privileged access on the machine where it is deployed. The Docker daemon or CLI is not involved in this entire process.&lt;/p&gt;
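&lt;p&gt;To make the three inputs concrete, here is a sketch of running the Kaniko executor as a plain Kubernetes pod; the pod name, context path and registry URL are illustrative placeholders.&lt;/p&gt;

```shell
# Sketch: run the Kaniko executor image as an unprivileged Kubernetes pod.
# The --dockerfile, --context and --destination args are the three inputs
# described above; the pod name and registry URL are placeholders.
{
  echo 'apiVersion: v1'
  echo 'kind: Pod'
  echo 'metadata:'
  echo '  name: kaniko-build'
  echo 'spec:'
  echo '  restartPolicy: Never'
  echo '  containers:'
  echo '    - name: kaniko'
  echo '      image: gcr.io/kaniko-project/executor:latest'
  echo '      args:'
  echo '        - --dockerfile=Dockerfile'
  echo '        - --context=dir:///workspace'
  echo '        - --destination=registry.example.com/myapp:latest'
  echo '      volumeMounts:'
  echo '        - name: workspace'
  echo '          mountPath: /workspace'
  echo '  volumes:'
  echo '    - name: workspace'
  echo '      emptyDir: {}'
} > kaniko-pod.yaml

# kubectl apply -f kaniko-pod.yaml   # needs registry credentials configured
```

&lt;p&gt;Note that nothing in the pod spec asks for privileged mode: the executor does all of its work in userspace.&lt;/p&gt;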

&lt;p&gt;&lt;em&gt;Image container creation steps (Source: Google CloudOps)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Merits and Pitfalls of Kaniko
&lt;/h2&gt;

&lt;p&gt;Kaniko’s serverless, all-in-one approach to building containers seems like a welcome change from the more convoluted mess that users face when using Kubernetes or Docker. The lack of a root-access requirement in clusters, combined with a simpler interface, makes Kaniko an amazing tool to use, especially for software engineers who need fast methods to build and deploy applications through a common Docker setup. Since there’s no dependency on the daemon process, users are free to run Kaniko in any kind of environment.&lt;/p&gt;

&lt;p&gt;Kaniko executes each command within the Dockerfile completely in userspace using an executor image, gcr.io/kaniko-project/executor, which is typically run inside a container. Kaniko then runs the commands in the Dockerfile and creates a copy of the filesystem after each command.&lt;/p&gt;

&lt;p&gt;If there are changes to the filesystem, the executor takes a snapshot of the change and records it as a “diff” layer. These changes are later made permanent in the image metadata. This brings another merit of using Kaniko: a seamless mechanism for building images and running applications.&lt;/p&gt;

&lt;p&gt;Some criticism of Kaniko is aimed at its builds taking up too many resources and too much execution time, depending on the application, which can make it unreliable when not running on a serverless platform with the needed compute. Common Kaniko build runs have been reported to be much slower than their cloud-build counterparts. The general solution may be to simply stick to the cloud version, but that may not always be the best approach if cloud access isn’t available.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Comparison between common tools (Source: SlideShare)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts and Review
&lt;/h2&gt;

&lt;p&gt;If you still haven’t used Kaniko to ease the process of building and deploying container images, it doesn’t hurt to take a quick dive into the community documentation and even run the application. Kaniko has a fairly simple syntax that can be driven with just a few commands on the command line and has no major dependencies in order to run.&lt;/p&gt;

&lt;p&gt;Users shouldn’t be put off by the heavy dependence on containers, seeing how resources can be better utilized when proper images are built on top of them. As an application focused on Kubernetes and Docker, Kaniko has much to offer other common platforms and architectures as well.&lt;/p&gt;

&lt;p&gt;For the novice or the expert, there are several resources available on the community pages and even on Google’s website for Kaniko to get you started. Tune in as we take a much-needed look at other applications in the next articles.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>secops</category>
      <category>cicd</category>
    </item>
    <item>
      <title>DVC (Git For Data): A Complete Intro</title>
      <dc:creator>Hrittik Roy</dc:creator>
      <pubDate>Wed, 25 May 2022 20:09:23 +0000</pubDate>
      <link>https://community.ops.io/hrittikhere/dvc-git-for-data-a-complete-intro-2ogl</link>
      <guid>https://community.ops.io/hrittikhere/dvc-git-for-data-a-complete-intro-2ogl</guid>
      <description>&lt;p&gt;As a data scientist or ML engineer, have you ever faced the inconvenience of experimenting with the model? When we train the model, the model file is generated. Now, if you want to experiment with some different parameters or data, generally people rename the existing model file and train the model again and this process goes on. &lt;/p&gt;

&lt;p&gt;The final directory looks something like the figure below 😁. The same thing goes for data as well. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/tE8rBmGhSBxPmMs0wHnmqMkXcfUOg8mzqmXfUoet5N8/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvM2pz/NGgyNDd4ZDdubW5t/cm4zeDIucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/tE8rBmGhSBxPmMs0wHnmqMkXcfUOg8mzqmXfUoet5N8/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvM2pz/NGgyNDd4ZDdubW5t/cm4zeDIucG5n" alt="Alt Text" width="400" height="350"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 1: Typical models dir after experimentation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But if we talk about code, we don't have this problem, because we have Git to version the code. I mean that I can create a separate branch to change some code and see its behaviour without altering the previous version. What if we could get a Git-like version control system for data and models? Wouldn't that be amazing? Here comes DVC (Data Version Control), which solves exactly this problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/VBCJ-SqD15ekzzirhJyplJ35E-Yz4ZQk-hDZszgsbw4/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvMXhy/d2J1NXQ3ejB3YWVr/MHZvemYucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/VBCJ-SqD15ekzzirhJyplJ35E-Yz4ZQk-hDZszgsbw4/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvMXhy/d2J1NXQ3ejB3YWVr/MHZvemYucG5n" alt="Alt Text" width="880" height="1234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DVC is much more than just data and model versioning, though. It also helps create end-to-end pipelines, capture metrics related to pipelines and experiment with ML models. But in this blog, we will look at the data versioning aspect of DVC.&lt;/p&gt;
&lt;h2&gt;
  
  
  Installing DVC
&lt;/h2&gt;

&lt;p&gt;DVC can be used as a typical Python library and can be installed using package managers like &lt;strong&gt;pip&lt;/strong&gt; or &lt;strong&gt;conda&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To install DVC using pip, you can execute the command below in a terminal where Python is available.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dvc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To install DVC using conda, you need to execute the commands below in a conda terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;conda &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; conda-forge mamba &lt;span class="c"&gt;# installs much faster than conda&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;mamba &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; conda-forge dvc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apart from these conventional installation techniques, DVC can be installed in other ways too, as described &lt;a href="https://dvc.org/doc/install"&gt;here&lt;/a&gt; on the official documentation website.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Code versioning is not really a new thing for us because we use Git in our daily life, but the problem with Git is that it becomes extremely slow when the tracked files are as huge as 100 GB. That is where DVC comes in. Git tracks the changes in code, whereas DVC tracks the changes in data as well as models. &lt;/p&gt;

&lt;p&gt;The foundation of DVC consists of a few commands you can run along with Git to track large files, directories, or ML model files. In short, you can call DVC &lt;strong&gt;"Git for data"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let us understand data versioning by DVC using a simple demo project. Assume that we want to create an end-to-end pipeline to prepare, train and evaluate on the MNIST dataset, and we have the code as well as the data available and structured as shown in the figure below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/z4zfKWeRXKKRHigpo8UXdca8LPP7PgjSpNS6lWTC1Hc/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvcnpq/dDhnZXg0aGpqNDg3/M3YydWkucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/z4zfKWeRXKKRHigpo8UXdca8LPP7PgjSpNS6lWTC1Hc/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvcnpq/dDhnZXg0aGpqNDg3/M3YydWkucG5n" alt="MNIST pipeline file structure" width="492" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Fig 2 MNIST pipeline file structure.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;By observing the file structure, we can see the MNIST dataset is stored in the &lt;strong&gt;data&lt;/strong&gt; directory as a &lt;strong&gt;data.xml&lt;/strong&gt; file and the code corresponding to every stage of the pipeline is stored in the &lt;strong&gt;src&lt;/strong&gt; directory. Now as we understand, the code tracking (src) is the responsibility of Git and the data tracking (data) is the responsibility of DVC.&lt;/p&gt;

&lt;p&gt;Before we version the data, we need to initialise the repository as a DVC repository. Just as we run &lt;code&gt;git init&lt;/code&gt; to initialise the current directory as a Git repository, we need to execute the command below, which will initialise the current directory as a DVC repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;dvc init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command will create a new directory &lt;strong&gt;.dvc&lt;/strong&gt;, just like &lt;code&gt;git init&lt;/code&gt; creates the &lt;strong&gt;.git&lt;/strong&gt; directory. Two files, &lt;strong&gt;.gitignore&lt;/strong&gt; and &lt;strong&gt;config&lt;/strong&gt;, will be generated in the &lt;strong&gt;.dvc&lt;/strong&gt; directory. Now, we have successfully initialized this as a DVC repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Tracking
&lt;/h2&gt;

&lt;p&gt;In Git, if we want it to track changes in the code, we run the &lt;code&gt;git add&lt;/code&gt; command, which stages the code changes in the local repository. DVC has a similar command, shown below, to start tracking the data files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;dvc add data/data.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note here that a &lt;code&gt;dvc add&lt;/code&gt; command accepts arguments which are data or model file names that you want DVC to track. All the filenames must be space-separated.&lt;/p&gt;

&lt;p&gt;By executing this command, DVC will create two files &lt;strong&gt;data/data.xml.dvc&lt;/strong&gt; and &lt;strong&gt;data/.gitignore&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;File &lt;strong&gt;data/.gitignore&lt;/strong&gt; looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/data.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This forces Git not to track the original &lt;strong&gt;data.xml&lt;/strong&gt; file, which is expected because tracking &lt;strong&gt;data.xml&lt;/strong&gt; is now the responsibility of DVC and not Git.&lt;/p&gt;

&lt;p&gt;All files ending with the extension .dvc are special files which contain information about where the actual data files are stored and cached. If you take a look at data/data.xml.dvc, it essentially stores an MD5 signature, as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;md5: a7cd139231cc35ed63541ce3829b96db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looking at the complete picture now: Git will not track &lt;strong&gt;data/data.xml&lt;/strong&gt; because it is listed in &lt;strong&gt;data/.gitignore&lt;/strong&gt;, but Git will track &lt;strong&gt;data/data.xml.dvc&lt;/strong&gt;, which holds the information about where the actual &lt;strong&gt;data.xml&lt;/strong&gt; is stored.&lt;/p&gt;
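&lt;p&gt;This split can be seen with plain Git commands. The snippet below simulates the state after &lt;code&gt;dvc add&lt;/code&gt; in a throwaway repository (the metadata file contents are hand-written stand-ins for what DVC generates) and shows that Git stages only the metadata:&lt;/p&gt;

```shell
# Simulate the post-`dvc add` layout in a throwaway repo (stand-in files;
# in a real project DVC generates .gitignore and data.xml.dvc itself).
mkdir -p demo/data
git init -q demo
printf '/data.xml\n' > demo/data/.gitignore            # keeps the raw data out of Git
printf 'md5: a7cd139231cc35ed63541ce3829b96db\n' > demo/data/data.xml.dvc
head -c 1024 /dev/zero > demo/data/data.xml            # stands in for the large data file
git -C demo add data/
git -C demo status --porcelain                         # lists .gitignore and data.xml.dvc only
```

&lt;p&gt;The large file is invisible to Git, while the tiny pointer file travels with the repository history.&lt;/p&gt;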

&lt;p&gt;The two Git-tracked files mentioned above will be stored in a Git remote repository on GitHub, Bitbucket or GitLab, while the DVC-tracked data file will be stored in a DVC remote repository on any of the file systems we are going to describe in the next section.&lt;/p&gt;

&lt;p&gt;Once we do this, the directory structure for the MNIST pipeline project looks something like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/Ou-iEaEIR2E6o-y_djo3b1aAYpThVTR_tPpiMaEPyt8/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvdHg1/YWFwa3ppdzNnbGlv/dTBkczAucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/Ou-iEaEIR2E6o-y_djo3b1aAYpThVTR_tPpiMaEPyt8/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvdHg1/YWFwa3ppdzNnbGlv/dTBkczAucG5n" alt="File structure after dvc add" width="520" height="796"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Fig 3 File structure after dvc add&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Storing &amp;amp; Sharing
&lt;/h2&gt;

&lt;p&gt;The actual data files (which are not tracked by Git) will be stored in some file system, which can also be called the DVC repository. The file system for a DVC repository can be an AWS S3 bucket, Google Drive, a Google Cloud Storage bucket, Azure Storage, Object Storage Service or any custom file system. Depending on where you will store the data, you will need to install external dependencies like &lt;code&gt;dvc-s3&lt;/code&gt;, &lt;code&gt;dvc-azure&lt;/code&gt;, &lt;code&gt;dvc-gdrive&lt;/code&gt;, &lt;code&gt;dvc-gs&lt;/code&gt;, &lt;code&gt;dvc-oss&lt;/code&gt; or &lt;code&gt;dvc-ssh&lt;/code&gt;. You can learn more about installation &lt;a href="https://dvc.org/doc/install"&gt;here&lt;/a&gt; on the documentation website.&lt;/p&gt;

&lt;p&gt;To push the data files to the DVC repository, we first need to configure the remote origin where the data will be stored. We can do that using the commands below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;dvc remote add &lt;span class="nt"&gt;-d&lt;/span&gt; storage s3://mybucket/dvcstore
&lt;span class="nv"&gt;$ &lt;/span&gt;git add .dvc/config
&lt;span class="nv"&gt;$ &lt;/span&gt;git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"Configure remote storage"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;dvc remote add&lt;/code&gt; command adds the remote origin of the DVC repository where the data will be stored. Executing this command adds the origin information to the &lt;strong&gt;.dvc/config&lt;/strong&gt; file, which Git tracks. That is why we also need to commit the changes made to the &lt;strong&gt;.dvc/config&lt;/strong&gt; file.&lt;/p&gt;

&lt;p&gt;Now, adding the remote origin doesn't automatically push the data to the DVC remote repository. We can push the data files using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;dvc push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It pushes all the data and model files on which the &lt;code&gt;dvc add&lt;/code&gt; command has been applied.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/BBKYs_RKXgwag1t6IfInMEhQEIcDVWoTmItGQ9G574M/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvOGQ3/cHZld3ByMzBqandr/ZTM1NW0ucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/BBKYs_RKXgwag1t6IfInMEhQEIcDVWoTmItGQ9G574M/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvOGQ3/cHZld3ByMzBqandr/ZTM1NW0ucG5n" alt="DVC" width="875" height="624"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrieving
&lt;/h2&gt;

&lt;p&gt;There is a reason &lt;code&gt;dvc remote add&lt;/code&gt; stores the remote origin information in &lt;strong&gt;.dvc/config&lt;/strong&gt;: the next time somebody clones this Git repository, it will not have any data. It will only have the MD5 signature of the data in the corresponding &lt;strong&gt;.dvc&lt;/strong&gt; file, along with the DVC remote repository location stored in the &lt;strong&gt;.dvc/config&lt;/strong&gt; file. So, the complete data can be pulled using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;dvc pull
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It pulls all the data and model files stored in the DVC remote repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  ML Pipeline &amp;amp; Versioning
&lt;/h2&gt;

&lt;p&gt;While working with DVC, we have to create two files: &lt;code&gt;dvc.yaml&lt;/code&gt;, which contains information about the stages of the ML pipeline, and &lt;code&gt;params.yaml&lt;/code&gt;, which contains the parameters that the different stages of the pipeline use. You can take a look at the structure of both files in the &lt;a href="https://github.com/iterative/get-started-experiments"&gt;example repository&lt;/a&gt;.  &lt;/p&gt;
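&lt;p&gt;As a rough sketch of their shape (the stage names, commands and outputs below are illustrative, not copied from the example repository), the two files could look like this:&lt;/p&gt;

```shell
# Hypothetical params.yaml and dvc.yaml (stage names, commands and outputs
# are illustrative). DVC reads this stage graph to decide what to re-run.
{
  echo 'prepare:'
  echo '  test_split: 0.2'
  echo 'train:'
  echo '  learning_rate: 0.0001'
} > params.yaml

{
  echo 'stages:'
  echo '  prepare:'
  echo '    cmd: python src/prepare.py'
  echo '    deps:'
  echo '      - data/data.xml'
  echo '    params:'
  echo '      - prepare.test_split'
  echo '    outs:'
  echo '      - data/prepared'
  echo '  train:'
  echo '    cmd: python src/train.py'
  echo '    deps:'
  echo '      - data/prepared'
  echo '    params:'
  echo '      - train.learning_rate'
  echo '    outs:'
  echo '      - model.pkl'
} > dvc.yaml
```

&lt;p&gt;Each stage declares its command, dependencies, parameters and outputs, and the parameter names group under the stage names used in &lt;code&gt;params.yaml&lt;/code&gt;.&lt;/p&gt;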

&lt;p&gt;DVC is so flexible that you can use the command &lt;code&gt;dvc exp run&lt;/code&gt; to run the complete pipeline with a single command. And the command is not limited to this.&lt;/p&gt;

&lt;p&gt;If you want to run your experiments with different parameters, you can mention the values of the parameters which are defined in &lt;code&gt;params.yaml&lt;/code&gt; as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;dvc exp run &lt;span class="nt"&gt;-S&lt;/span&gt; prepare.test_split&lt;span class="o"&gt;=&lt;/span&gt;0.2 &lt;span class="nt"&gt;-S&lt;/span&gt; train.learning_rate&lt;span class="o"&gt;=&lt;/span&gt;0.0001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note here that whenever you run experiments using the above command, DVC caches the outputs so well that it will only re-run those stages which are affected by the changed parameters. Moreover, if one of the stages in the pipeline is too time-consuming and you want to checkpoint it so that, in case the program crashes, it can resume from where it crashed, you can define &lt;code&gt;checkpoint: true&lt;/code&gt; in the &lt;strong&gt;output&lt;/strong&gt; of that stage in the &lt;code&gt;dvc.yaml&lt;/code&gt; file and it will keep checkpointing your results for you.&lt;/p&gt;

&lt;p&gt;DVC keeps track of all the experiments that you perform, and you can even see the history of previously run experiments using &lt;code&gt;dvc exp show&lt;/code&gt;. You can filter this history as well: for example, &lt;code&gt;dvc exp show --include-params=train&lt;/code&gt; will only show you the past experiments where any of the parameters of the &lt;code&gt;train&lt;/code&gt; stage was changed. Not just that, you can also see the difference between two experiments' parameters and their resulting metrics using &lt;code&gt;dvc exp diff exp-1dad0 exp-1df77&lt;/code&gt;, where &lt;strong&gt;exp-1dad0&lt;/strong&gt; and &lt;strong&gt;exp-1df77&lt;/strong&gt; are the tags of two experiments as shown in the history given by &lt;code&gt;dvc exp show&lt;/code&gt;. Moreover, you can return to any previously run experiment using &lt;code&gt;dvc exp apply&lt;/code&gt;: for example, if you run &lt;code&gt;dvc exp apply exp-1dad0&lt;/code&gt;, the files will be changed as per the parameter changes specified in that particular experiment.&lt;/p&gt;

&lt;p&gt;DVC gives us the flexibility to commit an experiment to a separate Git branch. As shown in the command below, using the &lt;code&gt;dvc exp branch&lt;/code&gt; command you can commit the experiment &lt;code&gt;exp-1dad0&lt;/code&gt; to a separate branch &lt;code&gt;my_branch&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;dvc exp branch exp-1dad0 my_branch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, you can commit different experiments to different branches simply by switching branches, as described below. &lt;/p&gt;

&lt;p&gt;We already know how to switch between different versions of code by changing Git branches. Similarly, we can change the version of the data as well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git checkout experiment_v2
&lt;span class="nv"&gt;$ &lt;/span&gt;dvc checkout data/data.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see that changing the version of data is now as simple as changing the Git branch. Note that to switch to another version of data, we first need to switch to the Git branch where that version of the data resides. Also note that you must apply &lt;code&gt;dvc checkout&lt;/code&gt; to the files whose data version you want to switch, since &lt;code&gt;git checkout&lt;/code&gt; only changes the code version.&lt;/p&gt;

&lt;h2&gt;
  
  
  How data versioning simplifies experimentation
&lt;/h2&gt;

&lt;p&gt;As described at the beginning of this blog, we no longer need to keep multiple copies of model files and keep renaming them. Instead, we can have a single model file that is versioned with DVC during experimentation. The image below illustrates this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/mc4MVLqgGbklm5i4V21HOSYdXL2BPjUR6uvwtEZYDqo/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvc3Zj/czY1M3RwMzg2OXVk/NmF0OTkucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/mc4MVLqgGbklm5i4V21HOSYdXL2BPjUR6uvwtEZYDqo/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvc3Zj/czY1M3RwMzg2OXVk/NmF0OTkucG5n" alt="Model versioning for experimentation" width="428" height="376"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 4 Model versioning for experimentation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let us suppose we are working in Git branch &lt;strong&gt;v1&lt;/strong&gt;, where the hyper-parameter &lt;strong&gt;n_estimators&lt;/strong&gt; used to train the model has the value &lt;strong&gt;50&lt;/strong&gt;, and we have trained a model stored as &lt;strong&gt;model.pkl&lt;/strong&gt;. Now we want to experiment with the value &lt;strong&gt;100&lt;/strong&gt; for the same hyper-parameter. We no longer need to rename the old model file. We switch to another Git branch &lt;strong&gt;v2&lt;/strong&gt; using &lt;code&gt;git checkout v2&lt;/code&gt; and switch to the corresponding model file version using &lt;code&gt;dvc checkout model.pkl&lt;/code&gt;. We then change the value of &lt;strong&gt;n_estimators&lt;/strong&gt; to &lt;strong&gt;100&lt;/strong&gt; in the code in the &lt;strong&gt;v2&lt;/strong&gt; branch and train the model again. The new &lt;strong&gt;model.pkl&lt;/strong&gt; file generated in the &lt;strong&gt;v2&lt;/strong&gt; branch corresponds to &lt;code&gt;n_estimators = 100&lt;/code&gt;, while the &lt;strong&gt;model.pkl&lt;/strong&gt; file in the &lt;strong&gt;v1&lt;/strong&gt; branch corresponds to &lt;code&gt;n_estimators = 50&lt;/code&gt;.&lt;/p&gt;
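&lt;p&gt;The workflow described above boils down to a few commands (this sketch assumes the training pipeline is defined in &lt;code&gt;dvc.yaml&lt;/code&gt; so that &lt;code&gt;dvc repro&lt;/code&gt; re-trains the model):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ git checkout v2           # switch the code version
$ dvc checkout model.pkl    # switch the matching model version
# edit n_estimators to 100 in the code, then re-train:
$ dvc repro
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;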

&lt;p&gt;Although this is a very small experiment, the same approach applies to complex experimentation processes. With data versioning, experimentation becomes very flexible: we no longer need to worry about tracking data and model files ourselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Large datasets versioning
&lt;/h2&gt;

&lt;p&gt;When the dataset is too large, we need a mechanism that is efficient in terms of both space and performance to share different versions of the data. That is why it is suggested to store the data in a shared volume or on an external system. DVC supports both of these mechanisms, as described below.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;a href="https://dvc.org/doc/use-cases/fast-data-caching-hub#example-shared-development-server"&gt;shared cache&lt;/a&gt; can be set up to store, version, and access a lot of data on a large shared volume efficiently.&lt;/li&gt;
&lt;li&gt;As described earlier, the more advanced approach is to store and version the data directly in remote storage (e.g. an S3 bucket, Google Drive, a GCP storage bucket, etc.). See &lt;a href="https://dvc.org/doc/user-guide/managing-external-data"&gt;here&lt;/a&gt; to learn more about how to configure external data storage for DVC.&lt;/li&gt;
&lt;/ul&gt;
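&lt;p&gt;As a minimal sketch, configuring a remote and pushing the cached data there looks like this (the remote name and bucket path below are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ dvc remote add -d myremote s3://mybucket/dvcstore   # -d makes it the default remote
$ dvc push                                            # upload cached data to the remote
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;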

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;With this, I hope you got a quick overview of how DVC does data versioning. DVC also has features to streamline ML reproducibility and supports cloud provisioning through cloud providers. This comes in handy for the entire MLOps community, since teams can now collaborate without being locked into monolithic tools.&lt;/p&gt;

&lt;p&gt;If you'd like to know more and get your hands dirty, &lt;a href="https://dvc.org/doc/start"&gt;the documentation&lt;/a&gt; is a great place to start.&lt;/p&gt;

&lt;p&gt;If you prefer videos, they have a &lt;a href="https://www.youtube.com/channel/UC37rp97Go-xIX3aNFVHhXfQ"&gt;YouTube channel with detailed tutorials&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Happy Versioning!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mlops</category>
      <category>dataops</category>
      <category>devops</category>
    </item>
    <item>
      <title>Helm: Package Manager for k8s</title>
      <dc:creator>Hrittik Roy</dc:creator>
      <pubDate>Wed, 25 May 2022 20:04:41 +0000</pubDate>
      <link>https://community.ops.io/hrittikhere/helm-package-manager-for-k8s-md9</link>
      <guid>https://community.ops.io/hrittikhere/helm-package-manager-for-k8s-md9</guid>
<description>&lt;p&gt;Kubernetes was started inside Google to provide a layer of abstraction over containers for modern infrastructure. The technology has since been adopted by the masses and has become the de facto standard for cloud native applications. The open-source system provides management, deployment, and scaling of your containers.&lt;/p&gt;

&lt;p&gt;Kubernetes is hard to beat at orchestration, but one of its most significant drawbacks is its lack of reproducibility. Enter Helm: a package manager for Kubernetes and a CNCF graduated project.&lt;/p&gt;

&lt;p&gt;I was thrilled to speak at &lt;a href="https://www.meetup.com/Data-on-Kubernetes-community/events/283335251/"&gt;DoK Talks on the 114 Edition&lt;/a&gt; about Helm and how it tackles the reproducibility problem for Kubernetes. This post is a summary of the beginner-focused event, which took place on 27th January 2022. &lt;/p&gt;

&lt;p&gt;You don't need to go through the recordings: this is an extensive summary, and reading it will give you a basic understanding of Helm!&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What’s the reproducibility problem?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;After you have deployed your application with numerous objects: Deployments, Services, ConfigMaps, etc., how do you help your friend get to a similar state? Of course, you will share your YAML files with your friend. Correct?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yes and No!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can do the hard work of copying and pasting for a small application, but what if your application is a full-stack web app with hundreds of configuration files? Can you still copy them? No, you can't, as the process is error-prone at that scale.&lt;/p&gt;

&lt;p&gt;Even if you send the object manifests to your friend, one question remains: how would your friend adapt the configuration from one site to another, or change how much resource each application consumes? Will they go through the 100 manifests? No; that's not scalable, it's prone to errors, and it wastes a lot of time.&lt;/p&gt;

&lt;p&gt;Now suppose there's a security flaw in one of your dependencies. How will you update them? Hunt down your YAML, or edit your live resources? Absolutely not!&lt;/p&gt;

&lt;p&gt;You need a saviour. You need Helm to come to your rescue.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What is Helm?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://helm.sh/"&gt;Helm&lt;/a&gt; is your saviour for the reproducibility problem, a package manager, and a CNCF graduated project. It was launched in 2016 and has seen massive adoption among organizations and individuals since then. Under the &lt;a href="https://www.cncf.io/"&gt;CNCF&lt;/a&gt; umbrella, Helm has become the de facto package manager for your clusters.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What does Helm do?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Helm helps you to achieve reproducibility in the following ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides an easy way to deploy complex applications&lt;/li&gt;
&lt;li&gt;Provides an easy way to update specific values for your deployments&lt;/li&gt;
&lt;li&gt;Provides a way to version a particular package&lt;/li&gt;
&lt;li&gt;Provides a way to share your templates across your organisation or the Internet&lt;/li&gt;
&lt;li&gt;Provides an easy way to manage dependencies&lt;/li&gt;
&lt;li&gt;Provides an easy way to roll back changes&lt;/li&gt;
&lt;/ul&gt;
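&lt;p&gt;To make the list above concrete, a typical session covering deployment, updating a value, and rolling back might look like this (the release name and chart are just examples):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ helm repo add bitnami https://charts.bitnami.com/bitnami    # add a chart repository
$ helm install my-release bitnami/nginx                       # deploy a packaged application
$ helm upgrade my-release bitnami/nginx --set replicaCount=3  # update a specific value
$ helm rollback my-release 1                                  # roll back to revision 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;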

&lt;h2&gt;
  
  
  &lt;strong&gt;Architecture of Helm&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The Helm repository contains all the charts (packages) created by you or other people that you can use to reach the desired state. The &lt;a href="https://helm.sh/docs/intro/install/"&gt;Helm CLI&lt;/a&gt; pulls the package, unarchives it, and then renders the charts into valid YAML, which is pushed to the &lt;a href="https://kubernetes.io/docs/concepts/overview/kubernetes-api/"&gt;Kubernetes API&lt;/a&gt; server, which creates a release.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/dywvBvVHsybHEEm_ebTkS84E6lWuc-FIfIE_BEXGD-0/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvZGNk/OG41YXJ5dHVta28y/emI5djcucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/dywvBvVHsybHEEm_ebTkS84E6lWuc-FIfIE_BEXGD-0/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvZGNk/OG41YXJ5dHVta28y/emI5djcucG5n" alt="Helm Architecture" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Basic Components of Helm Charts&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A basic Helm Chart has the following structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package-name/

charts/

templates/

Chart.yaml

values.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;charts/&lt;/code&gt;: This directory can be used to store manually maintained chart dependencies.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;templates/&lt;/code&gt;: This directory contains the template files that are combined with &lt;code&gt;values.yaml&lt;/code&gt; to create the final manifests.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Chart.yaml&lt;/code&gt;: This file contains information about the chart, such as the name and version of the chart, the maintainer, dependencies, a related website, and search terms.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;values.yaml&lt;/code&gt;: This contains the default configuration for your chart. You can edit it to update values, removing the complexity of finding specific editable items across the different manifests.&lt;/p&gt;
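&lt;p&gt;For illustration, a minimal &lt;code&gt;Chart.yaml&lt;/code&gt; and &lt;code&gt;values.yaml&lt;/code&gt; could look like this (all names and values here are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Chart.yaml
apiVersion: v2
name: package-name
description: An example chart
version: 0.1.0

# values.yaml
replicaCount: 2
image:
  repository: nginx
  tag: "1.21"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;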

&lt;p&gt;The example below shows a &lt;code&gt;deployment.yaml&lt;/code&gt; from &lt;code&gt;templates&lt;/code&gt; being rendered with the custom values from &lt;code&gt;values.yaml&lt;/code&gt; to produce the valid YAML.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/arVQJBUPLP0V1ixItDgGzCIMTMvo-nOWxSo7rkGqD1s/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvY3A0/OXZha2tyMWZmbnNt/djkwODEucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/arVQJBUPLP0V1ixItDgGzCIMTMvo-nOWxSo7rkGqD1s/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvY3A0/OXZha2tyMWZmbnNt/djkwODEucG5n" alt="Final Manifest Creation" width="880" height="715"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Helm template rendering with values.yaml&lt;/em&gt;&lt;/p&gt;
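&lt;p&gt;As a small sketch of this rendering, a template references values with Go template syntax, and Helm substitutes them from &lt;code&gt;values.yaml&lt;/code&gt; (the field name below is hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# templates/deployment.yaml (fragment)
spec:
  replicas: {{ .Values.replicaCount }}

# with replicaCount: 2 in values.yaml, this renders as:
spec:
  replicas: 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;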

&lt;h2&gt;
  
  
  How to edit the default values?
&lt;/h2&gt;

&lt;p&gt;Manually pulling a chart and unzipping it to edit &lt;code&gt;values.yaml&lt;/code&gt; is not that straightforward. This is where &lt;a href="https://www.portainer.io/solutions/kubernetes-ui"&gt;Portainer&lt;/a&gt; comes in: it does all the heavy lifting for you and lets you get straight to editing the default configuration.&lt;/p&gt;

&lt;p&gt;First, navigate to Helm from the menu and then add a repository. In a standard installation of Helm you need to add a repository manually, but with Portainer you get &lt;a href="https://bitnami.com/stacks/helm"&gt;Bitnami&lt;/a&gt; by default and can add more when required. Select a namespace and an application name, then select the chart you want to deploy to your cluster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/j-smvtEsZAmrkXVak7qEFERSDpPR0PSru2omXtEDkdI/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvOWJt/cGg5ZmsyYnlkMjIw/YjN3NnUuZ2lm" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/j-smvtEsZAmrkXVak7qEFERSDpPR0PSru2omXtEDkdI/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvOWJt/cGg5ZmsyYnlkMjIw/YjN3NnUuZ2lm" alt="Portainer Overview" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once selected, you can navigate to a chart, and Portainer will load the values on your dashboard for you to edit. Editing values is straightforward and, via a simple GUI, abstracts away the complexity of manual installation and initialization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/pfpvJGnVbPvSx3kO3ohK8vyuwGEK2t3ZJ-cd4iv77GA/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvdXhu/dGN3aDM1MngwYWxq/d25lb3IucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/pfpvJGnVbPvSx3kO3ohK8vyuwGEK2t3ZJ-cd4iv77GA/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvdXhu/dGN3aDM1MngwYWxq/d25lb3IucG5n" alt="Editing Default Values" width="880" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Install&lt;/code&gt; button installs your chart with the specified values to your cluster. If you want to go with the default values, click &lt;code&gt;Install&lt;/code&gt; without editing them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/B7H22nCZmKme9NcSUqTFb4H4gqcDL7S53H5OpXJ-Zg4/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvMXVx/ZHkzMnB4MmoyNXls/aGhsaXoucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/B7H22nCZmKme9NcSUqTFb4H4gqcDL7S53H5OpXJ-Zg4/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvMXVx/ZHkzMnB4MmoyNXls/aGhsaXoucG5n" alt="Exposed Ports and Secrets" width="880" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After installation, Portainer detects and shows you &lt;code&gt;Published URLs&lt;/code&gt; to access your applications, as well as the secrets holding the default passwords. Forget digging through commands to get to your Services and Secrets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Helm abstracts the complexity of installing applications to your cluster. Portainer abstracts the complexity of managing your cluster. This post went through how Portainer can help you simplify your Kubernetes workflows with Helm.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.portainer.io/pricing/take5"&gt;Try Portainer now&lt;/a&gt; and learn more about the different ways to streamline managing Kubernetes with our &lt;a href="https://docs.portainer.io/v/ce-2.11/user/kubernetes"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Portainer Documentation for Helm here: &lt;a href="https://docs.portainer.io/v/ce-2.11/user/kubernetes/helm"&gt;https://docs.portainer.io/v/ce-2.11/user/kubernetes/helm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recordings are here:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/3zXgLght57s"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>tutorials</category>
      <category>devops</category>
    </item>
    <item>
      <title>The DevOps Roadmap: Containerization, Containers and why do you need them?</title>
      <dc:creator>Hrittik Roy</dc:creator>
      <pubDate>Wed, 25 May 2022 20:03:40 +0000</pubDate>
      <link>https://community.ops.io/hrittikhere/the-devops-roadmap-containerization-containers-and-why-do-you-need-them-2177</link>
      <guid>https://community.ops.io/hrittikhere/the-devops-roadmap-containerization-containers-and-why-do-you-need-them-2177</guid>
<description>&lt;p&gt;Containers are a popular technology in the industry and help developers build and deploy apps much faster. While virtualization lets you run various operating systems on your hardware, containerization lets you run multiple instances or deploy multiple applications using the same operating system on a single virtual machine or server.&lt;/p&gt;

&lt;p&gt;This capacity to run multiple applications on the same resources translates into a more efficient development life cycle. In this post, we will cover everything you need to know about these unfamiliar terms and how exactly containerization makes our lives easier.&lt;/p&gt;

&lt;p&gt;Let’s dive in!&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Containerization?
&lt;/h2&gt;

&lt;p&gt;Containerization is simply packaging the required environment, libraries, frameworks, directories, and application code together to create a container. Citrix defines it as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Containerization is defined as a form of operating system virtualization, through which applications are run in isolated user spaces called containers, all using the same shared operating system (OS). A container is essentially a fully packaged and portable computing environment that already has all the necessary packages to run the application inside.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/2PXDekS-A5LAbIWksS2WZDnsONMfAt-6PsrW5z3XOdk/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvb24w/bW1jbzdkZ2w5cXd3/aGI0ejAucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/2PXDekS-A5LAbIWksS2WZDnsONMfAt-6PsrW5z3XOdk/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvb24w/bW1jbzdkZ2w5cXd3/aGI0ejAucG5n" alt="Image description" width="880" height="582"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Simply put, the process of creating containers is called containerization.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Container?
&lt;/h2&gt;

&lt;p&gt;The word container has its origin in the word contain, and it does exactly that: it contains everything you need to run an application.&lt;/p&gt;

&lt;p&gt;A container packages code and all its dependencies so the application runs quickly, portably, and reliably from one computing environment to another.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does a Container work?
&lt;/h2&gt;

&lt;p&gt;Containers run via a containerization engine on top of a single host operating system, sharing the operating system kernel with other containers. This sharing is achieved by limiting parts of the OS to read-only mode, and it makes containers extremely light on resources, as you don’t need to configure a new operating system for each new container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/jOnrd2RJwIwG7WL8sRIR3LFJ8jamhN9RRE4y-XjcGu0/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMveWVw/Ym0zY3djYTZqZXQx/ajF6Z2kucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/jOnrd2RJwIwG7WL8sRIR3LFJ8jamhN9RRE4y-XjcGu0/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMveWVw/Ym0zY3djYTZqZXQx/ajF6Z2kucG5n" alt="Image description" width="880" height="703"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Docker, a containerization engine, running multiple containers. Image credits: Docker&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As you can see, containers differ a lot from &lt;a href="https://web.archive.org/web/20211206070613/https://www.p3r.one/the-devops-roadmap-virtualization/"&gt;virtual machines&lt;/a&gt; (where you need a separate OS per instance), and if you want to know more, we have a blog discussing the core differences in detail.&lt;/p&gt;

&lt;p&gt;In that blog we also discuss why VMs are more secure than containers, along with other drawbacks.🤯&lt;/p&gt;

&lt;h2&gt;
  
  
  Why do you need Containers?
&lt;/h2&gt;

&lt;p&gt;Development is a complicated task, and as a developer you need to tackle issues and keep many things in mind while you develop. Containers help you keep your focus on the code and worry less about other concerns, like the environment, during development.&lt;/p&gt;

&lt;p&gt;A few reasons you should use a container to make your dev life easy are:&lt;/p&gt;

&lt;h3&gt;
  
  
  Consistent Environment
&lt;/h3&gt;

&lt;p&gt;Development is a critical task, and you must take into account many factors, ranging from libraries and frameworks to network configuration and directory management, while deploying an application. Think about how differently Linux and Windows manage their directories.&lt;/p&gt;

&lt;p&gt;Problems arise when the supporting software environment is not identical, says Docker creator Solomon Hykes. “You’re going to test using Python 2.7, and then it’s going to run on Python 3 in production and something weird will happen. Or you’ll rely on the behavior of a certain version of an SSL library and another one will be installed. You’ll run your tests on Debian and production is on Red Hat and all sorts of weird things happen.”&lt;/p&gt;

&lt;p&gt;Containers give developers the ability to build predictable environments, isolated from other applications. The application’s software dependencies, such as particular versions of programming language runtimes and other software libraries, can also be packaged in the container.&lt;/p&gt;

&lt;p&gt;This all works in the developer’s favor: the focus stays on quality code and not on the bugs that might creep in while migrating code from their system to the server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Less overhead
&lt;/h3&gt;

&lt;p&gt;A container can be just tens of megabytes in size, while a virtual machine with its entire operating system may be several gigabytes. Because of this, a single server can host far more containers than virtual machines, and each individual container consumes fewer resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Just In Time
&lt;/h3&gt;

&lt;p&gt;It can take several minutes for virtual machines to boot their operating systems and begin running their applications, while containerized applications can start almost instantly. This means containers can be instantiated “just in time” when they are needed and can disappear when they are no longer required, freeing up resources on their hosts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Modularity
&lt;/h3&gt;

&lt;p&gt;Containers can split applications into modules (such as the database, the application front end, and so on) instead of running an entire complex application within a single container.&lt;/p&gt;

&lt;p&gt;This is the so-called microservices approach. Applications designed this way are simpler to manage because each module is relatively simple, and changes can be made to a module without rebuilding the whole application. Since containers are so lightweight, individual modules (or microservices) can be instantiated only when required and are available almost immediately.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run Anywhere
&lt;/h3&gt;

&lt;p&gt;Containers can run almost anywhere, making development and deployment much easier: on Linux, Windows, and Mac operating systems; on virtual or bare metal machines; on the developer’s machine or on-site data centers; and, of course, in the public cloud.&lt;/p&gt;

&lt;p&gt;Container images like Docker’s help enhance portability, as they are widespread and well supported.&lt;/p&gt;

&lt;p&gt;You can make use of containers anywhere you want to run your apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Container Image?
&lt;/h2&gt;

&lt;p&gt;To rebuild a container, you need a file or template that contains instructions for what the replica must include.&lt;/p&gt;

&lt;p&gt;Container images are exactly such templates. An image consists of immutable static files that can be shared, and building from a shared image yields an identical container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/BUvHC6LWLGuo0SxGHWKx-mtcI4EX3Cpu6eZm-WIvpso/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvaTZ6/YjlueWF4c3h5Njlv/Zzl5amEucG5n" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/BUvHC6LWLGuo0SxGHWKx-mtcI4EX3Cpu6eZm-WIvpso/w:880/mb:500000/ar:1/aHR0cHM6Ly9kZXYt/dG8tdXBsb2Fkcy5z/My5hbWF6b25hd3Mu/Y29tL3VwbG9hZHMv/YXJ0aWNsZXMvaTZ6/YjlueWF4c3h5Njlv/Zzl5amEucG5n" alt="Image description" width="880" height="436"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A Docker image is an example of a container image used to build containers.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An image comes in handy when you need to recreate a consistent working environment in another container without dealing with long, boring manual configuration.&lt;/p&gt;
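&lt;p&gt;A Dockerfile is one familiar form of such a template: a short sketch of one might look like this (the base image and file names below are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Base environment
FROM python:3.10-slim
WORKDIR /app

# Install dependencies first so they are cached as a layer
COPY requirements.txt .
RUN pip install -r requirements.txt

# Add the application code
COPY . .

# Command run when the container starts
CMD ["python", "app.py"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;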

&lt;h2&gt;
  
  
  What are the popular Container Engines?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Docker
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.docker.com/"&gt;Docker’s&lt;/a&gt; open-source containerization engine is the first and still the most popular container technology among various competitors. Docker works with most commercial/enterprise products, as well as many open-source tools.&lt;/p&gt;

&lt;p&gt;You can read more about docker and docker images in this curated beginner-friendly post.&lt;/p&gt;

&lt;h3&gt;
  
  
  CRI-O
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://cri-o.io/"&gt;CRI-O&lt;/a&gt;, a lightweight alternative to Docker, allows you to run containers directly from &lt;a href="//kubernetes.io"&gt;Kubernetes&lt;/a&gt;, a container management system, without any unnecessary code or configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kata Containers
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://katacontainers.io/"&gt;Kata Containers&lt;/a&gt; is an open-source container runtime with lightweight virtual machines that feel and function like containers but use hardware virtualization technology as a second layer of protection to provide more robust workload isolation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Microsoft Containers
&lt;/h3&gt;

&lt;p&gt;Positioned as an alternative to Linux containers, &lt;a href="https://docs.microsoft.com/en-us/virtualization/windowscontainers/about/"&gt;Microsoft Containers&lt;/a&gt; support the Windows OS under very specific conditions. Typically, they run in a real virtual machine rather than under a cluster manager like Kubernetes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts ⭐
&lt;/h2&gt;

&lt;p&gt;Containers are becoming essential as development moves towards cloud native. The advantages of using containers, such as flexibility and agility, are unmatched, and adoption is on the rise.&lt;/p&gt;

&lt;p&gt;I hope this blog helped you understand containerization and containers in depth. If you want to experiment with containers, you should also learn about the best practices companies of all sizes follow when adopting them.&lt;/p&gt;

&lt;p&gt;Happy Containerizing!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>cicd</category>
    </item>
  </channel>
</rss>
