Basics of Kubernetes Volumes – Part 1

We continue our “Kubernetes in a Nutshell” journey and this part will cover Kubernetes Volumes! You will learn about:

  • Overview of Volumes and why they are needed
  • How to use a Volume
  • Hands-on example to help explore Volumes practically

The code is available on GitHub

Happy to get your feedback via Twitter or just drop a comment!


You are going to need minikube and kubectl.

Install minikube, which runs a single-node Kubernetes cluster in a virtual machine on your computer. On a Mac, you can simply:

curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-amd64 \
  && chmod +x minikube

sudo mv minikube /usr/local/bin

Install kubectl to interact with your Kubernetes cluster. On a Mac, you can simply:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/darwin/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl


Data stored in Docker containers is ephemeral i.e. it only exists as long as the container is alive. Kubernetes can restart a failed or crashed container (in the same Pod), but you will still end up losing any data which you might have stored in the container filesystem. Kubernetes solves this problem with the help of Volumes. It supports many types of Volumes including external cloud storage (e.g. Azure Disk, Amazon EBS, GCE Persistent Disk etc.), networked file systems such as Ceph, GlusterFS etc. and other options like emptyDir, hostPath, local, downwardAPI, secret, configMap etc.

How are Volumes used?

Using a Volume is relatively straightforward – look at this partial Pod spec as an example

spec:
  containers:
  - name: kvstore
    image: abhirockzz/kvstore:latest
    volumeMounts:
    - mountPath: /data
      name: data-volume
    ports:
    - containerPort: 8080
  volumes:
  - name: data-volume
    emptyDir: {}

Notice the following:

  • spec.volumes – declares the available volume(s), its name (e.g. data-volume) and other volume-specific characteristics – in this case, an emptyDir volume
  • spec.containers.volumeMounts – points to a volume declared in spec.volumes (e.g. data-volume) and specifies exactly where it wants to mount that volume within the container file system (e.g. /data).

A Pod can have more than one Volume declared in spec.volumes. Each of these Volumes is accessible to all containers in the Pod but it’s not mandatory for all the containers to mount or make use of all the volumes. If needed, a container within the Pod can mount more than one volume into different paths in its file system. Also, different containers can possibly mount a single volume at the same time.
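To make that last point concrete, here is a hedged sketch of a Pod where two containers mount the same emptyDir volume – the container and image names are made up for illustration, not taken from the example app:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-pod
spec:
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: writer
    image: writer-docker-image       # hypothetical image
    volumeMounts:
    - mountPath: /var/log/app        # writer's view of the volume
      name: shared-data
  - name: reader
    image: reader-docker-image       # hypothetical image
    volumeMounts:
    - mountPath: /logs               # same volume, mounted at a different path
      name: shared-data
```

Both containers see the same files; only the mount paths differ.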

Another way of categorizing Volumes

I like to divide them as:

  • Ephemeral Volumes – tightly coupled with the Pod lifetime (e.g. emptyDir volume) i.e. they are deleted if the Pod is removed (for any reason).
  • Persistent Volumes – meant for long-term storage, independent of the Pod or Node lifecycle. This could be NFS or cloud-based storage in case of managed Kubernetes offerings such as Azure Kubernetes Service, Google Kubernetes Engine etc.

Let’s look at emptyDir as an example

emptyDir volume in action

An emptyDir volume starts out empty (hence the name!) and is ephemeral in nature i.e. exists only as long as the Pod is alive. Once the Pod is deleted, so is the emptyDir data. It is quite useful in some scenarios/requirements such as a temporary cache, shared storage for multiple containers in a Pod etc.
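For the temporary-cache scenario in particular, Kubernetes also lets you back an emptyDir with RAM instead of the node's disk – a small sketch of just the volumes section (the volume name is made up):

```yaml
volumes:
- name: cache-volume
  emptyDir:
    medium: Memory    # tmpfs-backed; contents count against the container's memory limit
```

Keep in mind this makes the data even more volatile – it disappears on node reboot as well as Pod deletion.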

To run this example, we will use a naive, over-simplified key-value store that exposes REST APIs for

  • adding key value pairs
  • reading the value for a key

Here is the code if you’re interested

Initial deployment

Start minikube if it's not already running

minikube start

Deploy the kvstore application. This will simply create a Deployment with one instance (Pod) of the application along with a NodePort service

kubectl apply -f [URL of the kvstore YAML in the GitHub repo]

To keep things simple, the YAML file is being referenced directly from the GitHub repo, but you can also download the file to your local machine and use it in the same way.

Confirm they have been created

kubectl get deployments kvstore

NAME      READY   UP-TO-DATE   AVAILABLE   AGE
kvstore   1/1     1            1           28s

kubectl get pods -l app=kvstore

NAME                       READY   STATUS    RESTARTS   AGE
kvstore-6c94877886-gzq25   1/1     Running   0          40s

It’s ok if you do not know what a NodePort service is – it will be covered in a subsequent blog post. For the time being, just understand that it is a way to access our app (REST endpoint in this case)
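Since the Service definition is pulled in from the repo, here is a sketch of what a NodePort Service for this app would typically look like – the actual file in the repo may differ in details:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kvstore-service
spec:
  type: NodePort
  selector:
    app: kvstore       # matches the Pod label we query with -l app=kvstore
  ports:
  - port: 8080         # internal port exposed by the app
    targetPort: 8080
```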

Check the value of the random port generated by the NodePort service – you might see a result similar to this (with different IPs and ports)

kubectl get service kvstore-service

NAME              TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kvstore-service   NodePort   [cluster IP]    <none>        8080:32598/TCP   5m

Check the PORT(S) column to find out the random port e.g. it is 32598 in this case (8080 is the internal port within the container exposed by our app – ignore it)
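If you'd rather not eyeball the PORT(S) column, you can pull the node port out with plain shell string manipulation – the value below is just the example from the output above:

```shell
# Example PORT(S) value copied from the kubectl output above
ports="8080:32598/TCP"

# Strip everything up to and including the colon, then the protocol suffix
node_port="${ports#*:}"
node_port="${node_port%/*}"

echo "$node_port"   # → 32598
```

Against a live cluster, you could also get the value directly with kubectl get service kvstore-service -o jsonpath='{.spec.ports[0].nodePort}'.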

Now, you just need the IP of your minikube node using minikube ip

This might return something like 192.168.99.100 if you're using a VirtualBox VM

In the commands that follow replace host with the minikube VM IP and port with the random port value

Create a couple of new key-value pair entries

curl http://[host]:[port]/save -d 'foo=bar'
curl http://[host]:[port]/save -d 'mac=cheese'



Access the value for key foo

curl http://[host]:[port]/read/foo

You should get the value you had saved for foo i.e. bar. The same applies for mac i.e. you'll get cheese as its value. The program saves the key-value data in /data – let's confirm that by peeking directly into the Docker container inside the Pod

kubectl exec <pod name> -- ls /data/

foo
mac

foo, mac are individual files named after the keys. If we dig in further, we should be able to confirm their respective values as well

To confirm value for the key mac

kubectl exec <pod name> -- cat /data/mac

cheese

As expected, you got cheese as the answer since that's what you had stored earlier. If you try to look for a key which you haven't stored yet, you'll get an error

cat: can't open '/data/moo': No such file or directory
command terminated with exit code 1
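What we just observed can be mimicked locally: going by its behaviour, the app appears to keep one file per key, with the value as the file's content. This runs on your machine, not in the cluster, and /tmp/data is just a stand-in for the container's /data directory:

```shell
# Stand-in for the container's /data directory
mkdir -p /tmp/data

# One file per key, value as the file content – mirrors what we saw above
printf 'bar' > /tmp/data/foo
printf 'cheese' > /tmp/data/mac

ls /tmp/data/
cat /tmp/data/mac
```

Reading a key that was never stored fails the same way it did inside the container – cat exits non-zero because the file does not exist.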

Kill the container 😉

Alright, so far so good! Using a Volume ensures that the data will be preserved across container restarts/crashes. Let's 'cheat' a bit and manually kill the Docker container.

kubectl exec [pod name] -- ps

PID   USER     TIME  COMMAND
  1   root     0:00  /kvstore
  31  root     0:00  ps

Notice the process ID for the kvstore application (should be 1)

In a different terminal, set a watch on the Pods

kubectl get pods -l app=kvstore --watch

We kill our app process

kubectl exec [pod name] -- kill 1

You will notice that the Pod will transition through a few phases (like Error etc.) before going back to the Running state (restarted by Kubernetes).

NAME                       READY     STATUS    RESTARTS   AGE
kvstore-6c94877886-gzq25   1/1       Running   0         15m
kvstore-6c94877886-gzq25   0/1       Error     0         15m
kvstore-6c94877886-gzq25   1/1       Running   1         15m

Execute kubectl exec <pod name> -- ls /data to confirm that the data did in fact survive the container restart.

Delete the Pod!

But the data will not survive beyond the Pod’s lifetime. To confirm this, let’s delete the Pod manually

kubectl delete pod -l app=kvstore

You should see a confirmation such as below

pod "kvstore-6c94877886-gzq25" deleted

Kubernetes (the Deployment, to be precise) will create a new Pod. You can confirm the same after a few seconds

kubectl get pods -l app=kvstore

you should see a new Pod in Running state

Get the pod name and peek into the file again

kubectl get pods -l app=kvstore
kubectl exec [pod name] -- ls /data/

As expected, the /data/ directory will be empty!

The need for persistent storage

Simple (ephemeral) Volumes live and die with the Pod – but this is not going to suffice for a majority of applications. In order to be resilient, reliable, available and scalable, Kubernetes applications need to be able to run as multiple instances across Pods and these Pods themselves might be scheduled or placed across different Nodes in your Kubernetes cluster. What we need is a stable, persistent store which outlasts the Pod or even the Node on which the Pod is running.

As mentioned in the beginning of this blog, it’s simple to use a Volume – not just temporary ones like the one we just saw, but even long term persistent stores.

Here is a (contrived) example of how to use Azure Disk as a storage medium for your apps deployed to Azure Kubernetes Service.

apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  volumes:
  - name: logs-volume
    azureDisk:
      kind: Managed
      diskName: myAKSDiskName
      diskURI: myAKSDiskURI
  containers:
  - image: myapp-docker-image
    name: myapp
    volumeMounts:
    - mountPath: /app/logs
      name: logs-volume

So that’s it? Not quite! 😉 There are limitations to this approach. This and much more will be discussed in the next part of the series – so stay tuned!

I really hope you enjoyed and learned something from this article 😃😃 Please like and follow if you did!


About Abhishek

Loves Go, NoSQL DBs and messaging systems
