In this post I want to share a simple, step-by-step way to resize a volume in Kubernetes using a blue-green approach.

Working with volumes in Kubernetes is tricky. Unlike pods, deployments, or ingresses, volumes behave in non-standard ways depending on the provisioner set in your `StorageClass` – certain features might work with AWS EBS, but not in GCP or Azure (or vice versa).

For a full list of provisioners and their capabilities, check out the Kubernetes documentation. In this post, I focus on Amazon's EBS (Elastic Block Store) in a standard EKS (Elastic Kubernetes Service) distribution. In theory, though, this approach should work with any other provisioner.
Kubernetes native volume expansion
Before we dive into the example, let's first address Kubernetes' native volume expansion capabilities. It'd be nice to trust the orchestrator to do the heavy lifting for us. In reality, it's not that simple (yet).

If you're on a newer version of Kubernetes and don't mind using beta functionality, there's support for automatic expansion of a `PersistentVolumeClaim`. If you want to go this route, a couple of notes:
- Your `StorageClass` needs to have `allowVolumeExpansion: true` (you can check this with `kubectl describe storageclass`).
- Your Kubernetes distribution needs the `ExpandInUsePersistentVolumes` feature gate enabled.
- Expanding an in-use volume is possible once per 6 hours.
Here’s a link to Kubernetes documentation describing this in more detail.
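For reference, a `StorageClass` that permits expansion looks roughly like the sketch below – the name and parameters here are illustrative, not taken from any particular cluster:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-expandable
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Delete
# The flag that makes PVCs created from this class resizable:
allowVolumeExpansion: true
```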
It’s ok, but…
I’ve decided not to use the above method for several reasons:
- It requires me to trust a beta Kubernetes functionality. In my case, I wanted to resize a live volume used in a critical production application. Not exactly a place for beta-testing.
- The recovery path is uncertain, and I'm too lazy to recreate the whole setup just to test it (though that is possible).
- In my case, the volume expansion flag was turned OFF.
I settled on a simple method without too many fireworks, where I can understand what's happening under the hood.

Below I describe how I performed the volume resize step by step, with little risk and with the ability to restore any previous step's state at any point in time.
Step 1: create a 2nd volume
In this example, I'm using the `StorageClass` automatically provisioned by EKS: `gp2`. If you're using `eksctl`, chances are your setup is similar. This StorageClass's provisioner automatically creates volumes to satisfy any new `PersistentVolumeClaim` resource's requirements.

I'm using `helm` to manage the Kubernetes resources for the application deployment. Let's edit the `PersistentVolumeClaim` for my sample app. Here's my `pvc.yml`:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "{{ .Values.volume.pvcName }}"
  labels:
    app: "{{ .Values.appName }}"
    component: "{{ .Values.appName }}"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "{{ .Values.volume.storage }}"
```
Now let's add a second volume definition – `volumeBlue` – to the app's `values.yml` file:
```yaml
volume:
  storage: 50Gi
  pvcName: "media-files"
volumeBlue:
  enabled: false
  storage: 200Gi
  pvcName: "media-files-blue"
```
After adding these values, you can confidently deploy the changes – note that we haven't changed the original volume in `pvc.yml` in any way. Note the `enabled: false` flag in the `volumeBlue` section – we'll set everything up first and then enable it.

Now let's add the second volume to `pvc.yml`:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "{{ .Values.volume.pvcName }}"
  labels:
    app: "{{ .Values.appName }}"
    component: "{{ .Values.appName }}"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "{{ .Values.volume.storage }}"
{{- if .Values.volumeBlue.enabled }}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "{{ .Values.volumeBlue.pvcName }}"
  labels:
    app: "{{ .Values.appName }}"
    component: "{{ .Values.appName }}"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "{{ .Values.volumeBlue.storage }}"
{{- end }}
```
You can deploy the app again – these changes don't rebuild your app's container image and thus should be quick. The new volume sits behind the `enabled` flag, so as of now we're not adding the new resource to Kubernetes yet. Note that with `helm` you can roll back any step in a matter of seconds (`helm history ...`, then `helm rollback ...`).
Finally, let's create the second volume. In `values.yml`:

```yaml
volumeBlue:
  enabled: true
```
Deploy again. Now Kubernetes should schedule the creation of a new EBS volume. To see what's happening, you can use:

```shell
kubectl get pvc -A
kubectl get event -A
```
You might notice the volume is stuck in a `Pending` state. What happened? Kubernetes tries to be smart, and it won't provision the EBS volume until there's a pod binding to that volume. Don't worry about this. You can check the PVC details and events with `kubectl describe pvc/media-files-blue -n <namespace>`.
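This behaviour is controlled by the StorageClass's `volumeBindingMode`. On EKS, the `gp2` class typically uses `WaitForFirstConsumer` – a sketch for illustration (check your own class with `kubectl get storageclass gp2 -o yaml`):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
# With WaitForFirstConsumer, the PVC stays Pending until a pod that
# uses it is scheduled; with Immediate, it is provisioned right away.
volumeBindingMode: WaitForFirstConsumer
```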
Step 2: Bind the volume to a pod
Let's mount our new volume to the app. In `deployment.yml`:
```yaml
volumes:
  - name: media-files
    persistentVolumeClaim:
      claimName: {{ .Values.volume.pvcName }}
{{- if .Values.volumeBlue.enabled }}
  - name: media-files-blue
    persistentVolumeClaim:
      claimName: {{ .Values.volumeBlue.pvcName }}
{{- end }}
```
Deploy now and see what happens. The EBS volume should be created, and you should see the new PVC's state switch to `Bound` after a moment. This means Kubernetes has created the volume in AWS EBS and bound the claim.

But the volume is not used by our application yet. We need to add a `volumeMount` first.

Note: we're still hiding the new volume behind the `enabled` flag. This means that if anything goes wrong, we can run the deployment with `enabled: false` and all our changes will be reversed.
In `deployment.yml`:

```yaml
volumeMounts:
  - name: media-files
    mountPath: /opt/app/media-files
{{- if .Values.volumeBlue.enabled }}
  - name: media-files-blue
    mountPath: /opt/app/media-files-blue
{{- end }}
```
After releasing the above, the app should now have two directories with two separate volumes mounted. You can check this by `exec`'ing into the running container via `kubectl exec -it <pod_name> -c <container_name> -n <namespace> -- bash` (or `sh` if the container doesn't have bash).
Step 3: Sync the files
Alright, we now have two volumes: green (old) and blue (new). To switch, we need to make sure they contain the same data.

To copy the files, `exec` into the running container and run `cp -r /opt/app/media-files/* /opt/app/media-files-blue/`.

In addition, it's useful to create a marker file on each volume to distinguish them more easily later on: add a `__VOLUME_GREEN__` file to `/opt/app/media-files/` and a `__VOLUME_BLUE__` file to `/opt/app/media-files-blue/`.
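The copy-and-marker sequence can be rehearsed locally before touching the container. The directories below are stand-ins for the real mount paths; note that `cp -r src/. dst/` also picks up hidden files, which the `src/*` glob would skip:

```shell
# Stand-ins for /opt/app/media-files (green) and /opt/app/media-files-blue (blue)
green=$(mktemp -d)
blue=$(mktemp -d)

echo "sample media file" > "$green/file1.txt"

# Copy everything from green to blue ("/." also includes dotfiles)
cp -r "$green"/. "$blue"/

# Marker files so we can tell the volumes apart after the switch
touch "$green/__VOLUME_GREEN__"
touch "$blue/__VOLUME_BLUE__"

ls -a "$green" "$blue"
```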
NOTE 1: If the app you're working on is deployed frequently, and/or there's a lot of data to sync, it's possible that a deployment will shut down the container you're running the `cp -r` from. If you can't guarantee a short non-deployment window, you can do it differently: simply add an additional `deployment`/`pod` in the same namespace as the app (or even in the same deployment) and bind to the volumes using the same claim names. (EKS/EBS note: if you're creating a separate pod, make sure it lives on the same node as the app, using `nodeAffinity` or `podAffinity` – you can't mount an EBS volume to pods on two different nodes.)
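Such a helper pod could look roughly like the sketch below – the pod name, image, and the app's pod labels are illustrative assumptions to adjust to your setup:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: volume-sync-helper
spec:
  affinity:
    # Schedule next to the app's pod so both EBS volumes
    # can be attached to the same node
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: my-app          # assumed label of the app's pods
  containers:
    - name: sync
      image: busybox
      command: ["sleep", "86400"]  # keep the pod alive so we can exec into it
      volumeMounts:
        - name: media-files
          mountPath: /opt/app/media-files
        - name: media-files-blue
          mountPath: /opt/app/media-files-blue
  volumes:
    - name: media-files
      persistentVolumeClaim:
        claimName: media-files
    - name: media-files-blue
      persistentVolumeClaim:
        claimName: media-files-blue
```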
NOTE 2: If your app writes to the volume a lot, files added during the migration might not be copied to the new volume. To solve this, there are at least two options:

- Copy the files, then switch to the new volume, then copy the files again to account for files created during the deployment switch (before deleting the old volume).
- Stop writing to the volume/put the app in maintenance mode for the duration of the switch.
Step 4: Switch volumes
Now the fun part.
Once all the files have been copied, we can switch the volumes in `deployment.yml`:
```yaml
volumes:
  - name: media-files
    persistentVolumeClaim:
      claimName: {{ .Values.volumeBlue.pvcName }}
{{- if .Values.volumeBlue.enabled }}
  - name: media-files-blue
    persistentVolumeClaim:
      claimName: {{ .Values.volume.pvcName }}
{{- end }}
```
Deploy, and you should see that the app is now using the new volume – `media-files-blue` – while the directory name inside the container itself did not change.

If you `kubectl exec` into the container, you can confirm the switch was done correctly by checking our marker files:

```
/opt/app/media-files/__VOLUME_BLUE__
/opt/app/media-files-blue/__VOLUME_GREEN__
```
This means everything went correctly. The app now uses the new, bigger EBS volume, with all the files from the previous one in place.
Notice that all the steps we've performed were easy to roll back using `helm`, including the final one. If anything goes wrong, we can switch the volumes back.
Step 5: Cleanup
Depending on your situation, you might want to wait a day or two before the cleanup. If everything works well, though, it's now time to remove the green volume and keep only the blue one.

First, leave `pvc.yml` unchanged, but change `values.yml`:
```yaml
volume:
  storage: 200Gi
  pvcName: "media-files-blue"
volumeBlue:
  enabled: true
  storage: 50Gi
  pvcName: "media-files"
```
We've swapped the names above – this is an important step. It means our blue volume becomes the main one.

Now deploy the app yet again and observe that nothing changed. At this point, `volumeBlue` is the old, green one.

Then switch `volumeBlue.enabled` to `false` and deploy – this should trigger the deletion/release of the old green volume.

And as the final step, delete all the `{{- if .Values.volumeBlue.enabled }}` code from `pvc.yml` and `deployment.yml`.
Summary
There should now be a single, resized volume connected to the deployment. Other than that, all the `.yml` resources remain the same as they were before any of the changes.

There is one important change, though: the `PersistentVolumeClaim` now has a different name – `media-files` changed to `media-files-blue`. I picked that name for this article's purposes. In a real-world scenario, however, you might want to pick a more meaningful name (e.g. `media-files-resize-1`).

If you need to keep the original PVC name, you can repeat the procedure one more time and it'll get you there.