Backup
Ensuring business continuity in Kubernetes deployments requires a robust backup and disaster recovery strategy. This is vital for safeguarding critical data and configurations.
For cloud-managed Kubernetes services like Google Kubernetes Engine (GKE), providers offer integrated backup capabilities, leveraging their native snapshotting and storage solutions.
Alternatively, for both cloud-based and on-premises clusters, third-party tools such as Velero provide a flexible, platform-agnostic approach to the backup and restoration of Kubernetes resources and persistent volumes.
Google Kubernetes Engine
For Google Kubernetes Engine (GKE), Google Cloud offers a dedicated managed service called Backup for GKE. It lets you back up and restore both Kubernetes resource manifests (cluster state) and persistent volume data.
Enable Backup for GKE for an existing cluster
Enable the Backup for GKE API (official doc):
gcloud services enable gkebackup.googleapis.com \
  --project <PROJECT_ID>
Enable the backup addon in the cluster (official doc):
gcloud container clusters update <CLUSTER_NAME> \
  --project=<PROJECT_ID> \
  --region=<REGION> \
  --update-addons=BackupRestore=ENABLED
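To confirm that both steps took effect, a check along the following lines may help. This is only a sketch: `check_gke_backup_enabled` is a hypothetical helper, and the `config.name` filter and the `addonsConfig.gkeBackupAgentConfig.enabled` field path are assumptions that should be verified against the output of your gcloud version.

```shell
# Sketch: check that the Backup for GKE API and the cluster add-on are enabled.
# check_gke_backup_enabled is a hypothetical helper, not part of gcloud.
check_gke_backup_enabled() {
  local project="$1" cluster="$2" region="$3"
  local api addon

  # Keep only the Backup for GKE API from the list of enabled services
  # (the config.name filter is an assumption about the list output).
  api="$(gcloud services list --enabled \
    --project "$project" \
    --filter="config.name=gkebackup.googleapis.com" \
    --format="value(config.name)")"

  # Assumed field path for the add-on state in the cluster description.
  addon="$(gcloud container clusters describe "$cluster" \
    --project="$project" \
    --region="$region" \
    --format="value(addonsConfig.gkeBackupAgentConfig.enabled)" \
    | tr '[:upper:]' '[:lower:]')"

  if [ "$api" = "gkebackup.googleapis.com" ] && [ "$addon" = "true" ]; then
    echo "Backup for GKE is enabled"
  else
    echo "Backup for GKE is NOT fully enabled" >&2
    return 1
  fi
}
```

After loading the function, call it as `check_gke_backup_enabled <PROJECT_ID> <CLUSTER_NAME> <REGION>`; a non-zero exit status means at least one of the two checks did not pass.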
Create a backup plan that backs up all namespaces every day at 3:00 AM UTC and retains each backup for 7 days (official doc):
gcloud beta container backup-restore backup-plans create <BACKUP_PLAN_NAME> \
  --project=<PROJECT_ID> \
  --location=<REGION> \
  --cluster="projects/<PROJECT_ID>/locations/<LOCATION>/clusters/<CLUSTER_ID>" \
  --cron-schedule="0 3 * * *" \
  --backup-retain-days=7 \
  --all-namespaces
You can also specify --include-volume-data or --include-secrets to include persistent volume data or secrets in the backup plan:
gcloud beta container backup-restore backup-plans create <BACKUP_PLAN_NAME> \
  --project=<PROJECT_ID> \
  --location=<REGION> \
  --cluster="projects/<PROJECT_ID>/locations/<LOCATION>/clusters/<CLUSTER_ID>" \
  --cron-schedule="0 3 * * *" \
  --backup-retain-days=7 \
  --all-namespaces \
  --include-secrets \
  --include-volume-data
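Rather than waiting for the first scheduled run, it can be useful to trigger a backup manually to check that the plan works. The following is a sketch under assumptions: `run_manual_gke_backup` is a hypothetical wrapper, and the `backups create` and `backups list` subcommands and their flags should be checked against the gcloud reference for your installed version.

```shell
# Sketch: trigger an on-demand backup from an existing backup plan, then list
# the backups of that plan. run_manual_gke_backup is a hypothetical wrapper.
run_manual_gke_backup() {
  local project="$1" region="$2" plan="$3" backup="$4"

  # Create the backup and block until it finishes.
  gcloud beta container backup-restore backups create "$backup" \
    --project="$project" \
    --location="$region" \
    --backup-plan="$plan" \
    --wait-for-completion || return 1

  # Show all backups that belong to the plan.
  gcloud beta container backup-restore backups list \
    --project="$project" \
    --location="$region" \
    --backup-plan="$plan"
}
```

Call it as `run_manual_gke_backup <PROJECT_ID> <REGION> <BACKUP_PLAN_NAME> <BACKUP_NAME>`, where `<BACKUP_NAME>` is a name of your choosing.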
Velero
Velero is an open-source tool for backing up and restoring Kubernetes cluster resources and persistent volumes, performing disaster recovery, and migrating workloads. It is widely used in production and is part of the CNCF cloud native landscape.
Velero consists of:
- A server that runs on your cluster
- A command-line client that runs locally
See the official Velero documentation for more details.
Prerequisites
Before you begin, make sure you have:
- A running Kubernetes cluster (version 1.16 or later).
- kubectl installed and configured to communicate with your cluster.
- Access to an object storage bucket (e.g., GCP Cloud Storage, AWS S3, …) where your backups will be stored.
Install the Velero CLI
The Velero command-line interface (CLI) is used to interact with the Velero server deployed in your cluster.
Download the release tarball that matches your operating system, CPU architecture, and desired Velero version:
wget https://github.com/vmware-tanzu/velero/releases/download/v<VERSION>/velero-v<VERSION>-<OS>-<ARCH>.tar.gz
Extract the tarball:
tar -xvf velero-v<VERSION>-<OS>-<ARCH>.tar.gz
Move the extracted velero binary to somewhere in your $PATH (/usr/local/bin for most users):
sudo mv velero-v<VERSION>-<OS>-<ARCH>/velero /usr/local/bin
Verify the installation. This should display the Velero client version:
velero version --client-only
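The <OS> and <ARCH> placeholders in the download URL can be derived from the local machine. The helper below is a minimal sketch: `velero_release_url` is a hypothetical function, and its mapping covers only the common `x86_64` and `aarch64` machine names, so other platforms may need extra cases.

```shell
# Sketch: build the Velero release URL for the local OS and CPU architecture.
# velero_release_url is a hypothetical helper; pass the version without "v".
velero_release_url() {
  local version="$1" os arch

  # Default to the local platform unless OS/ARCH are given explicitly.
  os="${2:-$(uname -s | tr '[:upper:]' '[:lower:]')}"
  arch="${3:-$(uname -m)}"

  # Translate uname machine names to the names used in release file names.
  case "$arch" in
    x86_64) arch="amd64" ;;
    aarch64) arch="arm64" ;;
  esac

  echo "https://github.com/vmware-tanzu/velero/releases/download/v${version}/velero-v${version}-${os}-${arch}.tar.gz"
}
```

With the function loaded, `wget "$(velero_release_url <VERSION>)"` downloads the tarball for the current machine.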
Install and configure the server components
Velero uses storage provider plugins to integrate with a variety of storage systems to support backup and snapshot operations. The steps to install and configure the server components along with the appropriate plugins are specific to the chosen storage provider. Below is an example for AWS S3.
Create a file named credentials-velero with your object storage access keys.
cat > credentials-velero <<EOF
[default]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
EOF
Install Velero on Kubernetes
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.10.0 \
  --bucket <BUCKET> \
  --secret-file ./credentials-velero \
  --backup-location-config region=<REGION> \
  --snapshot-location-config region=<REGION> \
  --kubeconfig ~/.kube/config
Verify Velero installation in the cluster:
kubectl get pods -n velero
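Listing the pods shows whether they are running, but not whether Velero can reach the bucket. A slightly stricter check might look like the sketch below. It assumes the default installation, where the server runs as a deployment named `velero` in the `velero` namespace; `verify_velero_install` is a hypothetical helper and the 120-second timeout is an arbitrary choice.

```shell
# Sketch: wait for the Velero server to roll out, then show the backup
# storage locations. verify_velero_install is a hypothetical helper.
verify_velero_install() {
  # Block until the velero deployment is ready, or give up after 120 seconds.
  kubectl rollout status deployment/velero -n velero --timeout=120s || return 1

  # List backup storage locations; check that the PHASE column reports
  # the location as available.
  velero backup-location get
}
```

If the storage location does not become available, the credentials file or the bucket and region settings are the first things to review.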
Backup
Back up the entire cluster (all namespaces and cluster-scoped resources):
velero backup create cluster-backup --include-cluster-resources=true
Back up only a specific namespace:
velero backup create app-backup --include-namespaces app-namespace
Velero allows you to schedule recurring backups using a cron expression:
velero schedule create daily-backup --schedule="0 3 * * *" --include-cluster-resources=true
This creates a schedule named daily-backup that will run every day at 3:00 AM UTC, backing up the entire cluster. You can modify the --schedule flag using standard cron syntax.
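As an illustration of a different cron expression, the sketch below wraps `velero schedule create` in a hypothetical helper, `create_velero_schedule`, that rejects expressions without exactly five fields before calling Velero. It also passes `--ttl` to control how long each backup is kept. The field check is a simplification and does not validate the values inside each field.

```shell
# Sketch: create a Velero schedule after a basic sanity check on the cron
# expression. create_velero_schedule is a hypothetical helper.
create_velero_schedule() {
  local name="$1" cron="$2" ttl="$3" fields

  # Count whitespace-separated fields in a subshell with globbing disabled,
  # so that "*" is treated literally instead of expanding to file names.
  fields="$(set -f; set -- $cron; echo "$#")"
  if [ "$fields" -ne 5 ]; then
    echo "expected 5 cron fields, got $fields: $cron" >&2
    return 1
  fi

  velero schedule create "$name" --schedule="$cron" --ttl "$ttl"
}
```

For example, `create_velero_schedule six-hourly-backup "0 */6 * * *" 168h0m0s` would request a backup every six hours and keep each one for seven days.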
See the backup reference for all options.
Restore
Get a list of available backups:
velero backup get
The output should look similar to the following:
NAME                  STATUS      STARTED                         COMPLETED                       EXPIRES                         STORAGE LOCATION   SELECTOR
full-cluster-backup   Completed   2025-06-23 10:00:00 +0000 UTC   2025-06-23 10:05:00 +0000 UTC   2025-07-23 10:00:00 +0000 UTC   default            <none>
app-backup            Completed   2025-06-24 14:30:00 +0000 UTC   2025-06-24 14:31:00 +0000 UTC   2025-07-24 14:30:00 +0000 UTC   default            <none>
Create a restore operation:
velero restore create --from-backup <BACKUP_NAME>
If your backup contains multiple namespaces, you can choose to restore only a subset:
velero restore create --from-backup <BACKUP_NAME> --include-namespaces app1-namespace,app2-namespace
Monitor the restore process:
velero restore get
velero restore describe <RESTORE_NAME>
velero restore logs <RESTORE_NAME>
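In scripts, it can be convenient to block until a restore reaches a final state. The sketch below polls the restore object directly with kubectl. It assumes Velero is installed in the `velero` namespace and that the phase is exposed at `.status.phase` of the `restores.velero.io` resource; `wait_for_restore` is a hypothetical helper and the default 10-second polling interval is arbitrary.

```shell
# Sketch: poll a Velero restore until it reaches a final phase.
# wait_for_restore is a hypothetical helper; it loops until the phase is final.
wait_for_restore() {
  local restore="$1" interval="${2:-10}" phase

  while true; do
    # Read the current phase from the Restore custom resource.
    phase="$(kubectl get restores.velero.io "$restore" -n velero \
      -o jsonpath='{.status.phase}')"

    case "$phase" in
      Completed)
        echo "restore $restore completed"
        return 0
        ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "restore $restore ended in phase $phase" >&2
        return 1
        ;;
      *)
        # Still new or in progress: wait and check again.
        sleep "$interval"
        ;;
    esac
  done
}
```

Call it as `wait_for_restore <RESTORE_NAME>`. On failure, `velero restore describe <RESTORE_NAME>` and `velero restore logs <RESTORE_NAME>` give the details.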
See the restore reference for detailed options.