Kopia Database Example
Kopia backup
Kopia is a fast, secure, and efficient backup program that supports encryption, compression, deduplication, and incremental backups. The following example will use Kopia to create a backup of a source volume.
A MySQL database will be used as the example application.
Creating source PVC to be backed up
Create a namespace called source
$ kubectl create ns source
$ kubectl annotate namespace source volsync.backube/privileged-movers="true"
Note
The second command to annotate the namespace is used to enable the kopia data mover to run in privileged mode.
This is because this simple example runs MySQL as root. For your own applications, you can run unprivileged by
setting the moverSecurityContext
in your ReplicationSource/ReplicationDestination to match that of your
application in which case the namespace annotation will not be required. See the
permission model documentation for more details.
Deploy the source MySQL database.
$ kubectl -n source create -f examples/source-database/
Verify the database is running:
$ kubectl -n source get pods,pvc,volumesnapshots
NAME READY STATUS RESTARTS AGE
pod/mysql-87f849f8c-n9j7j 1/1 Running 1 58m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/mysql-pv-claim Bound pvc-adbf57f1-6399-4738-87c9-4c660d982a0f 2Gi RWO csi-hostpath-sc 60m
Add a new database:
$ kubectl exec --stdin --tty -n source $(kubectl get pods -n source | grep mysql | awk '{print $1}') -- /bin/bash
$ mysql -u root -p$MYSQL_ROOT_PASSWORD
> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.00 sec)
> create database synced;
> exit
$ exit
Kopia Repository Setup
For the purpose of this tutorial we are using minio as the object storage target for the backup.
Start minio
:
$ hack/run-minio.sh
The kopia-config
Secret configures the Kopia repository parameters:
---
apiVersion: v1
kind: Secret
metadata:
name: kopia-config
type: Opaque
stringData:
# The repository url
KOPIA_REPOSITORY: s3://kopia-repo
# The repository encryption password
KOPIA_PASSWORD: my-secure-kopia-password
# S3 credentials
AWS_ACCESS_KEY_ID: access
AWS_SECRET_ACCESS_KEY: password
# S3 endpoint (required for non-AWS S3)
AWS_S3_ENDPOINT: http://minio.minio.svc.cluster.local:9000
The above will backup to a bucket called kopia-repo
. For optimal deduplication
benefits, it is strongly recommended to use a single Kopia repository (single S3
bucket without prefixes) for all your PVCs. See Repository Configuration Best Practices
for more detail.
ReplicationSource with Database Consistency and Repository Policies
Start by configuring the source with database-specific consistency hooks and comprehensive repository policies. This example demonstrates using Kopia’s advanced features including retention policies, compression, and actions to ensure consistent MySQL backups:
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
name: database-source
namespace: source
spec:
sourcePVC: mysql-pv-claim
trigger:
schedule: "*/30 * * * *"
kopia:
repository: kopia-config
# Repository Retention Policy
# Define how many snapshots to keep at different intervals
retain:
hourly: 24 # Keep 24 hourly snapshots (1 day)
daily: 7 # Keep 7 daily snapshots (1 week)
weekly: 4 # Keep 4 weekly snapshots (1 month)
monthly: 6 # Keep 6 monthly snapshots
yearly: 1 # Keep 1 yearly snapshot
# Compression Configuration
# Use zstd for optimal balance of speed and compression ratio
compression: zstd
# Performance Tuning
# Use multiple parallel streams for faster uploads
parallelism: 2
# Database Consistency Actions
# These hooks ensure database consistency during backup
actions:
# Before snapshot: Create consistent database dump
beforeSnapshot: |
echo "Starting database backup at $(date)" >> /data/backup.log
mysqldump --single-transaction --routines --triggers --all-databases > /data/mysql-backup.sql
echo "Database dump completed" >> /data/backup.log
# After snapshot: Clean up temporary files
afterSnapshot: |
rm -f /data/mysql-backup.sql
echo "Cleanup completed at $(date)" >> /data/backup.log
# Use clone for point-in-time consistency
copyMethod: Clone
In the above ReplicationSource
object:
The PiT copy of the source data
mysql-pv-claim
will be created by cloning the source volume.The synchronization schedule,
.spec.trigger.schedule
, is defined by a cronspec, making the schedule very flexible. In this case, it will take a backup every 30 minutes.The kopia repository configuration is provided via the
kopia-config
Secret.
Repository Policy Features:
Retention Policy: The
retain
field defines a comprehensive retention policy:hourly: 24
: Keeps all hourly snapshots for the last 24 hoursdaily: 7
: Keeps one snapshot per day for the last 7 daysweekly: 4
: Keeps one snapshot per week for the last 4 weeksmonthly: 6
: Keeps one snapshot per month for the last 6 monthsyearly: 1
: Keeps one snapshot per year
This policy ensures recent changes are captured frequently while older data is retained with decreasing granularity to optimize storage usage.
Compression:
zstd
compression is enabled for optimal balance between compression ratio and speed. This typically reduces backup size by 50-70% for database dumps.Performance:
parallelism: 2
enables parallel upload streams for faster backup operations, especially beneficial for large databases.Maintenance: Repository maintenance should be configured using the KopiaMaintenance CRD (see below) to enforce retention policies and optimize repository storage.
Consistency Actions: The
actions
section defines hooks that run before and after snapshots:beforeSnapshot
: Creates a consistent SQL dump usingmysqldump --single-transaction
afterSnapshot
: Cleans up temporary files to avoid backing up unnecessary data
These actions ensure the backup captures a consistent database state even during active transactions.
Note
Database Consistency Best Practices:
The
beforeSnapshot
action usesmysqldump --single-transaction
to create a consistent backup without locking tablesThe
--routines
and--triggers
flags ensure stored procedures and triggers are included in the backupLogging timestamps helps track backup duration and troubleshoot issues
The SQL dump is cleaned up after snapshot to avoid storing redundant data
Tip
Policy Inheritance:
Repository policies are automatically inherited by all snapshots created from this ReplicationSource. The retention policy is evaluated during maintenance runs, automatically removing snapshots that exceed the defined retention limits. This ensures storage efficiency without manual intervention.
Configure KopiaMaintenance
Since the maintenanceIntervalDays
field has been removed from ReplicationSource, you need to create
a separate KopiaMaintenance resource to handle repository maintenance:
---
apiVersion: volsync.backube/v1alpha1
kind: KopiaMaintenance
metadata:
name: database-maintenance
namespace: source
spec:
repository:
repository: kopia-config # Same secret as ReplicationSource
trigger:
schedule: "0 2 * * 0" # Weekly on Sunday at 2 AM
# Cache configuration for improved performance
cacheCapacity: 5Gi
cacheStorageClassName: fast-ssd
cacheAccessModes:
- ReadWriteOnce
resources:
requests:
memory: "512Mi"
cpu: "200m"
limits:
memory: "2Gi"
cpu: "1"
This KopiaMaintenance resource will:
Run maintenance weekly on Sunday at 2 AM
Use a 5Gi persistent cache for improved performance
Enforce the retention policies defined in your ReplicationSource
Clean up orphaned data blocks and optimize the repository
Benefits of using KopiaMaintenance CRD:
Flexible scheduling: Use cron expressions or manual triggers
Performance optimization: Configure persistent cache for faster operations
Resource control: Set specific CPU and memory limits for maintenance
Independent operation: Maintenance runs separately from backup jobs
Now, deploy the kopia-config
, ReplicationSource
, and KopiaMaintenance
configurations.
$ kubectl create -f examples/kopia/source-kopia/source-kopia.yaml -n source
$ kubectl create -f examples/kopia/volsync_v1alpha1_replicationsource.yaml -n source
$ kubectl apply -f database-maintenance.yaml -n source
To verify the replication has completed, view the ReplicationSource
.status
field.
$ kubectl -n source get ReplicationSource/database-source -oyaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
name: database-source
namespace: source
spec:
# ... lines omitted ...
status:
conditions:
- lastTransitionTime: "2024-01-15T18:16:35Z"
message: Reconcile complete
reason: ReconcileComplete
status: "True"
type: Reconciled
lastSyncDuration: 2m45.123456789s
lastSyncTime: "2024-01-15T18:19:45Z"
nextSyncTime: "2024-01-15T18:30:00Z"
kopia:
lastMaintenance: "2024-01-15T12:00:00Z"
In the above output, the lastSyncTime
shows the time when the last backup
completed, and lastMaintenance
shows when maintenance was last run. The
maintenance operation enforces retention policies, removing old snapshots
according to the defined retention rules.
The backup created by VolSync can be seen by directly accessing the Kopia repository:
# In one window, create a port forward to access the minio server
$ kubectl port-forward --namespace minio svc/minio 9000:9000
# In another, access the repository with kopia via the above forward
$ export AWS_ACCESS_KEY_ID=access
$ export AWS_SECRET_ACCESS_KEY=password
$ export KOPIA_PASSWORD=my-secure-kopia-password
$ kopia repository connect s3 --bucket=kopia-repo --endpoint=http://127.0.0.1:9000
$ kopia snapshot list
Snapshots:
2024-01-15 18:19:45 UTC k8s-volsync@cluster 01234567890abcdef Path: /data Size: 1.2 GB
There is a snapshot in the kopia repository created by the kopia data mover.
Advanced Policy Configuration (Future Enhancement)
Warning
External policy file configuration requires mounting policy files via ConfigMap or Secret. The following example shows the planned functionality. Currently, use inline configuration options (retain, compression, actions) in the ReplicationSource spec.
For future complex policy requirements (not yet available):
---
apiVersion: v1
kind: ConfigMap
metadata:
name: database-kopia-policies
namespace: source
data:
global-policy.json: |
{
"retention": {
"keepLatest": 10,
"keepHourly": 48,
"keepDaily": 30,
"keepWeekly": 8,
"keepMonthly": 24,
"keepAnnual": 5
},
"compression": {
"compressor": "zstd",
"minSize": 1024,
"maxSize": 20971520
},
"actions": {
"beforeSnapshotRoot": [
{
"mode": "essential",
"script": "/scripts/pre-backup.sh",
"timeout": 300
}
],
"afterSnapshotRoot": [
{
"mode": "async",
"script": "/scripts/post-backup.sh"
}
]
},
"scheduling": {
"intervalSeconds": 3600,
"timesOfDay": ["02:00", "14:00", "22:00"]
},
"errorHandling": {
"ignoreFileErrors": true,
"ignoreDirectoryErrors": false
},
"files": {
"ignore": [
"*.tmp",
"*.swp",
"lost+found/",
".Trash*/"
],
"dotFiles": "include",
"oneFileSystem": true
}
}
repository.config: |
{
"enableActions": true,
"permittedActions": [
"beforeSnapshotRoot",
"afterSnapshotRoot"
]
}
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
name: database-source-advanced
namespace: source
spec:
sourcePVC: mysql-pv-claim
trigger:
schedule: "0 */2 * * *" # Every 2 hours
kopia:
repository: kopia-config
policyConfig:
configMapName: database-kopia-policies
globalPolicyFilename: "global-policy.json"
repositoryConfigFilename: "repository.config"
copyMethod: Clone
External Policy Benefits:
Fine-grained Control: Access to all Kopia policy settings
Complex Scheduling: Define multiple backup times per day
Advanced Filtering: Exclude specific file patterns from backups
Error Handling: Configure how to handle backup errors
Action Modes: Control action execution (essential, async, optional)
Size-based Compression: Only compress files within specific size ranges
Note
Current Status: External policy files via ConfigMap/Secret are not yet implemented. Use inline configuration options in the ReplicationSource spec for retention policies, compression settings (at repository creation), and snapshot actions.
Restoring the backup
To restore from the backup, create a destination, deploy kopia-config
and
ReplicationDestination
on the destination.
$ kubectl create ns dest
$ kubectl annotate namespace dest volsync.backube/privileged-movers="true"
$ kubectl -n dest create -f examples/kopia/source-kopia/
To start the restore, create an empty PVC for the data:
$ kubectl -n dest create -f examples/source-database/mysql-pvc.yaml
persistentvolumeclaim/mysql-pv-claim created
Create the ReplicationDestination in the dest
namespace to restore the data:
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
name: database-destination
namespace: dest
spec:
trigger:
manual: restore
kopia:
destinationPVC: mysql-pv-claim
repository: kopia-config
copyMethod: Direct
# ⚠️ sourceIdentity REQUIRED because this is a cross-namespace restore
# (dest namespace ≠ source namespace)
# For same-namespace restores with matching names, sourceIdentity is optional
sourceIdentity:
sourceName: database-source # Source ReplicationSource name
sourceNamespace: source # Source namespace (different from dest)
# sourcePVCName is auto-discovered from the ReplicationSource
$ kubectl -n dest create -f examples/kopia/volsync_v1alpha1_replicationdestination.yaml
Once the restore is complete, the .status.lastManualSync
field will match
.spec.trigger.manual
.
To verify restore, deploy the MySQL database to the dest
namespace which will use the data that has
been restored from sourcePVC backup.
Create the Deployment, Service, and Secret.
$ kubectl create -n dest -f examples/destination-database/mysql-secret.yaml
$ kubectl create -n dest -f examples/destination-database/mysql-deployment.yaml
$ kubectl create -n dest -f examples/destination-database/mysql-service.yaml
Validate that the mysql pod is running within the environment.
$ kubectl get pods -n dest
NAME READY STATUS RESTARTS AGE
mysql-8b9c5c8d8-v6tg6 1/1 Running 0 38m
Connect to the mysql pod and list the databases to verify the synced database exists.
$ kubectl exec --stdin --tty -n dest $(kubectl get pods -n dest | grep mysql | awk '{print $1}') -- /bin/bash
$ mysql -u root -p$MYSQL_ROOT_PASSWORD
> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| synced |
| sys |
+--------------------+
5 rows in set (0.00 sec)
> exit
$ exit
Note
If the beforeSnapshot
action created a SQL dump file, you may also find
mysql-backup.sql
in the restored data. This dump can be used as an
additional recovery option or imported into a fresh database instance.
Repository Configuration Best Practices
Single Repository Approach (Recommended)
For optimal deduplication benefits, it is strongly recommended to use a single Kopia repository for all your PVCs. This means using a single S3 bucket (or other backend) without path prefixes for all your backups. This approach maximizes Kopia’s deduplication capabilities across all your data.
Why Use a Single Repository?
Maximum Deduplication: Kopia performs content-defined chunking and deduplication at the repository level. When all PVCs share the same repository, duplicate data blocks across different PVCs are stored only once, significantly reducing storage costs.
Simplified Management: Managing one repository is simpler than managing multiple repositories with different paths or buckets.
Better Storage Efficiency: Common data patterns (like operating system files, application binaries, or shared libraries) are deduplicated across all your backups.
Automatic Isolation: Kopia internally manages separation between different PVCs using the username/hostname combination. Each ReplicationSource automatically gets a unique identity, ensuring complete isolation of snapshot histories.
How Kopia Manages Multiple PVCs in One Repository
Kopia uses a combination of username and hostname to create unique identities for each backup source. VolSync automatically generates these identities based on:
Username: Derived from the ReplicationSource name and namespace
Hostname: Defaults to the namespace name
This means each PVC backup has its own isolated snapshot history within the shared repository, while still benefiting from cross-PVC deduplication.
Recommended Configuration for Multiple PVCs
When backing up multiple PVCs to the same repository, use the same repository configuration (same S3 bucket, no path prefixes) but with different secret names:
For pvc-a
:
---
# Shared Kopia repository configuration (RECOMMENDED APPROACH)
# Use the SAME repository URL for all PVCs - no path prefixes!
apiVersion: v1
kind: Secret
metadata:
name: kopia-config-shared
namespace: source
type: Opaque
stringData:
# Single repository URL - no path prefix for optimal deduplication
KOPIA_REPOSITORY: s3://kopia-repo
# Single repository encryption password for all PVCs
KOPIA_PASSWORD: my-secure-kopia-password
# S3 credentials
AWS_ACCESS_KEY_ID: access
AWS_SECRET_ACCESS_KEY: password
# S3 endpoint (required for non-AWS S3)
AWS_S3_ENDPOINT: http://minio.minio.svc.cluster.local:9000
---
# ReplicationSource for pvc-a
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
name: app-database # Unique name creates unique identity
namespace: source
spec:
sourcePVC: pvc-a
trigger:
schedule: "*/30 * * * *"
kopia:
repository: kopia-config-shared # Use shared repository
retain:
daily: 7
weekly: 4
monthly: 6
yearly: 1
compression: zstd
parallelism: 2
copyMethod: Clone
# Identity automatically generated as:
# username: app-database-source
# hostname: source
# Full identity: app-database-source@source
For pvc-b
:
---
# ReplicationSource for pvc-b (using the SAME repository)
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
name: app-uploads # Different name ensures unique identity
namespace: source
spec:
sourcePVC: pvc-b
trigger:
schedule: "*/30 * * * *"
kopia:
repository: kopia-config-shared # SAME shared repository
retain:
daily: 7
weekly: 4
monthly: 6
yearly: 1
compression: zstd
parallelism: 2
copyMethod: Clone
# Identity automatically generated as:
# username: app-uploads-source
# hostname: source
# Full identity: app-uploads-source@source
Note
Key Benefits of Single Repository:
Kopia safely supports multiple clients writing to the same repository simultaneously
Each ReplicationSource maintains its own isolated snapshot history
Deduplication works across ALL PVCs in the repository
Storage savings can be significant when backing up similar data
When to Use Separate Repositories
While a single repository is recommended for most use cases, there are specific scenarios where separate repositories (different buckets or path prefixes) might be appropriate:
Compliance Requirements: Different data classifications requiring physical separation
HIPAA-regulated healthcare data vs. general application data
PCI-DSS payment card data vs. non-sensitive data
GDPR-protected personal data with different retention requirements
Organizational Boundaries: Clear separation between departments or teams
Different departments with separate budgets and storage accounts
Multi-tenant SaaS environments with strict isolation requirements
Separate development, staging, and production environments
Different Retention Policies: Incompatible backup retention requirements
Long-term archival data (years) vs. short-term operational backups (days)
Legal hold requirements for specific datasets
Performance Isolation: Preventing one workload from impacting another
High-frequency backup jobs vs. occasional large backups
Critical production systems vs. non-critical development work
Geographic Requirements: Data residency and latency considerations
Data that must remain in specific regions for compliance
Optimizing for regional performance by using local storage
Example: Using Separate Repositories When Necessary
If you must use separate repositories (e.g., for compliance), use distinct bucket paths:
---
# Repository for HIPAA-compliant healthcare data
apiVersion: v1
kind: Secret
metadata:
name: kopia-config-healthcare
type: Opaque
stringData:
KOPIA_REPOSITORY: s3://backups-hipaa/healthcare-data
KOPIA_PASSWORD: healthcare-encryption-key
# ... other credentials
---
# Repository for general application data
apiVersion: v1
kind: Secret
metadata:
name: kopia-config-general
type: Opaque
stringData:
KOPIA_REPOSITORY: s3://backups-general/app-data
KOPIA_PASSWORD: general-encryption-key
# ... other credentials
Warning
Using separate repositories means you lose deduplication benefits between them. Only separate repositories when you have a clear requirement to do so.
Understanding Deduplication Benefits
To illustrate why a single repository is recommended, consider this example:
Scenario: Backing up 10 application PVCs, each containing: - 500 MB of operating system libraries - 200 MB of common application frameworks - 300 MB of unique application data
With Separate Repositories (bucket prefixes per PVC): - Total storage used: 10 × (500 + 200 + 300) = 10,000 MB - No deduplication between PVCs
With Single Repository (recommended approach): - Common OS libraries stored once: 500 MB - Common frameworks stored once: 200 MB - Unique data for all apps: 10 × 300 = 3,000 MB - Total storage used: 500 + 200 + 3,000 = 3,700 MB - Storage savings: 63%
The savings increase dramatically when: - You have many PVCs with similar base images - Applications share common libraries or frameworks - You’re backing up multiple instances of the same application - Development, staging, and production environments have similar data
Tip
Monitor your Kopia repository statistics to see actual deduplication ratios. It’s common to see 50-80% storage reduction in environments with similar workloads.
Kopia Advantages for Database Backups
Kopia provides several advantages for database backups:
Consistency Actions: The beforeSnapshot
and afterSnapshot
actions ensure
database consistency without requiring application downtime.
Efficient Compression: Kopia’s zstd compression typically achieves better compression ratios than traditional backup tools, reducing storage costs.
Incremental Backups: Kopia’s content-defined chunking provides efficient incremental backups that only transfer changed data blocks.
Concurrent Access: Multiple backup sources can safely write to the same repository, making it easier to manage centralized backup infrastructure.
Fast Restores: Kopia’s architecture enables fast partial and full restores without needing to download entire backup archives.