=======================
Kopia Database Example
=======================
Kopia backup
============
Kopia is a fast, secure, and efficient backup program that
supports encryption, compression, deduplication, and incremental backups. The
following example uses Kopia to back up a source volume, with a MySQL
database as the example application.
Creating source PVC to be backed up
-----------------------------------
Create a namespace called ``source`` and annotate it:
.. code-block:: console
$ kubectl create ns source
$ kubectl annotate namespace source volsync.backube/privileged-movers="true"
.. note::
The second command annotates the namespace so that the Kopia data mover can run in privileged mode,
which is needed because this simple example runs MySQL as root. For your own applications, you can run
unprivileged by setting ``moverSecurityContext`` in your ReplicationSource/ReplicationDestination to match
that of your application, in which case the namespace annotation is not required. See the
permission model documentation for more details.
Deploy the source MySQL database.
.. code:: console
$ kubectl -n source create -f examples/source-database/
Verify the database is running:
.. code-block:: console
$ kubectl -n source get pods,pvc,volumesnapshots
NAME READY STATUS RESTARTS AGE
pod/mysql-87f849f8c-n9j7j 1/1 Running 1 58m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/mysql-pv-claim Bound pvc-adbf57f1-6399-4738-87c9-4c660d982a0f 2Gi RWO csi-hostpath-sc 60m
Add a new database:
.. code-block:: console
$ kubectl exec --stdin --tty -n source $(kubectl get pods -n source | grep mysql | awk '{print $1}') -- /bin/bash
$ mysql -u root -p$MYSQL_ROOT_PASSWORD
> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.00 sec)
> create database synced;
> exit
$ exit
Kopia Repository Setup
----------------------
For this tutorial, MinIO is used as the object storage target
for the backup.
Start ``minio``:
.. code-block:: console
$ hack/run-minio.sh
The ``kopia-config`` Secret configures the Kopia repository parameters:
.. code-block:: yaml
---
apiVersion: v1
kind: Secret
metadata:
name: kopia-config
type: Opaque
stringData:
# The repository url
KOPIA_REPOSITORY: s3://kopia-repo
# The repository encryption password
KOPIA_PASSWORD: my-secure-kopia-password
# S3 credentials
AWS_ACCESS_KEY_ID: access
AWS_SECRET_ACCESS_KEY: password
# S3 endpoint (required for non-AWS S3)
AWS_S3_ENDPOINT: http://minio.minio.svc.cluster.local:9000
The above will back up to a bucket called ``kopia-repo``. For optimal deduplication
benefits, it is **strongly recommended** to use a single Kopia repository (a single S3
bucket without prefixes) for all your PVCs. See `Repository Configuration Best Practices`_
for more detail.
ReplicationSource with Database Consistency and Repository Policies
--------------------------------------------------------------------
Start by configuring the source with database-specific consistency hooks and comprehensive
repository policies. This example demonstrates using Kopia's advanced features including
retention policies, compression, and actions to ensure consistent MySQL backups:
.. code-block:: yaml
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
name: database-source
namespace: source
spec:
sourcePVC: mysql-pv-claim
trigger:
schedule: "*/30 * * * *"
kopia:
repository: kopia-config
# Repository Retention Policy
# Define how many snapshots to keep at different intervals
retain:
hourly: 24 # Keep 24 hourly snapshots (1 day)
daily: 7 # Keep 7 daily snapshots (1 week)
weekly: 4 # Keep 4 weekly snapshots (1 month)
monthly: 6 # Keep 6 monthly snapshots
yearly: 1 # Keep 1 yearly snapshot
# Compression Configuration
# Use zstd for optimal balance of speed and compression ratio
compression: zstd
# Performance Tuning
# Use multiple parallel streams for faster uploads
parallelism: 2
# Database Consistency Actions
# These hooks ensure database consistency during backup
actions:
# Before snapshot: Create consistent database dump
beforeSnapshot: |
echo "Starting database backup at $(date)" >> /data/backup.log
mysqldump --single-transaction --routines --triggers --all-databases > /data/mysql-backup.sql
echo "Database dump completed" >> /data/backup.log
# After snapshot: Clean up temporary files
afterSnapshot: |
rm -f /data/mysql-backup.sql
echo "Cleanup completed at $(date)" >> /data/backup.log
# Use clone for point-in-time consistency
copyMethod: Clone
In the above ``ReplicationSource`` object:
- The point-in-time (PiT) copy of the source data ``mysql-pv-claim`` is created by cloning
the source volume.
- The synchronization schedule, ``.spec.trigger.schedule``, is defined by a
cronspec, making the schedule
very flexible. In this case, a backup is taken every 30 minutes.
- The kopia repository configuration is provided via the ``kopia-config`` Secret.
**Repository Policy Features:**
- **Retention Policy**: The ``retain`` field defines a comprehensive retention policy:
- ``hourly: 24``: Keeps one snapshot per hour for the last 24 hours
- ``daily: 7``: Keeps one snapshot per day for the last 7 days
- ``weekly: 4``: Keeps one snapshot per week for the last 4 weeks
- ``monthly: 6``: Keeps one snapshot per month for the last 6 months
- ``yearly: 1``: Keeps one snapshot per year
This policy ensures recent changes are captured frequently while older data
is retained with decreasing granularity to optimize storage usage.
- **Compression**: ``zstd`` compression is enabled for optimal balance between
compression ratio and speed. This typically reduces backup size by 50-70%
for database dumps.
- **Performance**: ``parallelism: 2`` enables parallel upload streams for
faster backup operations, especially beneficial for large databases.
- **Maintenance**: Repository maintenance should be configured using the KopiaMaintenance CRD
(see below) to enforce retention policies and optimize repository storage.
- **Consistency Actions**: The ``actions`` section defines hooks that run
before and after snapshots:
- ``beforeSnapshot``: Creates a consistent SQL dump using ``mysqldump --single-transaction``
- ``afterSnapshot``: Cleans up temporary files to avoid backing up unnecessary data
These actions ensure the backup captures a consistent database state even
during active transactions.
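To make the ``hourly: 24`` rule concrete, the following Python sketch keeps only the
newest snapshot in each of the most recent hour buckets. It is a simplified illustration,
not Kopia's actual retention code; the function name ``keep_hourly`` and the sample
timestamps are assumptions made for this example.

```python
from datetime import datetime, timedelta


def keep_hourly(snapshots, hourly):
    """Keep the newest snapshot in each of the `hourly` most recent hour buckets."""
    if hourly <= 0:
        return []
    newest_per_hour = {}
    # Iterating in ascending order means later snapshots overwrite earlier
    # ones that fall into the same hour bucket.
    for ts in sorted(snapshots):
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        newest_per_hour[bucket] = ts
    recent_hours = sorted(newest_per_hour)[-hourly:]
    return [newest_per_hour[h] for h in recent_hours]


# 48 hours of backups taken every 30 minutes (illustrative timestamps).
start = datetime(2024, 1, 15, 0, 0)
snaps = [start + timedelta(minutes=30 * i) for i in range(96)]
kept = keep_hourly(snaps, 24)
print(len(snaps), "snapshots taken,", len(kept), "kept by the hourly rule")
```

With a 30-minute schedule, two snapshots land in every hour, so this rule alone retains
one per hour. The daily, weekly, monthly, and yearly rules work on the same principle
with larger buckets.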
.. note::
**Database Consistency Best Practices:**
- The ``beforeSnapshot`` action uses ``mysqldump --single-transaction`` to
create a consistent backup without locking tables
- The ``--routines`` and ``--triggers`` flags ensure stored procedures and
triggers are included in the backup
- Logging timestamps helps track backup duration and troubleshoot issues
- The SQL dump is cleaned up after snapshot to avoid storing redundant data
.. tip::
**Policy Inheritance:**
Repository policies are automatically inherited by all snapshots created
from this ReplicationSource. The retention policy is evaluated during
maintenance runs, automatically removing snapshots that exceed the defined
retention limits. This ensures storage efficiency without manual intervention.
Configure KopiaMaintenance
--------------------------
Since the ``maintenanceIntervalDays`` field has been removed from ReplicationSource, you need to create
a separate KopiaMaintenance resource to handle repository maintenance:
.. code-block:: yaml
---
apiVersion: volsync.backube/v1alpha1
kind: KopiaMaintenance
metadata:
name: database-maintenance
namespace: source
spec:
repository:
repository: kopia-config # Same secret as ReplicationSource
trigger:
schedule: "0 2 * * 0" # Weekly on Sunday at 2 AM
# Cache configuration for improved performance
cacheCapacity: 5Gi
cacheStorageClassName: fast-ssd
cacheAccessModes:
- ReadWriteOnce
resources:
requests:
memory: "512Mi"
cpu: "200m"
limits:
memory: "2Gi"
cpu: "1"
This KopiaMaintenance resource will:
- Run maintenance weekly on Sunday at 2 AM
- Use a 5Gi persistent cache for improved performance
- Enforce the retention policies defined in your ReplicationSource
- Clean up orphaned data blocks and optimize the repository
**Benefits of using KopiaMaintenance CRD:**
- **Flexible scheduling**: Use cron expressions or manual triggers
- **Performance optimization**: Configure persistent cache for faster operations
- **Resource control**: Set specific CPU and memory limits for maintenance
- **Independent operation**: Maintenance runs separately from backup jobs
Now, deploy the ``kopia-config``, ``ReplicationSource``, and ``KopiaMaintenance`` configurations.
.. code-block:: console
$ kubectl create -f examples/kopia/source-kopia/source-kopia.yaml -n source
$ kubectl create -f examples/kopia/volsync_v1alpha1_replicationsource.yaml -n source
$ kubectl apply -f database-maintenance.yaml -n source
To verify the replication has completed, view the ReplicationSource
``.status`` field.
.. code-block:: console
$ kubectl -n source get ReplicationSource/database-source -oyaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
name: database-source
namespace: source
spec:
# ... lines omitted ...
status:
conditions:
- lastTransitionTime: "2024-01-15T18:16:35Z"
message: Reconcile complete
reason: ReconcileComplete
status: "True"
type: Reconciled
lastSyncDuration: 2m45.123456789s
lastSyncTime: "2024-01-15T18:19:45Z"
nextSyncTime: "2024-01-15T18:30:00Z"
kopia:
lastMaintenance: "2024-01-15T12:00:00Z"
In the above output, the ``lastSyncTime`` shows the time when the last backup
completed, and ``lastMaintenance`` shows when maintenance was last run. The
maintenance operation enforces retention policies, removing old snapshots
according to the defined retention rules.
-----------------------------------------
The backup created by VolSync can be seen by directly accessing the Kopia
repository:
.. code-block:: console
# In one window, create a port forward to access the minio server
$ kubectl port-forward --namespace minio svc/minio 9000:9000
# In another, access the repository with kopia via the above forward
$ export AWS_ACCESS_KEY_ID=access
$ export AWS_SECRET_ACCESS_KEY=password
$ export KOPIA_PASSWORD=my-secure-kopia-password
$ kopia repository connect s3 --bucket=kopia-repo --endpoint=127.0.0.1:9000 --disable-tls
$ kopia snapshot list
Snapshots:
2024-01-15 18:19:45 UTC k8s-volsync@cluster 01234567890abcdef Path: /data Size: 1.2 GB
There is a snapshot in the kopia repository created by the kopia data mover.
Advanced Policy Configuration (Future Enhancement)
===================================================
.. warning::
External policy file configuration requires mounting policy files via ConfigMap or Secret.
The following example shows the planned functionality. Currently, use inline
configuration options (retain, compression, actions) in the ReplicationSource spec.
For future complex policy requirements (not yet available):
.. code-block:: yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
name: database-kopia-policies
namespace: source
data:
global-policy.json: |
{
"retention": {
"keepLatest": 10,
"keepHourly": 48,
"keepDaily": 30,
"keepWeekly": 8,
"keepMonthly": 24,
"keepAnnual": 5
},
"compression": {
"compressor": "zstd",
"minSize": 1024,
"maxSize": 20971520
},
"actions": {
"beforeSnapshotRoot": [
{
"mode": "essential",
"script": "/scripts/pre-backup.sh",
"timeout": 300
}
],
"afterSnapshotRoot": [
{
"mode": "async",
"script": "/scripts/post-backup.sh"
}
]
},
"scheduling": {
"intervalSeconds": 3600,
"timesOfDay": ["02:00", "14:00", "22:00"]
},
"errorHandling": {
"ignoreFileErrors": true,
"ignoreDirectoryErrors": false
},
"files": {
"ignore": [
"*.tmp",
"*.swp",
"lost+found/",
".Trash*/"
],
"dotFiles": "include",
"oneFileSystem": true
}
}
repository.config: |
{
"enableActions": true,
"permittedActions": [
"beforeSnapshotRoot",
"afterSnapshotRoot"
]
}
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
name: database-source-advanced
namespace: source
spec:
sourcePVC: mysql-pv-claim
trigger:
schedule: "0 */2 * * *" # Every 2 hours
kopia:
repository: kopia-config
policyConfig:
configMapName: database-kopia-policies
globalPolicyFilename: "global-policy.json"
repositoryConfigFilename: "repository.config"
copyMethod: Clone
**External Policy Benefits:**
- **Fine-grained Control**: Access to all Kopia policy settings
- **Complex Scheduling**: Define multiple backup times per day
- **Advanced Filtering**: Exclude specific file patterns from backups
- **Error Handling**: Configure how to handle backup errors
- **Action Modes**: Control action execution (essential, async, optional)
- **Size-based Compression**: Only compress files within specific size ranges
.. note::
**Current Status**: External policy files via ConfigMap/Secret are not yet implemented.
Use inline configuration options in the ReplicationSource spec for retention policies,
compression settings (at repository creation), and snapshot actions.
Restoring the backup
====================
To restore from the backup, create a destination namespace, then deploy ``kopia-config`` and a
``ReplicationDestination`` in it.
.. code-block:: console
$ kubectl create ns dest
$ kubectl annotate namespace dest volsync.backube/privileged-movers="true"
$ kubectl -n dest create -f examples/kopia/source-kopia/
To start the restore, create an empty PVC for the data:
.. code-block:: console
$ kubectl -n dest create -f examples/source-database/mysql-pvc.yaml
persistentvolumeclaim/mysql-pv-claim created
Create the ReplicationDestination in the ``dest`` namespace to restore the data:
.. code-block:: yaml
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
name: database-destination
namespace: dest
spec:
trigger:
manual: restore
kopia:
destinationPVC: mysql-pv-claim
repository: kopia-config
copyMethod: Direct
# ⚠️ sourceIdentity REQUIRED because this is a cross-namespace restore
# (dest namespace ≠ source namespace)
# For same-namespace restores with matching names, sourceIdentity is optional
sourceIdentity:
sourceName: database-source # Source ReplicationSource name
sourceNamespace: source # Source namespace (different from dest)
# sourcePVCName is auto-discovered from the ReplicationSource
.. code-block:: console
$ kubectl -n dest create -f examples/kopia/volsync_v1alpha1_replicationdestination.yaml
Once the restore is complete, the ``.status.lastManualSync`` field will match
``.spec.trigger.manual``.
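The completion check can also be scripted. The sketch below assumes the
ReplicationDestination has already been fetched as JSON (for example with
``kubectl -n dest get replicationdestination database-destination -o json``) and parsed
into a dictionary. The helper name ``manual_sync_complete`` and the sample document are
hypothetical.

```python
import json


def manual_sync_complete(resource: dict) -> bool:
    """Return True once status.lastManualSync equals spec.trigger.manual."""
    wanted = resource.get("spec", {}).get("trigger", {}).get("manual")
    done = resource.get("status", {}).get("lastManualSync")
    return wanted is not None and wanted == done


# Sample document shaped like the ReplicationDestination above (illustrative values).
sample = json.loads(
    '{"spec": {"trigger": {"manual": "restore"}},'
    ' "status": {"lastManualSync": "restore"}}'
)
print(manual_sync_complete(sample))
```

A resource that has no ``status.lastManualSync`` yet is reported as not complete, so the
helper can be polled until it returns ``True``.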
To verify the restore, deploy the MySQL database to the ``dest`` namespace, where it will use the data
restored from the backup of the source PVC.
Create the Deployment, Service, and Secret.
.. code-block:: console
$ kubectl create -n dest -f examples/destination-database/mysql-secret.yaml
$ kubectl create -n dest -f examples/destination-database/mysql-deployment.yaml
$ kubectl create -n dest -f examples/destination-database/mysql-service.yaml
Verify that the MySQL pod is running:
.. code-block:: console
$ kubectl get pods -n dest
NAME READY STATUS RESTARTS AGE
mysql-8b9c5c8d8-v6tg6 1/1 Running 0 38m
Connect to the mysql pod and list the databases to verify the synced database
exists.
.. code-block:: console
$ kubectl exec --stdin --tty -n dest $(kubectl get pods -n dest | grep mysql | awk '{print $1}') -- /bin/bash
$ mysql -u root -p$MYSQL_ROOT_PASSWORD
> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| synced |
| sys |
+--------------------+
5 rows in set (0.00 sec)
> exit
$ exit
.. note::
If the ``beforeSnapshot`` action created a SQL dump file, you may also find
``mysql-backup.sql`` in the restored data. This dump can be used as an
additional recovery option or imported into a fresh database instance.
.. _Repository Configuration Best Practices:
==================================================
Repository Configuration Best Practices
==================================================
Single Repository Approach (Recommended)
=========================================
**For optimal deduplication benefits, it is strongly recommended to use a single Kopia
repository for all your PVCs.** This means using a single S3 bucket (or other backend)
without path prefixes for all your backups. This approach maximizes Kopia's deduplication
capabilities across all your data.
Why Use a Single Repository?
-----------------------------
1. **Maximum Deduplication**: Kopia performs content-defined chunking and deduplication
at the repository level. When all PVCs share the same repository, duplicate data blocks
across different PVCs are stored only once, significantly reducing storage costs.
2. **Simplified Management**: Managing one repository is simpler than managing multiple
repositories with different paths or buckets.
3. **Better Storage Efficiency**: Common data patterns (like operating system files,
application binaries, or shared libraries) are deduplicated across all your backups.
4. **Automatic Isolation**: Kopia internally manages separation between different PVCs
using the username/hostname combination. Each ReplicationSource automatically gets
a unique identity, ensuring complete isolation of snapshot histories.
How Kopia Manages Multiple PVCs in One Repository
--------------------------------------------------
Kopia uses a combination of username and hostname to create unique identities for each
backup source. VolSync automatically generates these identities based on:
- **Username**: Derived from the ReplicationSource name and namespace
- **Hostname**: Defaults to the namespace name
This means each PVC backup has its own isolated snapshot history within the shared
repository, while still benefiting from cross-PVC deduplication.
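As a rough illustration, the naming pattern can be written as a small Python function.
This sketch only mirrors the identities shown in the example comments in this document
(such as ``app-database-source@source``). The function name ``kopia_identity`` is
hypothetical, and any sanitizing or truncation that VolSync may apply to names is not
reproduced here.

```python
def kopia_identity(source_name: str, namespace: str) -> str:
    """Build a username@hostname identity following this document's examples."""
    # Assumed pattern: ReplicationSource name, a dash, then the namespace.
    username = f"{source_name}-{namespace}"
    # The hostname defaults to the namespace name.
    hostname = namespace
    return f"{username}@{hostname}"


for name in ("app-database", "app-uploads"):
    print(kopia_identity(name, "source"))
```

Because the ReplicationSource name is part of the username, two sources in the same
namespace get different identities even though they share a hostname.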
Recommended Configuration for Multiple PVCs
============================================
When backing up multiple PVCs to the same repository, use the **same** repository
configuration (same S3 bucket, no path prefixes) and give each ReplicationSource a unique name:
For ``pvc-a``:
.. code-block:: yaml
---
# Shared Kopia repository configuration (RECOMMENDED APPROACH)
# Use the SAME repository URL for all PVCs - no path prefixes!
apiVersion: v1
kind: Secret
metadata:
name: kopia-config-shared
namespace: source
type: Opaque
stringData:
# Single repository URL - no path prefix for optimal deduplication
KOPIA_REPOSITORY: s3://kopia-repo
# Single repository encryption password for all PVCs
KOPIA_PASSWORD: my-secure-kopia-password
# S3 credentials
AWS_ACCESS_KEY_ID: access
AWS_SECRET_ACCESS_KEY: password
# S3 endpoint (required for non-AWS S3)
AWS_S3_ENDPOINT: http://minio.minio.svc.cluster.local:9000
---
# ReplicationSource for pvc-a
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
name: app-database # Unique name creates unique identity
namespace: source
spec:
sourcePVC: pvc-a
trigger:
schedule: "*/30 * * * *"
kopia:
repository: kopia-config-shared # Use shared repository
retain:
daily: 7
weekly: 4
monthly: 6
yearly: 1
compression: zstd
parallelism: 2
copyMethod: Clone
# Identity automatically generated as:
# username: app-database-source
# hostname: source
# Full identity: app-database-source@source
For ``pvc-b``:
.. code-block:: yaml
---
# ReplicationSource for pvc-b (using the SAME repository)
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
name: app-uploads # Different name ensures unique identity
namespace: source
spec:
sourcePVC: pvc-b
trigger:
schedule: "*/30 * * * *"
kopia:
repository: kopia-config-shared # SAME shared repository
retain:
daily: 7
weekly: 4
monthly: 6
yearly: 1
compression: zstd
parallelism: 2
copyMethod: Clone
# Identity automatically generated as:
# username: app-uploads-source
# hostname: source
# Full identity: app-uploads-source@source
.. note::
**Key Benefits of Single Repository**:
- Kopia safely supports multiple clients writing to the same repository simultaneously
- Each ReplicationSource maintains its own isolated snapshot history
- Deduplication works across ALL PVCs in the repository
- Storage savings can be significant when backing up similar data
When to Use Separate Repositories
==================================
While a single repository is recommended for most use cases, there are specific scenarios
where separate repositories (different buckets or path prefixes) might be appropriate:
1. **Compliance Requirements**: Different data classifications requiring physical separation
- HIPAA-regulated healthcare data vs. general application data
- PCI-DSS payment card data vs. non-sensitive data
- GDPR-protected personal data with different retention requirements
2. **Organizational Boundaries**: Clear separation between departments or teams
- Different departments with separate budgets and storage accounts
- Multi-tenant SaaS environments with strict isolation requirements
- Separate development, staging, and production environments
3. **Different Retention Policies**: Incompatible backup retention requirements
- Long-term archival data (years) vs. short-term operational backups (days)
- Legal hold requirements for specific datasets
4. **Performance Isolation**: Preventing one workload from impacting another
- High-frequency backup jobs vs. occasional large backups
- Critical production systems vs. non-critical development work
5. **Geographic Requirements**: Data residency and latency considerations
- Data that must remain in specific regions for compliance
- Optimizing for regional performance by using local storage
Example: Using Separate Repositories When Necessary
----------------------------------------------------
If you must use separate repositories (e.g., for compliance), use distinct buckets or bucket paths:
.. code-block:: yaml
---
# Repository for HIPAA-compliant healthcare data
apiVersion: v1
kind: Secret
metadata:
name: kopia-config-healthcare
type: Opaque
stringData:
KOPIA_REPOSITORY: s3://backups-hipaa/healthcare-data
KOPIA_PASSWORD: healthcare-encryption-key
# ... other credentials
---
# Repository for general application data
apiVersion: v1
kind: Secret
metadata:
name: kopia-config-general
type: Opaque
stringData:
KOPIA_REPOSITORY: s3://backups-general/app-data
KOPIA_PASSWORD: general-encryption-key
# ... other credentials
.. warning::
Using separate repositories means you lose deduplication benefits between them.
Only separate repositories when you have a clear requirement to do so.
Understanding Deduplication Benefits
=====================================
To illustrate why a single repository is recommended, consider this example:
**Scenario**: Backing up 10 application PVCs, each containing:
- 500 MB of operating system libraries
- 200 MB of common application frameworks
- 300 MB of unique application data
**With Separate Repositories** (bucket prefixes per PVC):
- Total storage used: 10 × (500 + 200 + 300) = 10,000 MB
- No deduplication between PVCs
**With Single Repository** (recommended approach):
- Common OS libraries stored once: 500 MB
- Common frameworks stored once: 200 MB
- Unique data for all apps: 10 × 300 = 3,000 MB
- Total storage used: 500 + 200 + 3,000 = 3,700 MB
- **Storage savings: 63%**
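The arithmetic in this scenario can be reproduced with a few lines of Python. This is
only a back-of-the-envelope model that assumes the shared data deduplicates perfectly
and the unique data not at all; the function name ``storage_estimate`` is hypothetical.

```python
def storage_estimate(pvc_count, shared_mb, unique_mb):
    """Return (separate_total_mb, single_total_mb, savings_fraction)."""
    shared_total = sum(shared_mb)
    # Separate repositories: every PVC stores its own copy of the shared data.
    separate = pvc_count * (shared_total + unique_mb)
    # Single repository: shared data is stored once, unique data once per PVC.
    single = shared_total + pvc_count * unique_mb
    return separate, single, 1 - single / separate


separate, single, savings = storage_estimate(10, [500, 200], 300)
print(f"separate: {separate} MB, single: {single} MB, savings: {savings:.0%}")
```

Changing ``pvc_count`` or the shared sizes shows how the savings grow as more PVCs share
the same base data.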
The savings increase dramatically when:
- You have many PVCs with similar base images
- Applications share common libraries or frameworks
- You're backing up multiple instances of the same application
- Development, staging, and production environments have similar data
.. tip::
Monitor your Kopia repository statistics to see actual deduplication ratios.
It's common to see 50-80% storage reduction in environments with similar workloads.
Kopia Advantages for Database Backups
======================================
Kopia provides several advantages for database backups:
**Consistency Actions**: The ``beforeSnapshot`` and ``afterSnapshot`` actions ensure
database consistency without requiring application downtime.
**Efficient Compression**: Kopia's zstd compression typically achieves better compression
ratios than traditional backup tools, reducing storage costs.
**Incremental Backups**: Kopia's content-defined chunking provides efficient incremental
backups that only transfer changed data blocks.
**Concurrent Access**: Multiple backup sources can safely write to the same repository,
making it easier to manage centralized backup infrastructure.
**Fast Restores**: Kopia's architecture enables fast partial and full restores without
needing to download entire backup archives.