========================
Rclone-based replication
========================

.. toctree::
   :hidden:

   database_example
   rclone-secret

.. sidebar:: Contents

   .. contents:: Rclone-based replication
      :local:

Rclone-based replication supports 1:many asynchronous replication of volumes for use
cases such as:

- High fan-out data replication from a central site to many (edge) sites

With this method, VolSync synchronizes data from a ReplicationSource to a ReplicationDestination
using Rclone via an intermediary object storage location like AWS S3.

----------------------------------

The Rclone method uses a "push" and "pull" model for data replication: the source pushes data to the
intermediary storage, and each destination pulls from it. This requires a schedule or other
trigger on both the source and destination sides to start each replication iteration.

During each synchronization iteration:

- A point-in-time (PiT) copy of the source volume is created using CSI drivers. This copy is used as the source data.
- The copy is attached to an Rclone data mover job pod, which uses the contents of the ``rclone-secret`` to connect to the intermediary object storage target (e.g., AWS S3).
- The source pod uses ``rclone sync`` to copy the data to the intermediary object storage.
- On the destination side, a corresponding Rclone mover pod syncs the data from the intermediary object storage into a volume on the destination.
- At the conclusion of the transfer, the destination creates a snapshot to preserve a point-in-time copy of the incoming source data.

VolSync is configured via two CustomResources (CRs), one on the source side and
one on the destination side of the replication relationship. While there should
only be one ReplicationSource pushing data to the intermediate storage, there
may be an arbitrary number of ReplicationDestination instances syncing data from
the intermediate storage to destination clusters. This enables the model of high
fan-out data distribution.
Source configuration
=========================
An example source configuration is shown below:

.. code:: yaml

   ---
   apiVersion: volsync.backube/v1alpha1
   kind: ReplicationSource
   metadata:
     name: database-source
     namespace: source
   spec:
     # The PVC to sync
     sourcePVC: mysql-pv-claim
     trigger:
       # Synchronize every 6 minutes
       schedule: "*/6 * * * *"
     rclone:
       # The configuration section of the rclone config file to use
       rcloneConfigSection: "aws-s3-bucket"
       # The path to the object bucket
       rcloneDestPath: "volsync-test-bucket/mysql-pv-claim"
       # Secret holding the rclone configuration
       rcloneConfig: "rclone-secret"
       # Method used to generate the PiT copy
       copyMethod: Snapshot
       # The StorageClass to use when creating the PiT copy (same as source PVC if omitted)
       storageClassName: my-sc-name
       # The VSC to use if the copy method is Snapshot (default if omitted)
       volumeSnapshotClassName: my-vsc-name

Since the ``copyMethod`` specified above is ``Snapshot``, the Rclone data mover creates a ``VolumeSnapshot``
of the source PVC ``mysql-pv-claim``. It then converts this snapshot back into a PVC.
If ``copyMethod: Clone`` were used, the temporary, point-in-time copy would be
created by cloning the source PVC to a new PVC directly. This is more efficient,
but it is not supported by all CSI drivers.
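
As a minimal sketch, the source example above could be adapted to use cloning
by changing ``copyMethod``. This assumes the CSI driver supports volume
cloning. ``volumeSnapshotClassName`` is left out here on the assumption that it
is not needed when no snapshot is taken:

.. code:: yaml

   ---
   apiVersion: volsync.backube/v1alpha1
   kind: ReplicationSource
   metadata:
     name: database-source
     namespace: source
   spec:
     sourcePVC: mysql-pv-claim
     trigger:
       schedule: "*/6 * * * *"
     rclone:
       rcloneConfigSection: "aws-s3-bucket"
       rcloneDestPath: "volsync-test-bucket/mysql-pv-claim"
       rcloneConfig: "rclone-secret"
       # Create the PiT copy by cloning the source PVC directly
       copyMethod: Clone
       storageClassName: my-sc-name
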

The synchronization schedule, ``.spec.trigger.schedule``, is defined by a
cronspec, making the schedule very flexible. Both intervals (as shown above)
and specific times and/or days can be specified.
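
A few illustrative ``schedule`` values are sketched below, using standard
five-field cron syntax. The specific times are arbitrary examples, and only one
``schedule`` key would be set in a real spec, so the alternatives are commented
out:

.. code:: yaml

   trigger:
     # Interval: every 6 minutes
     schedule: "*/6 * * * *"
     # Specific time: every day at 02:30
     # schedule: "30 2 * * *"
     # Specific day and time: every Sunday at 00:00
     # schedule: "0 0 * * 0"
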
Source status
-----------------
Once the ``ReplicationSource`` is deployed, VolSync updates the ``nextSyncTime`` in the ``ReplicationSource`` object.

.. code:: yaml

   ---
   apiVersion: volsync.backube/v1alpha1
   kind: ReplicationSource
   # ... omitted ...
   spec:
     rclone:
       copyMethod: Snapshot
       rcloneConfig: rclone-secret
       rcloneConfigSection: aws-s3-bucket
       rcloneDestPath: volsync-test-bucket/mysql-pv-claim
       storageClassName: my-sc-name
       volumeSnapshotClassName: my-vsc-name
     sourcePVC: mysql-pv-claim
     trigger:
       schedule: "*/6 * * * *"
   status:
     conditions:
       - lastTransitionTime: 2021-01-18T21:50:59Z
         message: Reconcile complete
         reason: ReconcileComplete
         status: "True"
         type: Reconciled
     nextSyncTime: 2021-01-18T22:00:00Z

Additional source options
-------------------------
There are a number of more advanced configuration parameters that are supported
for configuring the source. All of the following options would be placed within
the ``.spec.rclone`` portion of the ReplicationSource CustomResource.

.. include:: ../inc_src_opts.rst

rcloneConfigSection
   This is used to identify the configuration section within
   ``rclone.conf`` to use.
rcloneDestPath
   This is the remote storage location to which the persistent data will
   be uploaded.

   Normally the root of this path is the storage bucket name. Any sub-paths
   are created as folders in the storage bucket.

   In the example above, using ``volsync-test-bucket/mysql-pv-claim`` means that
   the source PVC will be replicated to the folder called ``mysql-pv-claim`` in
   the bucket called ``volsync-test-bucket``.

   If a unique bucket is used for each PVC to be replicated, then a path with
   simply the bucket name (such as ``volsync-test-bucket``) is sufficient.
   However, if the same bucket will be used for multiple different PVCs (and
   therefore multiple ReplicationSources), a unique path should be used for each
   PVC/ReplicationSource.
rcloneConfig
   This specifies the name of a secret to be used to retrieve the Rclone
   configuration. The :doc:`content of the Secret<./rclone-secret>` is an
   ``rclone.conf`` file.
customCA
   This option allows a custom certificate authority to be used when making TLS
   (https) connections to the remote repository.

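
To illustrate the shared-bucket case described for ``rcloneDestPath``, the
sketch below shows two ReplicationSources that write to distinct paths in the
same bucket. The second object's name and PVC (``webapp-source`` and
``webapp-pv-claim``) are hypothetical and only serve the example:

.. code:: yaml

   ---
   apiVersion: volsync.backube/v1alpha1
   kind: ReplicationSource
   metadata:
     name: database-source
   spec:
     sourcePVC: mysql-pv-claim
     # ... other fields omitted ...
     rclone:
       # ... other fields omitted ...
       rcloneDestPath: "volsync-test-bucket/mysql-pv-claim"
   ---
   apiVersion: volsync.backube/v1alpha1
   kind: ReplicationSource
   metadata:
     name: webapp-source            # hypothetical
   spec:
     sourcePVC: webapp-pv-claim     # hypothetical
     # ... other fields omitted ...
     rclone:
       # ... other fields omitted ...
       # Same bucket, different folder, so the two PVCs do not overwrite each other
       rcloneDestPath: "volsync-test-bucket/webapp-pv-claim"
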
----------------------------------
Destination configuration
=========================
An example destination configuration is shown here:

.. code:: yaml

   ---
   apiVersion: volsync.backube/v1alpha1
   kind: ReplicationDestination
   metadata:
     name: database-destination
     namespace: dest
   spec:
     trigger:
       # Every 6 minutes, offset by 3 minutes
       schedule: "3,9,15,21,27,33,39,45,51,57 * * * *"
     rclone:
       rcloneConfigSection: "aws-s3-bucket"
       # Must match the rcloneDestPath used by the ReplicationSource
       rcloneDestPath: "volsync-test-bucket/mysql-pv-claim"
       rcloneConfig: "rclone-secret"
       copyMethod: Snapshot
       accessModes: [ReadWriteOnce]
       capacity: 10Gi
       storageClassName: my-sc
       volumeSnapshotClassName: my-vsc

Similar to the replication source, a synchronization schedule is defined in
``.spec.trigger.schedule``. This indicates when persistent data should be pulled
from the remote storage location. It is important that the schedules for the
destinations are offset from that of the source, so that the source can finish
pushing updates for an iteration before the destinations attempt to pull
them.

In the above example, a 10 GiB RWO volume will be provisioned using the ``my-sc`` StorageClass to serve as the destination for replicated data. This volume is
used by the Rclone data mover to receive the incoming data transfers.
Since the ``copyMethod`` specified above is ``Snapshot``, a ``VolumeSnapshot`` of the incoming data
will be created at the end of each synchronization interval. It is this snapshot that
would be used to gain access to the replicated data. The name of the current ``VolumeSnapshot``
holding the latest synced data will be placed in ``.status.latestImage``.
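
As a sketch of how the replicated data might be accessed, a new PVC can be
created from the ``VolumeSnapshot`` named in ``.status.latestImage`` by using
the standard Kubernetes ``dataSource`` field. The PVC name ``restored-data`` is
hypothetical, and the snapshot name below is only a placeholder that would be
replaced with the current value of ``.status.latestImage.name``:

.. code:: yaml

   ---
   apiVersion: v1
   kind: PersistentVolumeClaim
   metadata:
     name: restored-data            # hypothetical
     namespace: dest
   spec:
     accessModes: [ReadWriteOnce]
     storageClassName: my-sc
     # Populate the new volume from the snapshot created by VolSync
     dataSource:
       apiGroup: snapshot.storage.k8s.io
       kind: VolumeSnapshot
       name: volsync-dest-database-destination-20210119221601   # placeholder
     resources:
       requests:
         storage: 10Gi
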
Destination status
------------------
VolSync provides status information on the state of the replication via the
``.status`` field in the ReplicationDestination object:

.. code:: yaml

   ---
   API Version: volsync.backube/v1alpha1
   Kind: ReplicationDestination
   # ... omitted ...
   Spec:
     Rclone:
       Access Modes:
         ReadWriteOnce
       Capacity: 10Gi
       Copy Method: Snapshot
       Rclone Config: rclone-secret
       Rclone Config Section: aws-s3-bucket
       Rclone Dest Path: volsync-test-bucket
       Storage Class Name: my-sc
       Volume Snapshot Class Name: my-vsc
   Status:
     Conditions:
       Last Transition Time: 2021-01-19T22:16:02Z
       Message: Reconcile complete
       Reason: ReconcileComplete
       Status: True
       Type: Reconciled
     Last Sync Duration: 7.066022293s
     Last Sync Time: 2021-01-19T22:16:02Z
     Latest Image:
       API Group: snapshot.storage.k8s.io
       Kind: VolumeSnapshot
       Name: volsync-dest-database-destination-20210119221601

In the above example:

- ``Rclone Dest Path`` identifies the location in the intermediary storage system from which data is
  transferred to the destination site. Here, the intermediary storage system is an S3 bucket.
- No errors were detected (the ``Reconciled`` condition is ``True``).

After at least one synchronization has taken place, the following will also be
available:

- ``Last Sync Time`` contains the time of the last successful data synchronization.
- ``Latest Image`` references the object with the most recent copy of the data. If
  the copyMethod is ``Snapshot``, this will be a VolumeSnapshot object. If the
  copyMethod is ``Direct``, this will be the PVC that is used as the destination by
  VolSync.

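
For comparison, a destination using ``copyMethod: Direct`` might look like the
sketch below. It assumes that a ``destinationPVC`` option is available among
the destination options to name an existing PVC to write into. The PVC name
``database-replica`` is hypothetical:

.. code:: yaml

   ---
   apiVersion: volsync.backube/v1alpha1
   kind: ReplicationDestination
   metadata:
     name: database-destination
     namespace: dest
   spec:
     trigger:
       schedule: "3,9,15,21,27,33,39,45,51,57 * * * *"
     rclone:
       rcloneConfigSection: "aws-s3-bucket"
       rcloneDestPath: "volsync-test-bucket/mysql-pv-claim"
       rcloneConfig: "rclone-secret"
       # No snapshot is taken; the PVC itself holds the latest data
       copyMethod: Direct
       # Assumption: existing PVC that receives the data (hypothetical name)
       destinationPVC: database-replica
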
Additional destination options
------------------------------
There are a number of more advanced configuration parameters that are supported
for configuring the destination. All of the following options would be placed
within the ``.spec.rclone`` portion of the ReplicationDestination CustomResource.

.. include:: ../inc_dst_opts.rst

rcloneConfigSection
   This is used to identify the configuration section within
   ``rclone.conf`` to use.
rcloneDestPath
   This is the remote storage location from which the persistent data will
   be downloaded. This should match the ``rcloneDestPath`` used on the ReplicationSource.
rcloneConfig
   This specifies the secret to be used. The secret contains an ``rclone.conf``
   file with the configuration and credentials for the object target.
customCA
   This option allows a custom certificate authority to be used when making TLS
   (https) connections to the remote repository.

For a concrete example, see the :doc:`database synchronization example <database_example>`.

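
As a rough sketch of the Secret referenced by ``rcloneConfig``, the manifest
below embeds an ``rclone.conf`` whose section name matches the
``rcloneConfigSection`` used in the examples. The Secret key name
``rclone.conf``, the region, and the credential placeholders are assumptions
for illustration. The options inside the file are standard Rclone S3 backend
settings:

.. code:: yaml

   ---
   apiVersion: v1
   kind: Secret
   metadata:
     name: rclone-secret
     namespace: dest
   type: Opaque
   stringData:
     # Assumption: the Secret stores the file under the key "rclone.conf"
     rclone.conf: |
       [aws-s3-bucket]
       type = s3
       provider = AWS
       env_auth = false
       access_key_id = <access-key-id>
       secret_access_key = <secret-access-key>
       region = us-east-1
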
Using a custom certificate authority
====================================
Normally, Rclone will use a default set of certificates to verify the validity
of remote repositories when making https connections. However, users that deploy
with a self-signed certificate will need to provide their CA's certificate via
the ``customCA`` option.
The custom CA certificate needs to be provided in a Secret or ConfigMap to
VolSync. For example, if the CA certificate is a file in the current directory
named ``ca.crt``, it can be loaded as a Secret or a ConfigMap.
Example using a customCA loaded as a secret:

.. code-block:: console

   $ kubectl create secret generic tls-secret --from-file=ca.crt=./ca.crt
   secret/tls-secret created

   $ kubectl describe secret/tls-secret
   Name:         tls-secret
   Namespace:    default
   Labels:       <none>
   Annotations:  <none>

   Type:  Opaque

   Data
   ====
   ca.crt:  1127 bytes

This Secret would then be used in the ReplicationSource and/or
ReplicationDestination objects:

.. code-block:: yaml

   ---
   apiVersion: volsync.backube/v1alpha1
   kind: ReplicationSource
   metadata:
     name: mydata-backup-with-customca
   spec:
     # ... fields omitted ...
     rclone:
       # ... other fields omitted ...
       customCA:
         secretName: tls-secret
         key: ca.crt

To use a customCA in a ConfigMap, specify ``configMapName`` in the spec instead
of ``secretName``, for example:

.. code-block:: yaml

   ---
   apiVersion: volsync.backube/v1alpha1
   kind: ReplicationSource
   metadata:
     name: mydata-backup-with-customca
   spec:
     # ... fields omitted ...
     rclone:
       # ... other fields omitted ...
       customCA:
         configMapName: tls-configmap-name
         key: ca.crt
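
The ConfigMap named in ``configMapName`` has to exist in the same namespace as
the ReplicationSource or ReplicationDestination. As a sketch, it could be
declared with a manifest like the following, where the certificate body is a
placeholder for the contents of ``ca.crt``:

.. code-block:: yaml

   ---
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: tls-configmap-name
   data:
     # The key must match the "key" given in the customCA section
     ca.crt: |
       -----BEGIN CERTIFICATE-----
       ... certificate contents ...
       -----END CERTIFICATE-----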