rsync tutorial

1. Introduction

rsync is a commonly used Linux application for file synchronization.

It can synchronize files between a local computer and a remote computer, or between two local directories (but does not support synchronization between two remote computers). It can also be used as a file copy tool, replacing the ‘cp’ and ‘mv’ commands.

The ‘r’ in its name stands for remote: rsync means “remote sync”. Unlike other file transfer tools (such as FTP or scp), rsync compares the files that already exist on the sender and the receiver and transfers only what has changed. By default, a file counts as changed when its size or modification time differs.

2. Installation

If rsync is not installed on the local or remote computer, you can install it with the system package manager. On Debian or Ubuntu, for example, use the following command.

$ sudo apt-get install rsync

3. Basic usage

3.1 -r

When used locally, the rsync command can serve as an alternative to the ‘cp’ and ‘mv’ commands, synchronizing the source directory to the target directory.

$ rsync -r source destination

In the above command, ‘-r’ means recursive, that is, subdirectories are included. Note that ‘-r’ is required when the source is a directory; without it, rsync skips the directory instead of copying it. ‘source’ is the source directory and ‘destination’ is the target directory.

If there are multiple files or directories that need to be synchronized, it can be written as follows.

$ rsync -r source1 source2 destination

In the above command, ‘source1’ and ‘source2’ will be synchronized to the ‘destination’ directory.

3.2 -a

The -a parameter can replace -r. In addition to recursive synchronization, it also synchronizes meta information (such as modification time and permissions). Because rsync by default uses file size and modification time to decide whether a file needs to be updated, -a is more useful than -r. The following is the usual form.

$ rsync -a source destination

If the target directory destination does not exist, rsync creates it automatically. After the above command is executed, the source directory source is copied in full into destination, producing the directory structure destination/source.

If you only want to synchronize the content in the source directory source to the target directory destination, you need to add a slash after the source directory.

$ rsync -a source/ destination

After the above command is executed, the content in the source directory will be copied to the destination directory, and a source subdirectory will not be created under the destination.

3.3 -n

If you are not sure what the result will be after rsync is executed, you can use the -n or --dry-run parameter to simulate the execution result first.

$ rsync -anv source/ destination

In the above command, the -n parameter simulates the result of command execution, and does not actually execute the command. The -v parameter is to output the result to the terminal, so that you can see what content will be synchronized.

3.4 --delete

By default, rsync simply ensures that all contents of the source directory (except files explicitly excluded) are copied to the destination directory. It doesn’t make two directories the same, and it doesn’t delete files. If you want to make the target directory a mirror copy of the source directory, you must use the --delete parameter, which will delete files that only exist in the target directory and do not exist in the source directory.

$ rsync -av --delete source/ destination

In the above command, the --delete parameter will make destination a mirror image of source.

4. Exclude files

4.1 --exclude

Sometimes we want to exclude certain files or directories during synchronization. The --exclude parameter specifies the pattern to exclude.

$ rsync -av --exclude='*.txt' source/ destination
# or
$ rsync -av --exclude '*.txt' source/ destination

The above command excludes all TXT files.

Note that rsync synchronizes hidden files (those whose names start with a dot). To exclude hidden files, write --exclude=".*".

If you want to exclude all files in a certain directory, but do not want to exclude the directory itself, you can write it as follows.

$ rsync -av --exclude 'dir1/*' source/ destination

For multiple exclusion patterns, multiple --exclude parameters can be used.

$ rsync -av --exclude 'file1.txt' --exclude 'dir1/*' source/ destination

Multiple exclusion patterns can also be given with a single --exclude argument by using Bash’s brace expansion.

$ rsync -av --exclude={'file1.txt','dir1/*'} source/ destination

If there are many exclusion patterns, you can write them to a file, one line per pattern, and then use the --exclude-from parameter to specify this file.

$ rsync -av --exclude-from='exclude-file.txt' source/ destination

4.2 --include

The --include parameter specifies file patterns that must be synchronized, and is often used in combination with --exclude.

$ rsync -av --include="*.txt" --exclude='*' source/ destination

The above command excludes all files during synchronization except TXT files. The order matters: rsync applies the first pattern that matches, so --include must come before --exclude='*'.

5. Remote synchronization

5.1 SSH protocol

In addition to supporting synchronization between two local directories, rsync also supports remote synchronization. It can synchronize local content to a remote server.

$ rsync -av source/ username@remote_host:destination

It is also possible to synchronize remote content to the local machine.

$ rsync -av username@remote_host:source/ destination

rsync uses SSH by default for remote login and data transfer.

Early versions of rsync did not use SSH by default, so the -e parameter was needed to specify the protocol. This was changed later, so the -e ssh below can be omitted.

$ rsync -av -e ssh source/ user@remote_host:/destination

However, if the ssh command has additional parameters, the -e parameter must be used to specify the SSH command to be executed.

$ rsync -av -e 'ssh -p 2234' source/ user@remote_host:/destination

In the above command, the -e parameter specifies that SSH uses port 2234.

5.2 rsync protocol

In addition to SSH, if the other server has the rsync daemon installed and running, data can also be transferred with the rsync:// protocol (port 873 by default). The syntax uses a double colon (::) to separate the server address from the target directory.

$ rsync -av source/ 192.168.122.32::module/destination

Note that module in the above address is not the actual path name, but a resource name specified by the rsync daemon, assigned by the administrator.

If you want to know the list of all modules allocated by the rsync daemon, you can execute the following command.

$ rsync rsync://192.168.122.32

Instead of the double colon, the address can also be written directly as an rsync:// URL.

$ rsync -av source/ rsync://192.168.122.32/module/destination

6. Incremental backup

The biggest feature of rsync is that it can complete incremental backup, that is, only the changed files are copied by default.

In addition to the direct comparison between the source directory and the target directory, rsync also supports the use of the base directory, which is to synchronize the changed parts between the source directory and the base directory to the target directory.

The first synchronization is a full backup: all files are synchronized to the base directory. Every subsequent synchronization is an incremental backup that synchronizes only what has changed between the source directory and the base directory and saves it in a new target directory. The new target directory appears to contain all files, but only the changed files are actually stored there; the unchanged files are hard links to the files in the base directory.

The --link-dest parameter is used to specify the base directory when synchronizing.

$ rsync -a --delete --link-dest /compare/path /source/path /target/path

In the above command, the --link-dest parameter specifies the reference directory /compare/path. The source directory /source/path is compared with the reference directory to find the changed files, which are copied to the target directory /target/path. Unchanged files are created as hard links. The first backup made this way is a full backup; later ones are incremental.

Below is an example script that backs up a user’s home directory.

#!/bin/bash

# A script to perform incremental backups using rsync

set -o errexit
set -o nounset
set -o pipefail

readonly SOURCE_DIR="${HOME}"
readonly BACKUP_DIR="/mnt/data/backups"
readonly DATETIME="$(date '+%Y-%m-%d_%H:%M:%S')"
readonly BACKUP_PATH="${BACKUP_DIR}/${DATETIME}"
readonly LATEST_LINK="${BACKUP_DIR}/latest"

mkdir -p "${BACKUP_DIR}"

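# Copy the home directory. Unchanged files become hard links into the
# previous backup. On the first run "latest" does not exist yet, so rsync
# should only warn about the missing --link-dest directory and copy everything.
rsync -av --delete \
  "${SOURCE_DIR}/" \
  --link-dest "${LATEST_LINK}" \
  --exclude=".cache" \
  "${BACKUP_PATH}"

# Repoint the "latest" soft link at the backup that was just created.
rm -rf "${LATEST_LINK}"
ln -s "${BACKUP_PATH}" "${LATEST_LINK}"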

In the above script, each run creates a new directory ${BACKUP_DIR}/${DATETIME} and then points the soft link ${BACKUP_DIR}/latest to it. The next backup uses ${BACKUP_DIR}/latest as the base directory for its new backup directory.

7. Options

-a, --archive parameters indicate archive mode: all metadata, such as modification time, permissions, and owners, is saved, and soft links are synchronized as well.

--append parameter specifies that a file transfer continues from where it left off last time.

--append-verify parameter is similar to the --append parameter, but it will perform a verification on the file after the transfer is completed. If verification fails, the entire file will be resent.

-b, --backup parameters specify that when an existing file in the target directory is deleted or updated, it is renamed and kept as a backup. The default behavior is to delete it. The renamed file gets the suffix specified by the --suffix parameter, which defaults to ~.

--backup-dir parameter specifies the directory where files are stored during backup, such as --backup-dir=/path/to/backups.

--bwlimit parameter specifies the bandwidth limit, the default unit is KB/s, such as --bwlimit=100.

-c, --checksum parameters change the checking method of rsync. By default, rsync only checks whether the file size and last modification date have changed, and if so, retransmits; after using this parameter, it decides whether to retransmit by judging the checksum of the file content.

--delete parameter deletes files that exist only in the target directory and not in the source directory, ensuring that the target directory is a mirror image of the source directory.

-e parameter specifies the remote shell command used to transfer data, such as ssh with additional options.

--exclude parameter specifies to exclude files that are not synchronized, such as --exclude="*.iso".

--exclude-from parameter specifies a local file, which contains file patterns that need to be excluded, one line per pattern.

--existing, --ignore-non-existing parameters indicate that files and directories that do not exist in the target directory are not synchronized.

-h parameter indicates output in human readable format.

--help parameter returns help information; -h does so only when it is used on its own, without other options.

-i parameter outputs the details of the file differences between the source directory and the target directory.

--ignore-existing parameter indicates that as long as the file already exists in the target directory, skip it and no longer synchronize these files.

--include parameter specifies the files to be included during synchronization, and is generally used in conjunction with --exclude.

--link-dest parameter specifies the base directory for incremental backups.

-m parameter specifies not to sync empty directories.

--max-size parameter sets the size limit of the largest file transferred, for example no more than 200KB (--max-size='200k').

--min-size parameter sets the size limit of the smallest file transferred, for example not less than 10KB (--min-size=10k).

-n parameter or the --dry-run parameter simulates the operation that will be performed, but does not actually execute it. Use it with the -v parameter to see what content will be synchronized.

-P parameter is a combination of --progress and --partial.

--partial parameter allows interrupted transfers to be resumed. Without this parameter, rsync deletes a file whose transfer was interrupted; with it, the partially transferred file is kept in the target directory and the transfer resumes at the next synchronization. It is generally used in conjunction with --append or --append-verify.

--partial-dir parameter specifies a temporary directory in which partially transferred files are saved, such as --partial-dir=.rsync-partial. It is generally used in conjunction with --append or --append-verify.

--progress parameter indicates to display progress.

-r parameter means recursive, that is, including subdirectories.

--remove-source-files parameter indicates that the sender’s files are deleted after they have been transferred successfully.

--size-only parameter indicates that only the files whose size has changed are synchronized, regardless of the difference in file modification time.

--suffix parameter specifies the suffix added to the file name when the file name is backed up. The default is ~.

-u, --update parameters indicate that files whose modification time in the target directory is newer than in the source are skipped during synchronization.

-v parameter outputs details. -vv outputs more detailed information, and -vvv outputs the most detailed information.

--version parameter returns the version of rsync.

-z parameter specifies to compress the data when synchronizing.
