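rsync is a tool for incremental backups. It is sometimes also called a file or system mirroring tool. Combined with a virtual IP, it is often used to build a makeshift HA (high availability) setup. GitLab servers, for example, are frequently made "highly available" this way, although what you actually get is a cold standby.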

Rsync is a wonderful little utility that's amazingly easy to set up on your machines. Rather than running a scripted FTP session or some other file transfer script, rsync copies only the differences in files that have actually changed. It compresses them and can send them through ssh for security. That's a mouthful, but what it means is:

  • Diffs - Only actual changed pieces of files are transferred, rather than the whole file. This makes updates faster, especially over slower links like modems. FTP would transfer the entire file, even if only one byte changed.
  • Compression - The tiny pieces of diffs are then compressed on the fly, further saving you file transfer time and reducing the load on the network.
  • Secure Shell - The security conscious among you will like this, and you should all be using it. The rsync stream is passed through the ssh protocol to encrypt your session instead of rsh. Rsh is also an option, and it is required if you don't use ssh: enable it in /etc/inetd.conf and restart inetd if you disabled it for security.

Rsync is a versatile backup and mirroring tool with many features beyond those above. I personally use it to synchronize Web site trees from staging to production servers. I also use it to back up key areas of the filesystem, both automatically through cron and on demand through a CGI script. Here are some other key features of rsync:

  • Support for copying links, devices, owners, groups and permissions
  • Exclude and exclude-from options similar to GNU tar
  • A CVS exclude mode for ignoring the same files that CVS would ignore
  • Does not require root privileges
  • Pipelining of file transfers to minimize latency costs
  • Support for anonymous or authenticated rsync servers (ideal for mirroring)

# How it works

For daemon mode, you set up one machine of a pair as an "rsync server". To do this, run rsync as a daemon ("rsync --daemon" at the command line) and write a short configuration file (/etc/rsyncd.conf). A sample configuration file is detailed below. The options are easy to understand and few in number, yet quite powerful.

Any number of machines with rsync installed can then synchronize to and/or from the machine running the rsync daemon. You can use this to make backups, mirror filesystems, distribute files, and for many similar tasks. The "rsync algorithm" transfers only the differences between files (similar to a patch file) and compresses them, so the result is a very efficient system.

For those of you new to Secure Shell ("ssh" for short): you should be using it! There's a very useful and quite thorough Getting Started with SSH document available. You may also want to visit the Secure Shell Web Site, or get it from the Master FTP Site in Finland. Ssh provides a secure, encrypted "pipe" for your network traffic. Use it instead of telnet, rsh or rlogin, and use the replacement "scp" command instead of "rcp".

You must set up a configuration file on the machine meant to be the server and run the rsync binary in daemon mode. Your rsync client machines can also run rsync in daemon mode for two-way transfers. There are two ways to start the daemon. The inet daemon (inetd) can start it per connection. Alternatively, you can start it once at the command line in standalone mode, where it stays running in the background to serve frequent rsyncs.

I personally use standalone mode, like Apache. I have a crontab entry that synchronizes a Web site directory hourly. There is also a CGI script that people fire off frequently during the day for immediate content updates. That is a lot of rsync calls! If inetd starts the rsync daemon, every connection your server receives starts a new rsync process, which adds overhead to each call. It's the same reasoning as running Apache standalone rather than through inetd: if you expect a lot of rsync traffic, standalone mode is quicker and more efficient. For occasional transfers, launch rsync via inetd instead. That way the rsync daemon, small as it is, doesn't sit in memory when you only use it once a day. Your call.
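Here is a minimal sketch of the two launch styles described above. The inetd entry follows the example in the rsyncd.conf(5) man page. The helper name `inetd_rsync_line` and the default binary path `/usr/bin/rsync` are assumptions. Systems using xinetd or systemd configure this differently.

```bash
#!/usr/bin/env bash
# Standalone mode (the author's choice): one long-running daemon, like Apache.
#   rsync --daemon --config=/etc/rsyncd.conf
#
# inetd mode: inetd starts a fresh rsync for every incoming connection.
# /etc/services needs:       rsync   873/tcp
# /etc/inetd.conf needs a line like the one printed below.

# inetd_rsync_line [RSYNC_BIN]: print an inetd.conf entry for rsync.
# Hypothetical helper; the default binary path is an assumption.
inetd_rsync_line() {
  local rsync_bin="${1:-/usr/bin/rsync}"
  printf 'rsync\tstream\ttcp\tnowait\troot\t%s\trsyncd --daemon\n' "$rsync_bin"
}

# Example (as root): append the entry, then make inetd reread its config.
#   inetd_rsync_line >> /etc/inetd.conf
#   pkill -HUP inetd
```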

Below is a sample rsync configuration file. It is placed in your /etc directory as rsyncd.conf.

motd file = /etc/rsyncd.motd
log file = /var/log/rsyncd.log
pid file = /var/run/
lock file = /var/run/rsync.lock

[simple_path_name]
   path = /rsync_files_here
   comment = My Very Own Rsync Server
   uid = nobody
   gid = nobody
   read only = no
   list = yes
   auth users = username
   secrets file = /etc/rsyncd.scrt

The options you would most likely change right from the start are path, comment, auth users and secrets file. I'll start at the top and go through, line by line, what you should pay attention to. The sample above sets up a single "path" for rsync transfers to that machine.

The first four lines specify files and their paths for rsync running in daemon mode. The first is a "message of the day" (motd) file, like the one an FTP server uses. Its contents are displayed when clients connect to this machine. Use it as a welcome, a warning or simply identification. The next line specifies a log file for diagnostic and normal run-time messages. The PID file holds the process ID (PID) of the running rsync daemon. The lock file is used to enforce the "max connections" limit. These options are global to the rsync daemon.

The next block of lines belongs to one "path" that rsync serves. The options in it apply only within that block; they are local, not global. The block starts with the "path" name in square brackets ([simple_path_name] for the sample above). Calling it a "path" is somewhat confusing, because it is not a filesystem pathname; the rsyncd.conf man page calls it a "module". It serves as an "rsync area nickname": a short, easy to remember (and type!) name that you assign to a real filesystem path together with all the options you specify. These are the things to set up first and foremost:

  • path - the actual filesystem path that files are rsync'ed from and/or to.
  • comment - a short description of what the path points to, shown in listings.
  • auth users - you really should set this. It restricts access to the users you list in the secrets file below, who do not have to be valid system users.
  • secrets file - a plaintext file of username:password pairs, one per line.

Seriously consider the "hosts allow" and "hosts deny" options for your path. List the IPs or hostnames that you specifically want to allow or deny. If you don't, and you don't at least use "auth users", that area of your filesystem is wide open to anyone in the world running rsync! I strongly suggest you avoid that.

Check the rsyncd.conf man page with "man rsyncd.conf" and read it very carefully where security options are concerned. You don't want just anyone to come in and rsync up an empty directory with the "--delete" option, now do you?

The other options are all explained in the rsyncd.conf man page. In short, the remaining options in the sample make transfers to and from the module run as user and group nobody (when the daemon is started as root). They also make the module writable and make it show up in module listings. I keep the rsync secrets file in /etc along with the configuration and motd files, and prefix them all with "rsyncd." to keep them together.
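Here is a minimal sketch that writes the configuration above, with the module header spelled out and a `hosts allow`/`hosts deny` pair added. The following are all placeholders to replace with your own values:

* the helper name `write_rsyncd_config`
* the pid file name `/var/run/rsyncd.pid`
* the client range `192.0.2.0/24`, a documentation address block

Per the rsyncd.conf man page, the default `strict modes = yes` makes the daemon reject a secrets file that other users can read.

```bash
#!/usr/bin/env bash
# write_rsyncd_config DIR MODULE_PATH USER PASSWORD
# Writes DIR/rsyncd.conf with one module, plus DIR/rsyncd.scrt holding "user:password".
# Hypothetical helper: pid file name and allowed client range are placeholders.
write_rsyncd_config() {
  local dir="$1" module_path="$2" user="$3" password="$4"
  cat > "$dir/rsyncd.conf" <<EOF
motd file = $dir/rsyncd.motd
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock

[simple_path_name]
   path = $module_path
   comment = My Very Own Rsync Server
   uid = nobody
   gid = nobody
   read only = no
   list = yes
   auth users = $user
   secrets file = $dir/rsyncd.scrt
   hosts allow = 192.0.2.0/24
   hosts deny = *
EOF
  # Create the secrets file unreadable by others from the start,
  # then force mode 600 in case the file already existed.
  ( umask 077; printf '%s:%s\n' "$user" "$password" > "$dir/rsyncd.scrt" )
  chmod 600 "$dir/rsyncd.scrt"
}

# Example (as root):
#   write_rsyncd_config /etc /rsync_files_here username 'choose-a-password'
#   rsync --daemon --config=/etc/rsyncd.conf
```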

Now on to actually initiating an rsync transfer. The client is the same binary as the daemon, just without the "--daemon" flag. Its simplicity is a virtue. Below is a command line I use in a script to synchronize a Web tree.

rsync --verbose  --progress --stats --compress --rsh=/usr/local/bin/ssh \
      --recursive --times --perms --links --delete \
      --exclude "*.bak" --exclude "*~" \
      /www/* webserver:simple_path_name
Let's go through it one line at a time. The first line calls rsync with the "verbose", "progress" and "stats" options, so that you can see what's going on the first time around. The "compress" option compresses the stream, and the "rsh" option sends it through ssh (remember from above?) for security's sake.

The next line specifies how rsync operates on your files:

  • "recursive" tells rsync to descend through the source directory.
  • "times" and "perms" preserve file timestamps and permissions.
  • "links" copies symbolic links as links.
  • "delete" removes files on the receiving side that no longer exist on the sending side.

The next line is where much of the power and flexibility lies: GNU tar-like include and exclude patterns. In this example I'm telling rsync to skip backup files that are common in this Web tree ("*.bak" and "*~" files). Match whatever suits your needs. You can also leave the line out, and rsync will copy all your local files to the remote machine as they are.

Finally, the last line specifies the source pathname, the remote machine and the destination. "/www/*" is where on my local filesystem rsync grabs the files to send. "webserver" is the DNS name or IP address of the remote machine. It can be an address like "w.x.y.z", a fully qualified host name, or just "webserver" if, as here, you have a nickname defined in /etc/hosts.

Pay attention to the colon! A single colon means the transfer runs over the remote shell given by --rsh (ssh here). What follows the colon is then an ordinary directory on the remote machine, relative to the remote user's home directory, and rsyncd.conf is not involved. So in the command above, "simple_path_name" is treated as a directory on webserver, not as an rsync daemon "path". Two colons (or an rsync:// URL) address a "path" (module) on the rsync daemon instead. By default that uses a direct, unencrypted TCP connection to port 873. Combined with --rsh, rsync instead runs a single-use daemon over the ssh connection.

That's it! Running the above command on your local machine transfers the entire "/www/*" tree to the remote "webserver" machine, except backup files. File timestamps and permissions are preserved, and the transfer is compressed and secure, with visual feedback on what's happening.

Note that the above example uses GNU-style long options so that you can see what the command line is doing. Most of them also have single-letter abbreviations. Run rsync with the "--help" option alone to see the available syntax and options.
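For comparison, here is the same transfer written with short options:

* `-v` verbose
* `-z` compress
* `-e` remote shell
* `-r` recursive
* `-t` times
* `-p` perms
* `-l` links

`--progress`, `--stats`, `--delete` and `--exclude` are kept long. The wrapper function `sync_web_tree` and the `SSH_BIN` variable are hypothetical; they just make the source and destination easy to swap.

```bash
#!/usr/bin/env bash
# sync_web_tree SRC DEST: the long-option command above, written with short flags.
# SSH_BIN (hypothetical) defaults to the ssh path used in the example.
sync_web_tree() {
  local src="$1" dest="$2"
  rsync -vzrtpl --progress --stats --delete \
        -e "${SSH_BIN:-/usr/local/bin/ssh}" \
        --exclude '*.bak' --exclude '*~' \
        "$src" "$dest"
}

# Example (remote shell transport, as in the original command):
#   sync_web_tree /www/ webserver:simple_path_name
```

With a trailing slash, `/www/` copies the directory's contents, including dotfiles that the shell glob `/www/*` would skip. For a purely local destination, rsync does not use the remote shell at all.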

# Practical usage

       rsync - a fast, versatile, remote (and local) file-copying tool

       Local:  rsync [OPTION...] SRC... [DEST]

       Access via remote shell:
         Pull: rsync [OPTION...] [USER@]HOST:SRC... [DEST]
         Push: rsync [OPTION...] SRC... [USER@]HOST:DEST

       Access via rsync daemon:
         Pull: rsync [OPTION...] [USER@]HOST::SRC... [DEST]
               rsync [OPTION...] rsync://[USER@]HOST[:PORT]/SRC... [DEST]
         Push: rsync [OPTION...] SRC... [USER@]HOST::DEST
               rsync [OPTION...] SRC... rsync://[USER@]HOST[:PORT]/DEST

       Usages with just one SRC arg and no DEST arg will list the source files instead of copying.


Direction: server <= client (the client pulls).

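With "Access via remote shell", you run rsync on the source machine to push or on the destination machine to pull, so there is no server in the daemon sense.
With "Access via rsync daemon", a daemon must run on the server side. Clients can then pull from it or push to it.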
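To make the colon rules concrete, here is a small hypothetical helper, `rsync_transport` (not part of rsync). It classifies a location string roughly the way the synopsis above does. It is an approximation: real rsync also handles details such as bracketed IPv6 addresses and explicit ports, which this sketch ignores.

```bash
#!/usr/bin/env bash
# rsync_transport LOCATION: print "local", "remote-shell" or "daemon".
# Hypothetical helper approximating rsync's SRC/DEST syntax rules.
rsync_transport() {
  local loc="$1"
  if [[ "$loc" == rsync://* ]]; then
    echo daemon
    return
  fi
  local before_colon="${loc%%:*}"
  # No colon at all, or a slash before the first colon: a local path.
  if [[ "$loc" != *:* || "$before_colon" == */* ]]; then
    echo local
    return
  fi
  # HOST::MODULE uses the rsync daemon; HOST:PATH uses a remote shell (ssh).
  if [[ "${loc#"$before_colon":}" == :* ]]; then
    echo daemon
  else
    echo remote-shell
  fi
}

# Example:
#   rsync_transport rsync_backup@remote_server::backup   # prints: daemon
```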

The following uses "Access via rsync daemon".
--- Server side
# yum install rsync
# rpm -qc rsync

[root@backup ~]# vim /etc/rsyncd.conf 

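# uid/gid: use rsync instead of nobody if you create the rsync user below
uid = nobody
gid = nobody
use chroot = yes
# max connections = 200
# timeout = 600
dont compress   = *.gz *.tgz *.zip *.z *.Z *.rpm *.deb *.bz2
pid file = /var/run/
#motd file=/var/rsync/welcome.msg
#lock file = /var/rsync/rsync.lock
# the /var/rsync directory must be created by hand!
# (comments must sit on their own line: "# ..." after a value becomes part of the value)
log file = /var/rsync/rsyncd.log
secrets file = /etc/rsync.password

# port = 873 is the default; if the server sets another port, clients must also pass --port=xxx
#fake super = yes
#ignore errors
#read only = false
#list = false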
# placeholder module name for /path1
[path1]
comment = path1
path = /path1
auth users = rsync_backup
read only = no
list = no
hosts allow =
hosts deny = *
[gitlab_path1]
comment = "/opt/gitlab"
path = /opt/gitlab
auth users = rsync_backup
read only = no
list = yes
hosts allow =
hosts deny = *
[gitlab_path2]
comment = "/var/opt/gitlab"
path = /var/opt/gitlab
auth users = rsync_backup
read only = no
list = yes
hosts allow =
hosts deny = *
[gitlab_path3]
comment = "/etc/gitlab"
path = /etc/gitlab
auth users = rsync_backup
read only = no
list = yes
hosts allow =
hosts deny = *
[gitlab_path4]
comment = "/var/log/gitlab"
path = /var/log/gitlab
auth users = rsync_backup
read only = no
list = yes
hosts allow =
hosts deny = *
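Every module needs its own `[name]` header, and comments have to be whole lines. A quick sanity check after editing helps catch a module whose header or directory is missing. The sketch below is a hypothetical helper, not an rsync feature. It only parses `[module]` headers and `path =` lines, and it assumes module names contain no spaces.

```bash
#!/usr/bin/env bash
# list_rsync_modules CONF: print "module path" for every [module] that sets path.
list_rsync_modules() {
  awk '
    /^[ \t]*#/ { next }                  # skip full-line comments
    /^[ \t]*\[/ {                        # a [module] header starts a new section
      name = $0
      gsub(/[ \t]/, "", name)
      sub(/^\[/, "", name)
      if (substr(name, length(name)) == "]") name = substr(name, 1, length(name) - 1)
      next
    }
    name != "" && /^[ \t]*path[ \t]*=/ {
      p = $0
      sub(/^[ \t]*path[ \t]*=[ \t]*/, "", p)
      print name, p
    }
  ' "$1"
}

# check_module_paths CONF: report module paths that do not exist; return 1 if any.
check_module_paths() {
  local name path missing=0
  while read -r name path; do
    if [ ! -d "$path" ]; then
      echo "missing: [$name] -> $path" >&2
      missing=1
    fi
  done < <(list_rsync_modules "$1")
  return "$missing"
}

# Example:
#   check_module_paths /etc/rsyncd.conf && echo "all module paths exist"
```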

#useradd -M -s /sbin/nologin rsync
mkdir /backup
chown -R rsync:rsync /backup/
echo "rsync_backup:1" > /etc/rsync.password    # password set to 1
chmod 600 /etc/rsync.password

rsync --daemon --config=/etc/rsyncd.conf
# to stop it, with a bash-compatible back-tick expression:
kill -9 `cat /var/run/`
# or with xargs:
cat /var/run/ | xargs kill -9
# then remove the stale pid file:
rm /var/run/
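`kill -9` gives the daemon no chance to clean up, which is why the pid file then has to be removed by hand. A gentler sketch is to send SIGTERM first, wait briefly, and only then clean up. It assumes the pid file path is passed as an argument, and the function name `stop_by_pidfile` is hypothetical.

```bash
#!/usr/bin/env bash
# stop_by_pidfile PIDFILE: send SIGTERM to the PID in PIDFILE, wait up to
# about 5 seconds for it to exit, then remove the pid file.
stop_by_pidfile() {
  local pidfile="$1" pid i
  if [ ! -r "$pidfile" ]; then
    echo "no pid file: $pidfile" >&2
    return 1
  fi
  read -r pid < "$pidfile"
  if kill "$pid" 2>/dev/null; then
    for ((i = 0; i < 50; i++)); do
      kill -0 "$pid" 2>/dev/null || break   # process is gone
      sleep 0.1
    done
    if kill -0 "$pid" 2>/dev/null; then
      echo "pid $pid did not exit; consider kill -9" >&2
      return 1
    fi
  fi
  # Either the process exited or the pid file was already stale.
  rm -f "$pidfile"
}
```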

systemctl start rsyncd
systemctl enable rsyncd

--- Client side
# yum install rsync

rsync -avz /anything rsync_backup@remote_server::test

rsync -avz rsync_backup@remote_server::gitlab_path /opt/gitlab

echo "1" > /etc/rsync.password    # client side: the file holds only the password
chmod 600 /etc/rsync.password
rsync -avz rsync_backup@remote_server::gitlab_path /opt/gitlab --password-file=/etc/rsync.password
export RSYNC_PASSWORD=1     # alternative to --password-file; the password (1) must match the server's
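The client-side password file differs from the server's secrets file: it holds only the password. The rsync man page says rsync exits with an error if this file is world readable. This small sketch creates it with restrictive permissions from the start. The helper name is hypothetical, and the user, host and module in the comments come from the examples above.

```bash
#!/usr/bin/env bash
# write_client_password FILE PASSWORD: password file for --password-file.
# Holds just the password on the first line (no "user:" prefix).
write_client_password() {
  local file="$1" password="$2"
  ( umask 077; printf '%s\n' "$password" > "$file" )
  chmod 600 "$file"   # rsync refuses a world-readable password file
}

# Usage:
#   write_client_password /etc/rsync.password 1
#   rsync -avz rsync_backup@remote_server::gitlab_path /opt/gitlab \
#         --password-file=/etc/rsync.password
# Or, for a single command without a file:
#   RSYNC_PASSWORD=1 rsync -avz rsync_backup@remote_server::gitlab_path /opt/gitlab
```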

#vim /etc/rc.d/init.d/
vim /home/rsync.h:
pull sample: 
rsync -avz --delete rsync_backup@remote_server::backup /opt --password-file=/etc/rsync.password  >/dev/null 2>&1
push sample:
rsync -vrtL --delete --progress /opt/* rsync_backup@remote_server::backup --password-file=/etc/rsync.password 
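The flags used in these samples:

  • -v prints what is transferred.
  • -r recurses into directories.
  • -t preserves modification times.
  • -L copies the files that symlinks point to instead of the links themselves.

--delete is sometimes important too; see the troubleshooting example.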

chmod 755

echo "00 3 * * * root /home/" >> /etc/crontab    # runs the sync every day at 3 AM
crontab -e
        * * * * * /home/        # run the sync script every minute
        0 * * * * /home/        # every hour
        0 0 * * * /home/        # every day at midnight
        0 9,18 * * * /home/     # every day at 9 AM and 6 PM
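With the every-minute schedule, a slow transfer can still be running when cron starts the next one. This minimal sketch serialises runs with `flock` from util-linux. The function name, lock file path and script body are hypothetical.

```bash
#!/usr/bin/env bash
# run_locked LOCKFILE CMD [ARGS...]: run CMD unless another run holds LOCKFILE.
# Returns 1 without running CMD when the lock is busy.
run_locked() {
  local lockfile="$1"
  shift
  (
    flock -n 9 || { echo "another sync is still running; skipping" >&2; exit 1; }
    "$@"
  ) 9>"$lockfile"
}

# Example body for the sync script that cron calls (paths are placeholders):
#   run_locked /var/lock/rsync-pull.lock \
#     rsync -avz --delete rsync_backup@remote_server::backup /opt \
#           --password-file=/etc/rsync.password
```

The lock is released automatically when the subshell exits, so a crashed run does not leave a stale lock behind.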


rsync /

rsync -avz  /var/tmp
rsync -avz  /var/tmp

rsync -avz webshell /

# troubleshooting

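Problem: rsync failed to connect: No route to host (113)

This usually means the server's firewall (firewalld here) is rejecting the connection, not that routing is broken. Check that the daemon is listening, then open the rsync port (873/tcp by default):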

netstat -anp|grep "rsync"
firewall-cmd --permanent --add-port=873/tcp
firewall-cmd --reload
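From the client side, it helps to confirm that the daemon port is reachable before digging into rsync itself. This minimal sketch uses bash's `/dev/tcp` redirection (a bash feature, not a real device file) and coreutils `timeout`. The helper name is hypothetical. The port must match the daemon's: 873 by default, 973 in the session below.

```bash
#!/usr/bin/env bash
# port_open HOST PORT [SECONDS]: succeed if a TCP connection to HOST:PORT works.
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # Host and port are passed as arguments, not spliced into the script text.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example, run on the client before and after changing firewalld on the server:
#   port_open remote_server 873 && echo "rsync port reachable"
```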

Problem: rsync: opendir "." (in gitlab_path2) failed: Permission denied (13)

# rsync -avz --delete rsync_backup@remote_server::gitlab_path2 /var/opt/gitlab --password-file=/etc/rsync.password
receiving incremental file list
rsync: opendir "." (in gitlab_path2) failed: Permission denied (13)
IO error encountered -- skipping file deletion

sent 20 bytes  received 114 bytes  268.00 bytes/sec
total size is 0  speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1651) [generator=3.1.2]


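On the server, do not start the daemon with systemctl start rsyncd; start it by hand instead (the SELinux findings below explain why):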


[root@vm2-cicd-v01 test]# rsync --daemon --config=/etc/rsyncd.conf
[root@vm2-cicd-v01 test]# netstat -anp | grep :973
tcp     0    0*        LISTEN    1597/rsync
tcp6    0    0 :::973         :::*           LISTEN    1597/rsync
[root@vm2-cicd-v01 test]# firewall-cmd --permanent --add-port=973/tcp                             
[root@vm2-cicd-v01 test]# firewall-cmd --reload  
[root@vm2-cicd-v01 test]# find / -name rsync

Then I looked at the systemd unit for rsyncd and found that it does nothing different from running rsync --daemon by hand?

[root@vm2-cicd-v01 test]# find / -name rsyncd.service
[root@vm2-cicd-v01 test]# vim /usr/lib/systemd/system/rsyncd.service
[Unit]
Description=fast remote file copy program daemon

[Service]
ExecStart=/usr/bin/rsync --daemon --no-detach "$OPTIONS"


# systemctl status rsyncd
rsync: failed to open log-file /var/rsync/rsyncd.log: Permission denied (13)
# ls -Z /usr/bin/rsync
-rwxr-xr-x. root root system_u:object_r:rsync_exec_t:s0 /usr/bin/rsync
# ps auxZ | grep -v grep | grep rsync
system_u:system_r:rsync_t:s0    root     30320  0.0  0.0 114848  1216 ?        Ss   14:12   0:00 /usr/bin/rsync --daemon --no-detach

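chcon -R -t public_content_t /var/rsync/ did not help! I assumed changing the label to the public type would be enough, but it wasn't; this needs more investigation. It also occurred to me that the actual sync has the same problem. The confined rsync process (domain rsync_t, entered through the rsync_exec_t binary) has no access to the configured module paths either.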

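Workaround: relabel the binary as a generic executable, so systemd no longer starts it in the confined rsync_t domain. A restorecon or full relabel will undo this.

chcon -t bin_t /usr/bin/rsync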

Problem: No space left on device

# rsync -avz rsync_backup@remote_server::gitlab_path2 /var/opt/gitlab --password-file=/etc/rsync.password
receiving incremental file list
rsync: write failed on "/var/opt/gitlab/postgresql/data/pg_wal/000000010000000000000002": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2]


# df -h
Filesystem             Size  Used Avail Use% Mounted on
devtmpfs               7.8G     0  7.8G   0% /dev
tmpfs                  7.8G     0  7.8G   0% /dev/shm
tmpfs                  7.8G   41M  7.8G   1% /run
tmpfs                  7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/rhel-root  148G  4.6G  144G   4% /
/dev/sda1              473M  164M  309M  35% /boot
/dev/mapper/rhel-var    19G   19G   13M 100% /var
/dev/mapper/rhel-home   19G  250M   19G   2% /home
tmpfs                  1.6G     0  1.6G   0% /run/user/0


# cat
rsync -avz rsync_backup@remote_server::gitlab_path1 /opt/gitlab --password-file=/etc/rsync.password >/dev/null 2>&1
rsync -avz rsync_backup@remote_server::gitlab_path2 /var/opt/gitlab --password-file=/etc/rsync.password >/dev/null 2>&1
rsync -avz rsync_backup@remote_server::gitlab_path3 /etc/gitlab --password-file=/etc/rsync.password >/dev/null 2>&1
rsync -avz rsync_backup@remote_server::gitlab_path4 /var/log/gitlab --password-file=/etc/rsync.password >/dev/null 2>&1
rsync -avz rsync_backup@remote_server::gitlab_path5 /run/gitlab --password-file=/etc/rsync.password >/dev/null 2>&1
rsync -avz rsync_backup@remote_server::gitlab_path6 /etc/ssh --password-file=/etc/rsync.password >/dev/null 2>&1


[root@vm2-cicd-gitlab-v01 ~]# du -sh /var/*
0       /var/adm
515M    /var/cache
0       /var/crash
8.0K    /var/db
95M     /var/lib
37M     /var/log
18G     /var/opt
36K     /var/spool

# du -sh /var/opt/gitlab/*
114M    /var/opt/gitlab/postgresql
18G     /var/opt/gitlab/prometheus
So the problem is /var/opt/gitlab/prometheus. Its data alone is 18G, which nearly fills the 19G /var volume shown by df.

[root@vm2-cicd-gitlab-v01 ~]# du -sh /var/opt/gitlab/prometheus/*
18G     /var/opt/gitlab/prometheus/data
8.0K    /var/opt/gitlab/prometheus/prometheus.yml
24K     /var/opt/gitlab/prometheus/rules

ls /var/opt/gitlab/prometheus/data/
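The failure above only shows up mid-transfer. This sketch adds a pre-flight check that compares free space on the receiving side with the expected size. The needed size has to come from the source host (for example `du -sk` run there), and the helper name `has_free_kb` is hypothetical.

```bash
#!/usr/bin/env bash
# has_free_kb DIR NEEDED_KB: succeed if the filesystem holding DIR has at least
# NEEDED_KB kilobytes available (POSIX "df -P" output, 1K blocks).
has_free_kb() {
  local dir="$1" needed="$2" avail
  avail=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail" ] && [ "$avail" -ge "$needed" ]
}

# Example: size measured on the source host, check run on the receiver.
#   need=$(ssh gitlab-source du -sk /var/opt/gitlab | awk '{ print $1 }')   # hypothetical host
#   has_free_kb /var/opt/gitlab "$need" || echo "not enough space under /var"
```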

In the simple example above, files in the target that are not present at the source are left alone. Sometimes you want the target to become an exact copy of the source, a "mirror". For that, files on the target side must be deleted if they do not exist at the source, which is what the --delete option does.

rsync -azP --delete /source/dir/to/copy /target/dir/
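Here is a local demonstration of the difference that runs without a server. The directory and file names are made up, and the function name `demo_delete` is hypothetical. It uses trailing slashes on the source, so rsync copies the directory's contents. Without the slash, as in the command above, rsync would create `copy` inside the target instead.

```bash
#!/usr/bin/env bash
# demo_delete: show that --delete removes files that exist only on the target.
demo_delete() {
  local tmp
  tmp=$(mktemp -d)
  mkdir -p "$tmp/src" "$tmp/dst"
  echo keep > "$tmp/src/a.txt"
  echo old  > "$tmp/dst/only-in-target.txt"

  rsync -a "$tmp/src/" "$tmp/dst/"            # plain copy: the extra file survives
  ls "$tmp/dst"
  echo ---
  rsync -a --delete "$tmp/src/" "$tmp/dst/"   # mirror: the extra file is deleted
  ls "$tmp/dst"

  rm -rf "$tmp"
}

# demo_delete prints a.txt and only-in-target.txt, then ---, then only a.txt.
```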

# Similar tools