Recently I was searching for a way to avoid CPU peaks during backup of a production postgresql database. Until then, we used pg_dump to create single dumps for each database in the postgresql cluster. Everytime the backup was triggered, CPU load went up and sometimes also blocked new connections to the database for about 10-20 minutes because of that. Since we didn't want to loose too much of the current data in a disaster scenario, we performed the backup multiple times a day. Doing that with an approx. 1 GB database seemed overkill to me.
Then I came across the concept of incremental backups using postgresql.
It allows continous backups by writing database operations to so called WAL files. Since rollover takes place after a configurable size (default: 16 MB), these files can be simply backed-up on filesystem level. When a crash occurs, WAL files can be used to roll-forward all changes that have been done since the last full-backup (base-backup) - also to a specific time using PITR (Point-In-Time-Recovery). So also when using incemental backups, you have to make a full-backup from time to time ;)! Using continous archiving via WAL files avoids the previous mentioned CPU peaks since database operations are recorded during normal operation and archived in small chunks.
Setting up WAL archiving in postgresql isn't really hard to achieve reading the documentation. But creating base-backups and WAL backups, keeping track of created backups and restoring isn't really a job you're willing to do manually. When you search for helper tools, you quickly come at least across two interesting options: barman and pg-rman. On the first look, they're doing almost the same things, but in in detail they're different. I decided to use barman, since it's using the concept of remote backups using SSH and already offers inofficial debian packages via postgresql apt repository.
You can think of barman as a tool on-top of the basic incremental backup/restore functions offered by postgresql mentioned earlier. It allows creating/restoring backups via cmdline in one shot and keeps track of created backups and retention policies. So whatever you wan to do, use the barman command and you're done.
Barman should be installed on a separate machine where the backup-environment or -catalog is kept. This keeps the backups and catalogs separated from the database sever that might be completely destroyed in worst-case scenarios. Continous backups are performed push-style, where the postgresql server will copy rolled-over WAL files to the backup server via rsync using SSH.
Once the files have reached the backup server, barman will periodically (via cron) archive those WAL files to its internal backup catalog. Optionally these files can be compressed and maintained using a retention policy configured globally or on db-host basis. In addition to the automatic WAL archiving, base-backups (full-backups) can be performed manually, simply by invoking the barman backup command. After then, all new arriving WAL files are related to this last created base-backup - until another one is created.
When using incremental backups via barman, one has to keep some things in mind that are different from using pg_dump:
- Currently, only complete backups of the whole postgresql cluster can be made. No single databases can be WAL archived.
- Restoring a backup always includes the complete cluster - not only some databases.
- Since the backups are copies on filesystem level, different backup/database versions should be avoided.
- A restore from a backup always contains the complete $PGDATA folder that can be specified via -D on postgresql start.
To install barman, you can simply use the well-documented step-by-step guide on the barman website. That helped me setting things up quite quickly. But there's one thing I couldn't understand while installing barman: why is an additional postgresql database connection from the backup server needed although there's already an SSH account configured? To avoid this, I created a little helper that you can read about in the post 'Using barman without additional postgresql connection' :).
All in all barman is a great tool for working with incremental backups and really makes live easier dealing with the different aspects of backup/restore using postgresql.
Are you using barman already? Did it work for you in a desaster-recovery situation? Did you use barman but switched to another, maybe better, alternative? Please let me know!