29 Mar 2013, 5 comments

Using barman to manage incremental postgresql backups

Recently I was looking for a way to avoid CPU peaks during backups of a production postgresql database. Until then, we had used pg_dump to create a separate dump of each database in the postgresql cluster. Every time the backup was triggered, CPU load went up, and sometimes new connections to the database were blocked for about 10-20 minutes as a result. Since we didn't want to lose too much current data in a disaster scenario, we ran the backup multiple times a day. Doing that for a database of roughly 1 GB seemed like overkill to me.

Then I came across the concept of incremental backups using postgresql.

It allows continuous backups by writing database operations to so-called WAL (write-ahead log) files. Since a rollover to a new file takes place after a configurable size (default: 16 MB), completed files can simply be backed up at the filesystem level. When a crash occurs, the WAL files can be used to roll forward all changes made since the last full-backup (base-backup), optionally only up to a specific time using PITR (Point-In-Time Recovery). So even when using incremental backups, you have to make a full-backup from time to time ;)! Continuous archiving via WAL files avoids the previously mentioned CPU peaks, since database operations are recorded during normal operation and archived in small chunks.

Setting up WAL archiving in postgresql isn't really hard if you follow the documentation. But creating base-backups and WAL backups, keeping track of the backups created, and restoring them isn't a job you want to do manually. When you search for helper tools, you quickly come across at least two interesting options: barman and pg_rman. At first glance they do almost the same thing, but they differ in the details. I decided to use barman, since it is built around remote backups over SSH and already offers unofficial debian packages via the postgresql apt repository.

Barman

You can think of barman as a tool on top of the basic incremental backup/restore functions offered by postgresql, as mentioned earlier. It allows creating and restoring backups from the command line in one shot and keeps track of created backups and retention policies. So whatever you want to do, use the barman command and you're done.

Barman should be installed on a separate machine that holds the backup environment and catalog. This keeps the backups and the catalog separate from the database server, which might be completely destroyed in a worst-case scenario. Continuous backups are performed push-style: the postgresql server copies rolled-over WAL files to the backup server via rsync over SSH.
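On the database server, the push boils down to a few settings in postgresql.conf. The sketch below only prints them and changes nothing. The host name backup.example.com, the server name main and the incoming path are assumptions; barman show-server main should report the real incoming_wals_directory. Note that wal_level = archive is the spelling used by the 9.x releases, while newer postgresql versions call this level replica.

```shell
#!/bin/sh
# Print the postgresql.conf lines that push finished WAL segments to barman.
# Arguments (both placeholders in this sketch):
#   $1  host name of the backup server
#   $2  barman server name, used to build the incoming directory
print_archive_settings() {
    backup_host=$1
    server_name=$2
    # Assumed default layout; verify with "barman show-server <name>".
    incoming="/var/lib/barman/${server_name}/incoming"
    cat <<EOF
wal_level = archive
archive_mode = on
archive_command = 'rsync -a %p barman@${backup_host}:${incoming}/%f'
EOF
}

print_archive_settings backup.example.com main
```

In archive_command, postgresql replaces %p with the path of the finished segment and %f with its file name. The rsync call relies on passwordless SSH from the postgres user to the barman user on the backup server.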

Once the files have reached the backup server, barman periodically (via cron) archives those WAL files into its internal backup catalog. Optionally, these files can be compressed and managed by a retention policy configured globally or per database host. In addition to the automatic WAL archiving, base-backups (full-backups) can be performed manually, simply by invoking the barman backup command. After that, all newly arriving WAL files are associated with this latest base-backup, until another one is created.
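A minimal sketch of the routine on the backup server, assuming a barman server entry named main. The run wrapper and the DRY_RUN variable are helpers of this sketch, not barman features. With the default DRY_RUN=1 the commands are only printed, so the script is safe to try on a machine without barman.

```shell
#!/bin/sh
# Routine barman tasks on the backup server, wrapped so they can be previewed.
SERVER=${SERVER:-main}     # assumed barman server name
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run barman check "$SERVER"        # verify SSH, postgresql connection and archiving
run barman backup "$SERVER"       # take a new base-backup
run barman list-backup "$SERVER"  # show the backups in the catalog

# The periodic WAL maintenance is a separate job. A crontab entry for the
# barman user along these lines is one way to schedule it (an assumption,
# packages may already ship an equivalent file):
#   * * * * * barman -q cron
```

Running it with DRY_RUN=0 executes the three commands for real.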

When using incremental backups via barman, one has to keep some things in mind that are different from using pg_dump:

  • Currently, only complete backups of the whole postgresql cluster can be made. No single databases can be WAL archived.
  • Restoring a backup always includes the complete cluster - not only some databases.
  • Since the backups are copies on filesystem level, a backup should be restored with the same postgresql version that created it; mixing versions should be avoided.
  • A restore always produces a complete $PGDATA folder, which can then be passed to postgresql via -D on start.
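To make the last two points concrete, here is a small sketch that builds (but does not run) a barman recover command. recover_cmd is a hypothetical helper, and the server name main, the backup ID 20130329T110000, the timestamp and the destination /var/lib/barman/pg are placeholders.

```shell
#!/bin/sh
# Build a "barman recover" command line, optionally with a PITR target time.
# Arguments: server name, backup ID, destination directory, [target time]
recover_cmd() {
    server=$1; backup_id=$2; dest=$3; target_time=${4:-}
    if [ -n "$target_time" ]; then
        printf 'barman recover --target-time "%s" %s %s %s\n' \
            "$target_time" "$server" "$backup_id" "$dest"
    else
        printf 'barman recover %s %s %s\n' "$server" "$backup_id" "$dest"
    fi
}

# Restore the most recent base-backup of the whole cluster.
recover_cmd main latest /var/lib/barman/pg

# Restore a specific base-backup and replay WAL files up to a point in time.
recover_cmd main 20130329T110000 /var/lib/barman/pg "2013-03-29 12:00:00"

# The destination is a complete $PGDATA folder that postgresql can start from:
#   pg_ctl -D /var/lib/barman/pg start
```

Either way, the result is the complete cluster, never a single database.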

To install barman, you can simply follow the well-documented step-by-step guide on the barman website. That helped me set things up quite quickly. But there's one thing I couldn't understand while installing barman: why is an additional postgresql database connection from the backup server needed when there's already an SSH account configured? To avoid this, I created a little helper that you can read about in the post 'Using barman without additional postgresql connection' :).

All in all, barman is a great tool for working with incremental backups and really makes life easier when dealing with the different aspects of backup/restore in postgresql.

Are you using barman already? Did it work for you in a disaster-recovery situation? Did you use barman but switch to another, maybe better, alternative? Please let me know!

Posted by Veit Guna

Tagged as: backup, barman, postgresql

  • Damian Soriano

    Hi!

    I am starting to use barman and it looks like a really good alternative to pg_dump backups.

    However, I am still not sure how to use WAL archives to restore a database. Suppose I execute ‘barman backup main’ at 08:16, but after that more WAL archives arrive with modifications to the databases. After running ‘barman cron’ I have the following situation:

    barman@1449b59d4651:~$ barman list-backup main

    main 20140723T081622 - Wed Jul 23 08:16:27 2014 - Size: 36.9 MiB - WAL Size: 384.0 MiB
    main 20140723T071707 - Wed Jul 23 07:17:09 2014 - Size: 30.8 MiB - WAL Size: 32.0 MiB
    main 20140723T071347 - Wed Jul 23 07:13:50 2014 - Size: 18.8 MiB - WAL Size: 32.0 MiB

    If I recover using ‘barman recover main last /var/lib/barman/pg’, the backup from 08:16 is restored, but the modifications that arrived after the backup are not there. Do you know how I should recover a database using the WAL archives that arrived after a backup was performed?

    The WAL archive part was not explained very clearly in the documentation, as far as I am concerned.

    • nightprogrammer

      Hi.

      Your recover command seems ok. Please note that you have to configure postgresql to ship WAL files to the backup (barman) machine in order to catch up with the latest changes to the database. Otherwise you will only have the base backup, without the changes that happened after it. If you have already set up WAL archiving on postgres, keep in mind that postgres ships WAL files in so-called segments. By default, these segments have a size of 16 MB, and postgres ships a segment only once it is full. That means that if you only have very small or rare changes in the database, the latest changes can be lost because the current WAL segment wasn’t finished and thus hasn’t been shipped to the backup machine yet.
      You can also use the ‘barman status’ command to check your WAL shipment or contact the barman mailing list to get additional help. I hope that helps.
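
      If waiting for a segment to fill up is not an option, a segment switch can also be forced. A minimal sketch, assuming superuser access via psql on the database server: switch_sql is a hypothetical helper that only prints the right statement, since pg_switch_xlog() was renamed to pg_switch_wal() in PostgreSQL 10. Setting archive_timeout in postgresql.conf achieves the same on a timer.

      ```shell
      # Print the segment-switch statement for a given PostgreSQL major version
      # (pass 9 for any 9.x release).
      switch_sql() {
          if [ "$1" -ge 10 ]; then
              echo "SELECT pg_switch_wal();"
          else
              echo "SELECT pg_switch_xlog();"
          fi
      }

      switch_sql 9

      # On the database server, close the current segment so that
      # archive_command ships it:
      #   psql -c "$(switch_sql 9)"
      # Then, on the barman machine, archive it and recover again:
      #   barman cron
      #   barman recover main last /var/lib/barman/pg
      ```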

  • Mariano Ruiz

    Am I supposed to run “barman cron” periodically from the command line to make an incremental backup, or can I automate this process?

    • nightprogrammer

      The command ‘barman cron’ is supposed to be called from the linux crontab or a similar scheduler. barman cron just performs maintenance tasks like archiving and compressing WALs or cleaning up old ones according to the retention policy. It has nothing to do with the backup itself. Of course you can call it manually from the command line, but it should run periodically.

      • Mariano Ruiz

        Yes, I realized this after some tests. Thanks!