This post appeared originally in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite

Lately I have been working a bit with the monitoring platform Zabbix, and the instance in question is backed by the PostgreSQL RDBMS.

SQL and time-series

Apart from data regarding such as hosts, services and checks, a significant amount of the data in the RDMS is historical time-series data aggregated for over a year back in time.

There are several data-stores optimised for time-series data to choose from, as SQL is not the best choice for this, but none of them integrates nicely with Zabbix out of the box.

To keep the database from growing out of proportions, Zabbix comes with a housekeeper functionality that scans through the dataset and removes older data that is past various expiry dates. Doing so will give you tables with many rows that are deleted, and new data will eventually be placed into these vacant rows. Which makes the housekeeper a known source for generating performance challenges, as your time-series tables will be bloated and unsorted on disk.

To amend this issue it is not sufficient to run VACUUM alone, as it will only clean out the dead rows and mark them as reusable. But running VACUUM FULL or CLUSTER which could, alleviate the issue, will take out exclusive locks on your tables. Which is hardly desirable.

Remove bloat and sort

Enter pg_repack. It is an extension to PostgreSQL that let you remove bloat from tables and their indexes, while working online and not taking out exclusive locks. Optionally it can also restore the physical order on disk.

One caveat is though that it requires about double the size of the target table and its indexes, and the target table must have a PRIMARY KEY, or at least a UNIQUE total index on a NOT NULL column.

Installation is normally only an apt or yum command away, followed by creating the extension:

psql -c "CREATE EXTENSION pg_repack" -d bloated_db

Similarly, it can be uninstalled with DROP EXTENSION pg_repack, followed by an appropriate apt or yum command.

The actual re-packing of a table can be initiated like this, for a table with a primary index:

/usr/pgsql-10/bin/pg_repack --table=your_table --jobs=20 bloated_db

Repacking of only the indexes of a table without a primary key:

/usr/pgsql-10/bin/pg_repack -x --table=your_other_table --jobs 20 bloated_db

Quoting the documentation, to perform a full-table repack, pg_repack will:

  1. Create a log table to record changes made to the original table.
  2. Add a trigger onto the original table, logging INSERTs, UPDATEs and DELETEs into our log table.
  3. Create a new table containing all the rows in the old table.
  4. Build indexes on this new table.
  5. Apply all changes which have accrued in the log table to the new table. 6p. Swap the tables, including indexes and toast tables, using the system catalogs.
  6. Drop the original table.

pg_repack will only hold an ACCESS EXCLUSIVE lock for a short period during initial setup (steps 1 and 2 above) and during the final swap-and-drop phase (steps 6 and 7). For the rest of its time, pg_repack only needs to hold an ACCESS SHARE lock on the original table, meaning INSERTs, UPDATEs, and DELETEs may proceed as usual.

Similarly, index only repacks will:

  1. Create new indexes on the table using CONCURRENTLY matching the definitions of the old indexes.
  2. Swap out the old for the new indexes in the catalogs.
  3. Drop the old indexes.

Creating indexes concurrently comes with a few caveats, please see the documentation for details.

Feeling adventurous?

A more modern approach to removing unused space from a table and optionally sort tuples according to a particular index is pg_squeeze. It’s just released for PostgreSQL 9.6 and in beta for 10, and uses transaction log files and logical decoding to capture possible data changes to the table being rebuilt.

I have not had the opportunity to test pg_squeeze yet. So, are you adventurous for Christmas?

Happy hacking!

Knut Ingvald Dietzel

Team Lead, Infrastructure Services at Redpill Linpro

Knut is an ex. Physicist in his 9th year at Redpill Linpro. He started out as a Systems Consultant and advanced to Team Lead, but still has time to dabble with FOSS components within fields like RDBMS, HTTP accelerators, virtualization, configuration management, etc.

Containerized Development Environment

Do you spend days or weeks setting up your development environment just the way you like it when you get a new computer? Is your home directory a mess of dotfiles and metadata that you’re reluctant to clean up just in case they do something useful? Do you avoid trying new versions of software because of the effort to roll back software and settings if the new version doesn’t work?

Take control over your local development environment with containerization and Dev-Env-as-Code!

... [continue reading]

Ansible-runner

Published on February 27, 2024

Portable Java shell scripts with Java 21

Published on February 21, 2024