I recently migrated my Home Assistant from SQLite to PostgreSQL and found that the top Google hits gave me broken instructions. Installation and configuration of a PostgreSQL server is outside the scope of this post. I will assume that it’s acceptable with some hours without recording statistics. While I believe it should be possible to do a “hot” migration without data loss, this is outside the scope of this post. I assume you have some knowledge of shell commands and that you’re able to find the relevant configuration files both in Home Assistant and in PostgreSQL. I assume you have shell access and sudo access to the PostgreSQL server - but if rewriting the commands a bit it should be possible to complete the migration without.

Disclaimer

The blog post was written after doing the migration, that increases the risk of there being mistakes in the text. I have described quite some corner cases below, but been through only a few of them myself.

Motivations

I like SQLite and it worked fine for me. It’s just that I happened to have a PostgreSQL server running, and I happen to like PostgreSQL too :-) But here are some better reasons:

  • I was moving my HomeAssistant from one physical hardware to another. Doing it in two steps reduced the risks and made the rollback path easier.

  • I believe in separation of data and configuration. I like to have the configuration to be backed up in a git-repo without having it cluttered with data. For the database I have another backup regime.

  • Performance - PG is said to perform better when having massive amounts of old archived data. (It’s also said that TimescaleDB, a fork of PostgreSQL specially optimized for this kind of workload, performs even better).

  • Perhaps the recorded data no longer fits i.e. on a small SD-card on a raspberry pi, but you’re not quite ready to drop the data.

Other resources

I read through this before starting:

My migration job (and this blog post) is heavily based on the latter.

Preparations

Locate the database file and fix permissions

The SQLite database is located in a file with (as of 2023) file name home-assistant_v2.db. The location may vary dependent on your environment and installation. Check same directory as the configuration.yaml-file. Check ~homeassistant/.homeassistant. Perhaps it’s in /etc/homeassistant or /var/lib/hass. If nothing else works, then locate home-assistant_v2.db or find / -name 'home-assistant*.db'.

Give yourself read permissions to the file. Probably the file has owner and group homeassistant, and probably the mode is 0660, giving both the owner and the group full access to the file:

$ ls -al home-assistant_v2.db
-rw-rw---- 1 homeassistant homeassistant 9349550080 Mar 16 12:15 home-assistant_v2.db

To get read (and write) access to the file you may either add yourself to the homeassistant group (i.e. sudo usermod -aG homeassistant $LOGNAME or vi /etc/group, then log in again and run groups to verify you’re in the homeassistant group) or simply change the ownership (sudo chown $LOGNAME home-assistant_v2.db - homeassistant should anyway still have permissions through the group)

Ensure you have the right software installed

The recommended way to migrate the data from SQLite to PostgreSQL seems to be through the tool PgLoader. It seems to be available out-of-the-box at Debian and Ubuntu (apt-get install pgloader works on jammy at least), for EL it’s available through third-party repositories. On Archlinux, it’s in the AUR.

You may run this program from the server running HomeAssistant, on the server running PostgreSQL or on some third box. When optimizing for lowest gap in the recording, probably the best option is to run PgLoader from the PostgreSQL.

Do an initial Rsync of the database file (optional)

Ref above, if you’re running the PgLoader from the PostgreSQL server, you may want to do this:

# Obviously, at least the first line should be edited
DST_HOST=postgres.example.com
DST_DIR=/tmp
echo "Please verify that there is enough space on $DST_HOST:$DST_DIR"
du -sh home-assistant_v2.db
ssh $DST_HOST df -h $DST_DIR
echo "Doing the rsync"
rsync -v --progress --inplace --partial  home-assistant_v2.db $DST_HOST:$DST_DIR

This will take anything between a fraction of a second and several hours, YMMV. With --progress at least you’ll get an estimate. Leave this one running in the background while you continue with the next steps in the guide.

Since we’ve done this while HomeAssistant is having an open connection towards the SQLite database and actively recording data there, chances are that the file you’ve transferred is inconsistent (LVM, Btrfs, ZFS etc can be used for creating consistent snapshots - but that’s outside the scope of this article).

Check the events table

If the database was created under an old version of HomeAssistant, there may exist an extra obsoleted column events.created in the database, which again reportedly may cause problems. Check the schema:

sqlite3 home-assistant_v2.db -readonly -cmd '.schema events' < /dev/null | grep created

If it prints the created row definition, then you need to drop this column prior to the migration. While I believe it should be relatively safe to do so while HomeAssistant is running and recording, you may want to do it after you’ve copied the file and before PgLoader is run.

Create the new database

You need to connect to the PostgreSQL server as an admin user who can create databases and users. If you have shell and sudo-access to the PostgreSQL server, you may do it as the postgres superuser, like this (assuming pwgen exists):

cd /tmp
PASSWD=$(pwgen 32 1)
echo "CREATE USER homeassistant WITH PASSWORD '$PASSWD';
CREATE DATABASE homeassistant_db WITH OWNER homeassistant ENCODING 'utf8';" |
sudo -u postgres psql

Don’t close this terminal window, as it contains the database password.

Prepare the Home Assistant configuration

You will probably find configuration.yaml in the same directory as the SQLite file.

Make a backup of the configuration.yaml-file. I’d recommend using git, but a simple cp configuration.yaml configuration.yaml-$(date +%FT%H%M) will also do.

Check if you have a recorder section in configuration.yaml - and if not, add it. Mine looks like this:

recorder:
    auto_purge: false
    purge_keep_days: 4000
    db_url: postgresql:///homeassistant_db
    #db_url: !secret psql_string
    db_retry_wait: 15

I will consider tweaking auto_purge and purge_keep_days when my hard disk goes full or when the db performance gets too bad, as for now I see no reason to throw away data (digression: the best would probably be to fix some smart retention, reducing the stored sample rate for old data).

I’m not sure what the default for db_retry_wait is, but if PostgreSQL is down we want to wait some seconds (15s is a sane value) and try again.

Note db_url. In my case Home Assistant runs on the same server as PostgreSQL, runs as the homeassistant user and can communicate with the database using socket communication. Great, then I don’t even need a password configured, only the db name in the db_url. If you have a network (or docker) between your Home Assistant and the PostgreSQL, use the other one and put the password into a file secrets.yaml. Even with a local PostgreSQL, it may be nice for future-proofing:

PG_HOST=$DST_HOST # you may need to edit this line
sudo touch secrets.yaml # create it if it doesn't exist
ls -al secrets.yaml # check the permissions and ownership
sudo chown $LOGNAME secrets.yaml # give yourself write permissions
echo "psql_string: \"postgresql://homeassistant:$PASSWD@$PG_HOST/homeassistant_db\"" >> secrets.yaml
sudo chown homeassistant:homeassistant secrets.yaml # is this correct?  YMMV.
sudo chmod 0660 homeassistant:homeassistant secrets.yaml # is this correct?  YMMV.

Now that you have saved the password, you’re allowed to close the terminal window.

PostgreSQL access rules (pg_hba)

Postgres has an extra access level between the password auth and the firewall, it’s the pg_hba.conf file which may be located under /etc/postgresql/, /etc/postgresql/*/main/, /var/lib/postgresql/data, /var/lib/pgdata or possibly somewhere else, dependent on your OS distro and PostgreSQL installation package. You need to locate the file and edit it, adding one (or more) opening(s) for the homeassistant user. Explaining the pg_hba is outside the scope of this article. Probably you will be good by chucking in one (or both) of those two lines:

## Allows Home Assistant to be run on this server
## As user "homeassistant", connecting by unix socket:
local homeassistant_db homeassistant peer

## Allows Home Assistant to be run from a specific IP
## with or without encryption and connect with password
## (modify the IP address, obviously):
host homeassistant_db homeassistant 10.11.12.13 md5

(hostssl is more secure than host, as it will reject connection attempts without TLS encryption. In practice scram-sha256 rather than the insecure md5 will be required if the server is well configured)

While I would recommend against it, it’s also possible to use trust rather than md5, this will eliminate the need of password handling.

Firewall and listening interface

We’ve dealt with the db user, permissions, password and the pg_hba, but even if you’ve done everything correct, there is still the chance that Home Assist won’t be able to connect to the database. Let’s be a bit pro-active and verify the networking in advance rather than debugging problems later.

This section is hardly relevant if you’re using local connections over unix sockets, but even then there may be permission problems to the socket or problems finding the socket. Try sudo -u homeassistant psql homeassistant_db. If you can’t get in, the probability of Home Assistant getting in is rather small.

Use ssh or docker rsh to get close to your Home Assistant, and then try to connect to the PostgreSQL port. This used to be very easy in the olden days, using telnet $PG_HOST 5432. On modern operating systems and particularly in docker containers, it will more often than not give the error message telnet: command not found - so then it’s needed to search for other tools to do the job - or perhaps install the needed tools. There is nc which may be installed as ncat or netcat, but there are different implementations, options may vary, not all the implementations supports IPv6. You will at least need -v. The best is probably (if possible) to install psql. Bash itself also supports piping to/from special file name /dev/tcp/$PG_HOST/$PG_PORT, which may be used as a last resort.

PG_HOST=www.example.com
PG_PORT=5432

## With psql installed, you may also
## verify that the pg_hba and password is correct
psql -h $PG_HOST -p 5432 -U homeassistant homeassistant_db

## if psql isn't installed, we may try abusing other tools like curl or wget:
curl -v http://$PG_HOST:$PG_PORT/
wget -v http://$PG_HOST:$PG_PORT/

## last resort
echo "asdfsfd\nasdfwef" | cat > /dev/tcp/$PG_HOST/$PG_PORT
  • If any of those commands will give you a timeout, it probably means you have a firewall dropping packages.
  • “No route to host” typically means wrong IP-address, firewall refusing to route the package, private IP addresses, or general routing problem.
  • “Connection refused” most often means that PostgreSQL isn’t running, is listening to another port or even another network interface … but it may also be that the firewall is rejecting packages.
  • If it says anything like “connected”, “broken pipe”, “empty response from server”, or if the bash pipe returns relatively fast and without error messages, then the networking seems to work.

If you get “connection refused”, then on the PostgreSQL host, try ss -ln | grep 5432. A typical problem is that PostgreSQL only listens to the localhost interface - if so, try adding listen_addresses='*' to the configuration file (postgresql.conf is usually in the same directory as pg_hba.conf).

Do another Rsync (optional)

Did the first rsync complete already? If not, wait it out. If first rsync completed in few seconds, then skip this step.

Repeat the rsync:

rsync -v --progress --inplace --partial  home-assistant_v2.db $DST_HOST:$DST_DIR

Now only the changes are transferred - so it should be way faster, but YMMV. It needs to read the full file on both sides and create checksums, with a huge file, storage on slow spinning disks and/or slow CPU, this may still take significant time.

The second rsync run has three purposes; discovery of how long time the final sync may take, making the diff that needs to be synced smaller, and (possibly) warming up caches, so that the final sync will read data from caches rather than from slow disk.

Planning some downtime

How important is Home Assistant for you and your family? Do you survive with the system being down for a while? If you read carefully through the rest of the post and plan things well, it should be possible to do it almost without downtime on Home Assistant, but sensor data recorded while the database migration is going on will be lost.

I could tolerate hours with no running Home Assistant, so I did the most simple stupid - taking down Home Assistant and doing things step by step without stressing.

Final database rsync

If you’ve been following the steps for rsyncing the SQLite database, chances are that you have some inconsistency in the target file. For the sake of integrity we need to do the final sync without having more records written to the file. You may want to simply shut down Home Assistant at this point, but there is a service recorder.disable that can be run from Home Assistant (i.e. through Developer Tools/Services in the web-ui). Disable recording, and it should be possible to rsync a consistent snapshot of the SQLite file.

The procedure for doing the actual rsync is the same as above:

rsync -v --progress --inplace --partial  home-assistant_v2.db $DST_HOST:$DST_DIR

(recorder may be enabled again, but the recorded data will not be migrated to PostgreSQL).

Let Home Assistant create the DB schema

Allegedly there will be problems if extracting the DB schema from the SQLite dump and taking it into PostgreSQL.

By now your Home Assistant should be configured to use the database, the networking should work, and the authorization should also work. Restart Home Assistant, and it should connect to the database, discover that the database schema is missing, and create it.

Before restarting Home Assistant, get ready to monitor it. There is a log-file, usually available in the same directory as the configuration file. The log-file is rolled on every restart, but open a terminal window and do sudo -u homeassistant tail -F home-assistant.log in it (or sudo journalctl -f -u homeassistant if the log is in the journal. On my system it’s both ending up in a log file and in the journal).

You should also monitor the database. You may run sudo -u postgres watch "psql -c \"select * from pg_stat_activity where datname='homeassistant_db'\"" in one terminal window to see the connections and activity, and sudo -u postgres watch 'psql -c "\d"' homeassistant_db in another to see the tables being made.

Once the schema is made (shouldn’t take long - anything between milliseconds and some few seconds, depending on your setup), you should disable the recorder or shut down Home Assistant (or restart it with the old configuration pointing towards SQLite). The monitoring terminal windows may be left open, they will be useful also in the next steps.

If there are errors related to the database connection, obviously it’s needed to debug, fix, retry.

Migrate the data

Unfortunately, I don’t know much about PgLoader, and I’m not in a position to retry this operation now. It would have been nice doing some more research into it.

If there is a created-field in the events table, then it should be dropped. It seems to be possible, but not entirely trivial to do this in PgLoader. I did the simple stupid thing, dropping it from the SQLite file prior to migration, but if the database file is huge and old spinning disks are in use this will unfortunately take some time:

sqlite3 home-assistant_v2.db -cmd 'alter table events drop column created' < /dev/null

The simple way to use pgloader is to do pgloader sqlite://$DST_DIR/home-assistant_v2.db postgresql:///homeassistant_db - but we’d like to set some options during the migration, for that we need to create a load file - for instance, like this:

DST_DIR=/tmp
echo<<EOF>/tmp/ha.pgloader
load database
  from sqlite://$DST_DIR/home-assistant_v2.db
  into postgresql:///homeassistant_db
with data only, drop indexes, reset sequences, truncate, batch rows = 1000
SET work_mem to '32 MB', maintenance_work_mem to '64 MB';
EOF
sudo -u postgres pgloader /tmp/ha.pgloader

If you don’t have the SQLite file on the DB host, you will need to replace the postgresql://-URL to the one you have saved in your secrets.yaml file and remove the sudo -u postgres part.

I did some cargo-cult-copying from another blog post. Reading the doc now, I find:

  • data_only - obviously, copy only the data and not the schema. The documentation is a bit vague, will data_only cause the other flags (drop_indexes, reset_sequences, truncate) to be ignored? Will it try to overwrite existing schema if removing data_only?
  • drop_indexes may not as bad as it sounds - it is supposed to drop the indexes prior to syncing the data and then recreate them. This will again probably cause improved speed and indexes that are better balanced and less bloated.
  • reset_sequences - the doc says … “pgloader resets all the PostgreSQL sequences created to the current maximum value of the column they are attached to”. I had some problems with sequences, should probably have tried again without this option - but as I understand the description, it should do the right thing. Maybe --data-only is the real problem here?
  • truncate - obviously, truncate the tables prior to inserting things into them. May be needed to avoid conflicts with data recorded immediately after the schema was created. With data_only, I’m not sure this is respected at all.
  • I found no documentation on batch_rows. Probably a low number causes extra overhead. Probably the default was too low and 1000 is better.
  • work_mem and maintenance_work_mem is sent into PostgreSQL as session variables. It probably makes sense having higher values here than the configured default.

I ran into problems, when starting Home Assistant I got lots of duplicated key errors. More research should probably be done into it, but I decided to make a quick’n’dirty workaround on that part:

sudo -u postgres pg_dump  homeassistant_db |
grep SELECT | grep _seq |
perl -pe 's/, \d+, true/, 12345678, true/' |
sudo -u postgres psql homeassistant_db

In short, bump all sequences to some a value higher than the highest existing ID-number. 12345678 has proven to be good enough for me, but my database got accidentally truncated not so long ago - you may want to slap on an extra digit or two to be sure - or better, check your data. Probably the states table is the biggest. You may want to do a quick sudo -u postgres psql -c 'select max(state_id) from states' homeassistant_db, the states table may be the biggest table.

(Re)start Home Assistant

Start - or restart Home Assistant, and check if there are any database/recorder-related problems in the logs.

(If you had Home Assistant running but recorder disabled - perhaps it suffices to enable the recorder, but a full restart feels safer).

And then you should be good.

Look through the monitoring. Check that you may access old data. Check that new data is stored and can be graphed. Verify that there is movement in the sequences:

sudo -u postgres pg_dump  homeassistant_db |
grep SELECT | grep _seq | grep -v 12345678

Now you may climb (carefully) up on your office chair, raise a first in the air and yell: “mission completed!”

Tobias Brox

Senior Systems Consultant at Redpill Linpro

Tobias started working as a developer when he finished his degree at The University of Tromsø. He joined Redpill Linpro as a system administrator a decade ago, and have embraced working with our customers, and maintaining/improving our internal tools.

The irony of insecure security software

It can probably be understood from my previous blog post that if it was up to me, I’d avoid products like CrowdStrike - but every now and then I still have to install something like that. It’s not the idea of “security software” per se that I’m against, it’s the actual implementation of many of those products. This post lists up some properties that should be fulfilled for me to happy to install such a product.

Free and ... [continue reading]

Thoughts on the CrowdStrike Outage

Published on July 23, 2024

Alarms made right

Published on June 27, 2024