Checksums are only enabled if the instance was set up using
--data-checksums (which isn’t the default behavior), or if activated
afterwards with the new pg_checksums
tool, also added in PostgreSQL 12.
When enabled, checksums are written each time a block is written to disk, and verified each time a block is read from disk (or from the operating system cache). If the checksum verification fails, an error is reported in the logs. If the block was read by a backend, the query will obviously fail, but if the block was read by a BASE_BACKUP operation (such as pg_basebackup), the command will continue its processing. While data checksums will only catch a subset of possible problems, they still have some value, especially if you don’t trust your storage reliability.
Up to PostgreSQL 11, checksum validation errors could only be found by looking into the logs, which clearly isn’t convenient if you want to monitor such errors.
New counters available in pg_stat_database
To make checksum errors easier to monitor, and to help users react as soon as
such a problem occurs, PostgreSQL 12 adds new counters in the
pg_stat_database view:
commit 6b9e875f7286d8535bff7955e5aa3602e188e436
Author: Magnus Hagander <email@example.com>
Date:   Sat Mar 9 10:45:17 2019 -0800

    Track block level checksum failures in pg_stat_database

    This adds a column that counts how many checksum failures have occurred
    on files belonging to a specific database. Both checksum failures
    during normal backend processing and those created when a base backup
    detects a checksum failure are counted.

    Author: Magnus Hagander
    Reviewed by: Julien Rouhaud

commit 77bd49adba4711b4497e7e39a5ec3a9812cbd52a
Author: Magnus Hagander <firstname.lastname@example.org>
Date:   Fri Apr 12 14:04:50 2019 +0200

    Show shared object statistics in pg_stat_database

    This adds a row to the pg_stat_database view with datoid 0 and datname
    NULL for those objects that are not in a database. This was added
    particularly for checksums, but we were already tracking more satistics
    for these objects, just not returning it.

    Also add a checksum_last_failure column that holds the timestamptz of
    the last checksum failure that occurred in a database (or in a
    non-dataabase file), if any.

    Author: Julien Rouhaud <email@example.com>

commit 252b707bc41cc9bf6c55c18d8cb302a6176b7e48
Author: Magnus Hagander <firstname.lastname@example.org>
Date:   Wed Apr 17 13:51:48 2019 +0200

    Return NULL for checksum failures if checksums are not enabled

    Returning 0 could falsely indicate that there is no problem. NULL
    correctly indicates that there is no information about potential
    problems.

    Also return 0 as numbackends instead of NULL for shared objects (as no
    connection can be made to a shared object only).

    Author: Julien Rouhaud <email@example.com>
    Reviewed-by: Robert Treat <firstname.lastname@example.org>
Those counters will reflect checksum validation errors for both backend activity and BASE_BACKUP activity, per database.
The checksum_failures column shows the cumulative number of errors, and the
checksum_last_failure column shows the timestamp of the last checksum
failure on the database (NULL if no error ever happened).
To avoid any confusion (thanks to Robert Treat for pointing it out), those two columns will always return NULL if data checksums aren’t enabled, so people won’t mistakenly think that data checksums are always successfully verified.
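A monitoring check built on these two columns has to treat NULL and 0 differently. Here is a minimal sketch in Python, assuming the rows have already been fetched from pg_stat_database with whatever driver you use. The function name classify_checksum_row and the status labels are hypothetical; only the view and column names come from PostgreSQL.

```python
from datetime import datetime
from typing import Optional

# Query a monitoring job could run on PostgreSQL 12 or later.
CHECKSUM_QUERY = (
    "SELECT datid, datname, checksum_failures, checksum_last_failure "
    "FROM pg_stat_database"
)


def classify_checksum_row(
    checksum_failures: Optional[int],
    checksum_last_failure: Optional[datetime],
) -> str:
    """Map the two checksum columns of one row to a (hypothetical) status label."""
    if checksum_failures is None:
        # NULL means data checksums are not enabled: nothing is known,
        # which is not the same thing as "no failure".
        return "unknown"
    if checksum_failures > 0:
        when = (
            checksum_last_failure.isoformat()
            if checksum_last_failure is not None
            else "unknown time"
        )
        return f"critical: {checksum_failures} failure(s), last at {when}"
    # Checksums are enabled and no failure has been counted so far.
    return "ok"
```

Reporting "unknown" rather than "ok" for NULL keeps the distinction that the third commit introduced: an instance without checksums gives no information about potential problems.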
As a side effect,
pg_stat_database now also shows the available statistics
for shared objects (such as the
pg_database table), in a new row
with datid set to 0 and a NULL
datname. Those statistics were always
accumulated, but weren’t displayed in any system view until now.
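Tools that read pg_stat_database therefore have to cope with a row whose datname is NULL. A small sketch in Python, assuming each row is available as a (datid, datname) pair; the helper name display_name and the placeholder labels are made up for illustration.

```python
from typing import Optional


def display_name(datid: int, datname: Optional[str]) -> str:
    """Return a printable name for one pg_stat_database row."""
    if datid == 0 and datname is None:
        # Row added in PostgreSQL 12 for objects that are not in a database.
        return "<shared objects>"
    if datname is None:
        # Hypothetical fallback: show the OID when the name is missing.
        return f"<database {datid}>"
    return datname
```

Code that previously joined or filtered on datname without expecting NULL may need a similar guard once it runs against PostgreSQL 12.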