Data checksums
Added in PostgreSQL 9.3, data checksums can help to detect data corruption happening on the storage side.
Checksums are only enabled if the instance was setup using initdb
--data-checksums
(which isn’t the default behavior), or if activated
afterwards with the new
pg_checksums
tool also added in PostgreSQL
12.
When enabled, checksums are written each time a block is written to disk, and verified each time a block is read from disk (or from the operating system cache). If the checksum verification fails, an error is reported in the logs. If the block was read by a backend, the query will obviously fails, but if the block was read by a BASE_BACKUP operation (such as pg_basebackup), the command will continue its processing . While data checkums will only catch a subset of possible problems, they still have some values, especially if you don’t trust your storage reliability.
Up to PostgreSQL 11, any checksum validation error could only be found by looking into the logs, which clearly isn’t convenient if you want to monitor such error.
New counters available in pg_stat_database
To make checksum errors easier to monitor, and help users to react as soon as
such a problem occurs, PostgreSQL 12 adds new counters in the
pg_stat_database
view:
commit 6b9e875f7286d8535bff7955e5aa3602e188e436
Author: Magnus Hagander <magnus@hagander.net>
Date: Sat Mar 9 10:45:17 2019 -0800
Track block level checksum failures in pg_stat_database
This adds a column that counts how many checksum failures have occurred
on files belonging to a specific database. Both checksum failures
during normal backend processing and those created when a base backup
detects a checksum failure are counted.
Author: Magnus Hagander
Reviewed by: Julien Rouhaud
commit 77bd49adba4711b4497e7e39a5ec3a9812cbd52a
Author: Magnus Hagander <magnus@hagander.net>
Date: Fri Apr 12 14:04:50 2019 +0200
Show shared object statistics in pg_stat_database
This adds a row to the pg_stat_database view with datoid 0 and datname
NULL for those objects that are not in a database. This was added
particularly for checksums, but we were already tracking more satistics
for these objects, just not returning it.
Also add a checksum_last_failure column that holds the timestamptz of
the last checksum failure that occurred in a database (or in a
non-dataabase file), if any.
Author: Julien Rouhaud <rjuju123@gmail.com>
commit 252b707bc41cc9bf6c55c18d8cb302a6176b7e48
Author: Magnus Hagander <magnus@hagander.net>
Date: Wed Apr 17 13:51:48 2019 +0200
Return NULL for checksum failures if checksums are not enabled
Returning 0 could falsely indicate that there is no problem. NULL
correctly indicates that there is no information about potential
problems.
Also return 0 as numbackends instead of NULL for shared objects (as no
connection can be made to a shared object only).
Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Those counters will reflect checksum validation errors for both backend activity and BASE_BACKUP activity, per database.
The checksum_failures
column will show a cumulated number of errors, and the
checksum_last_failure
column will show the timestamp of the last checksum
failure on the database (NULL if no error ever happened).
To avoid any confusion (thanks to Robert Treat for pointing it), those two columns will always return NULL if data checksums aren’t enabled, so people won’t mistakenly think that data checksums are always successfully verified.
As a side effect, pg_stat_database
will also now show available statistics
for shared objects (such as the pg_database
table for instance), in a new row
with datid
valued to 0, and a NULL datname
. Those were always
accumulated, but weren’t displayed in any system view until now.
A dedicated check is also already
planned in
check_pgactivity!
A dedicated check is also already
available
in check_pgactivity!