A new version of pg_stat_kcache is out, with support for Windows and other platforms, and more counters available.
What’s new
Version 2.1 of pg_stat_kcache has just been released.
The two main new features are:
- compatibility with platform without
getrusage()
support (such as Windows) - more fields of
getrusage()
are exposed
As I explained in a previous article, this extension is a wrapper on top of getrusage, that accumulates performance counters per normalized query. It was already giving some precious informations that allows a DBA to identify CPU-intensive queries, or compute a real hit-ratio for instance.
However, it was only available on platforms that have a native version
getrusage
, so Windows and some other platforms were not supported. But
fortunately, PostgreSQL does offer a basic support of
getrusage()
for those platforms. This infrastructure has been used in the version 2.1.0 of
pg_stat_kcache, which means that you can now use this extension on Windows
and all the other platforms that wasn’t supported previously. As this is a
limited support, only the user and system CPU metrics will be available, the
other fields will always be NULL.
This new version also exposes all the remaining fields of getrusage()
that
have a meaning when accumulated per query:
- soft page faults
- hard page faults
- swaps
- IPC messages sent and received
- signals received
- voluntary and involuntary context switches
Another change is to automatically detect the operating system’s clock tick. Otherwise, very short queries (faster than a clock tick) would be either detected as not consuming CPU time, or consuming CPU time from earlier short queries. For queries faster than 3 clock ticks, where imprecision is high, pg_stat_kcache will instead use the query duration as CPU user time, and won’t use anything as CPU system time.
Small example
Depending on your platform, some of those new counters aren’t maintained. On
GNU/Linux for instance , the swaps, IPC messages and signaled are unfortunately
not maintained, but those which are are still quite interesting. For instance,
let’s compare the context switches
if we run the same number of total
transaction but with either 2 or 80 concurrent connections on a 4-core laptop:
As expected, having 80 concurrent connections on a 4-core laptop is not the most efficient way to process 8000 transactions. The transactions latency is 44 times slower with 80 connections than with 2 connections. At the O/S level, we can see that with only 2 concurrent connections, we had only 8 involuntary context switches on all queries on the pgbench database, while there were 282, so 35 times more with 80 concurrent connections.
Those new metrics give a lot more information of what’s happening at the O/S level, on a per normalized query granularity, and will ease diagnostic of performance issues. Combined with PoWA, you’ll even be able to identify when any of those metrics have a different behavior!