<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title type="text">rjuju's home</title>
<generator uri="https://github.com/jekyll/jekyll">Jekyll</generator>
<link rel="self" type="application/atom+xml" href="https://rjuju.github.io/feed.xml" />
<link rel="alternate" type="text/html" href="https://rjuju.github.io" />
<link>https://rjuju.github.io</link>
<updated>2024-08-02T23:53:42+00:00</updated>
<id>https://rjuju.github.io/</id>
<author>
  <name>Julien Rouhaud</name>
  <uri>https://rjuju.github.io/postgresql/</uri>
  
</author>


<entry>
  <title type="html"><![CDATA[Extracting SQL from WAL? (part 2)]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2023/12/20/extract-sql-from-wal-part2.html" />
  <id>https://rjuju.github.io/postgresql/2023/12/20/extract-sql-from-wal-part2</id>
  <published>2023-12-20T03:04:10+00:00</published>
  <updated>2023-12-20T03:04:10+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;In the &lt;a href=&quot;/postgresql/2023/12/06/extract-sql-from-wal.html&quot;&gt;previous article&lt;/a&gt; of this series, we saw how to extract
WAL records related to the exact SQL commands we want, INSERTs on heap tables,
and what the structure of those records was.  In this article we will focus on
the heap specific information contained in those records and how to extract SQL
queries from them.&lt;/p&gt;

&lt;h3 id=&quot;insert-data&quot;&gt;INSERT data&lt;/h3&gt;

&lt;p&gt;At the end of the &lt;a href=&quot;/postgresql/2023/12/06/extract-sql-from-wal.html&quot;&gt;previous article&lt;/a&gt;, we could locate the various
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xl_heap_insert&lt;/code&gt; records from the WAL stream.  From there, we extracted some
metadata about the file’s physical location (tablespace oid, database oid and
relation filenode among other things) and the data that was inserted itself.&lt;/p&gt;

&lt;p&gt;As a reminder, here’s an extract of the code responsible for generating the
WAL records for an INSERT, in the &lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam.c&quot;&gt;heap_insert()
function&lt;/a&gt;,
focusing on the interesting data:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;heap_insert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Relation&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HeapTuple&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tup&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CommandId&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
			&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BulkInsertState&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bistate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;xl_heap_header&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xlhdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;xlhdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_infomask2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heaptup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_infomask2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;xlhdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_infomask&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heaptup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_infomask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;xlhdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_hoff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heaptup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_hoff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;XLogRegisterBuffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;REGBUF_STANDARD&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bufflags&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;XLogRegisterBufData&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlhdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SizeOfHeapHeader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
		&lt;span class=&quot;cm&quot;&gt;/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;XLogRegisterBufData&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
							&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heaptup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SizeofHeapTupleHeader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
							&lt;span class=&quot;n&quot;&gt;heaptup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_len&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SizeofHeapTupleHeader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Two entries are registered: an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xl_heap_header&lt;/code&gt;, which contains some metadata about
the tuple, extracted from the &lt;em&gt;tuple header&lt;/em&gt;, and the data part of a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HeapTuple&lt;/code&gt;.  Let’s look at those in detail.&lt;/p&gt;
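
&lt;p&gt;To make this layout concrete, here is a minimal standalone sketch of how the two pieces could be separated again once the block data of a record is available as a plain byte buffer.  The struct is redeclared locally with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;stdint.h&lt;/code&gt; types so that it compiles outside of postgres, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;insert_payload&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;split_insert_data()&lt;/code&gt; are hypothetical names used for illustration only, not something that exists in postgres.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the WAL tuple header, using stdint.h types. */
typedef struct xl_heap_header
{
	uint16_t	t_infomask2;
	uint16_t	t_infomask;
	uint8_t		t_hoff;
} xl_heap_header;

/* Same idea as SizeOfHeapHeader in postgres: trailing padding is not written. */
#define SIZE_OF_HEAP_HEADER (offsetof(xl_heap_header, t_hoff) + sizeof(uint8_t))

/* Hypothetical container for the two registered pieces. */
typedef struct insert_payload
{
	xl_heap_header hdr;			/* copied out, as the source may be unaligned */
	const char *tuple_data;		/* bitmap [+ padding] + attribute data */
	size_t		tuple_len;		/* length of tuple_data in bytes */
} insert_payload;

/* Returns 0 on success, -1 if the buffer is too short to hold a header. */
static int
split_insert_data(const char *buf, size_t len, insert_payload *out)
{
	assert(buf != NULL && out != NULL);
	if (len < SIZE_OF_HEAP_HEADER)
		return -1;

	memset(&out->hdr, 0, sizeof(out->hdr));
	memcpy(&out->hdr, buf, SIZE_OF_HEAP_HEADER);
	out->tuple_data = buf + SIZE_OF_HEAP_HEADER;
	out->tuple_len = len - SIZE_OF_HEAP_HEADER;
	return 0;
}
```

&lt;p&gt;The header is copied rather than cast in place because nothing guarantees that the buffer is suitably aligned for a direct struct access.&lt;/p&gt;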

&lt;h3 id=&quot;page-layout&quot;&gt;Page layout&lt;/h3&gt;

&lt;p&gt;First of all, let’s quickly see how postgres stores tables and indexes on disk.
I will only cover the basics that will be helpful for the rest of the
article.  If you want to dig more into this topic, there are tons of resources
available.  You can refer to &lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/include/storage/bufpage.h&quot;&gt;this entry point in the
code&lt;/a&gt;,
and I otherwise recommend looking at &lt;a href=&quot;https://www.interdb.jp/pg/pgsql01.html#_1.3.&quot;&gt;the section about it on the “The internals of
postgres” website&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A good general introduction is &lt;a href=&quot;https://www.postgresql.org/docs/current/storage-page-layout.html&quot;&gt;the
documentation&lt;/a&gt;,
which comes with a diagram of the layout that I include here:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/page_layout.png&quot;&gt;&lt;img src=&quot;/images/page_layout.png&quot; alt=&quot;Physical page layout, from the official postgres
documentation&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every tuple and every piece of index data that postgres stores on disk lives in
a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt;, which is by default 8kB.  Each page starts with a header that
contains some metadata about the page and ends with an optional “special area”,
which can contain additional information specific to the component of postgres
that will use this page.&lt;/p&gt;

&lt;p&gt;In between is the actual data.  The beginning of the data part is an array of
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ItemId&lt;/code&gt;, in ascending order, and at the end of the data part are the items
themselves (which will be the tuples in case of heap table pages), stored in
the reverse order from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ItemId&lt;/code&gt;.  Unless the page is totally full, there
will be an empty space between the last &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ItemId&lt;/code&gt; and the first item (delimited by the
pd_lower and pd_upper offsets in the page header).&lt;/p&gt;

&lt;p&gt;Here’s the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ItemId&lt;/code&gt; definition:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ItemIdData&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lp_off&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;cm&quot;&gt;/* offset to tuple (from start of page) */&lt;/span&gt;
		 &lt;span class=&quot;nl&quot;&gt;lp_flags:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;cm&quot;&gt;/* state of line pointer, see below */&lt;/span&gt;
		 &lt;span class=&quot;nl&quot;&gt;lp_len:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;cm&quot;&gt;/* byte length of tuple */&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ItemIdData&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see it holds the location of the item in the page, minimal metadata
and the length of the item.&lt;/p&gt;
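
&lt;p&gt;As a rough illustration of how such a line pointer can be used, the sketch below checks whether an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ItemId&lt;/code&gt; references an item that is stored entirely inside the page.  The struct and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LP_*&lt;/code&gt; states mirror the ones in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;storage/itemid.h&lt;/code&gt;, while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;item_is_readable()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKETCH_BLCKSZ&lt;/code&gt; are hypothetical names, and the default 8kB block size is assumed.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bit-field layout as the definition above. */
typedef struct ItemIdData
{
	unsigned	lp_off:15,		/* offset to tuple (from start of page) */
				lp_flags:2,		/* state of line pointer */
				lp_len:15;		/* byte length of tuple */
} ItemIdData;

/* Line pointer states. */
#define LP_UNUSED	0			/* unused */
#define LP_NORMAL	1			/* used, references an item */
#define LP_REDIRECT	2			/* HOT redirect */
#define LP_DEAD		3			/* dead, may or may not have storage */

#define SKETCH_BLCKSZ 8192		/* assumption: default 8kB block size */

/*
 * Hypothetical helper: true if the line pointer references an item that is
 * stored entirely inside the page.
 */
static bool
item_is_readable(const ItemIdData *id)
{
	assert(id != NULL);
	if (id->lp_flags != LP_NORMAL || id->lp_len == 0)
		return false;
	return (size_t) id->lp_off + id->lp_len <= SKETCH_BLCKSZ;
}
```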

&lt;h3 id=&quot;heaptuple&quot;&gt;HeapTuple&lt;/h3&gt;

&lt;p&gt;The largest part stored in the record is the tuple itself.  As the historic and
default access method to store tuples is called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heap&lt;/code&gt;, the struct that holds
the tuple is called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HeapTuple&lt;/code&gt;.  Any custom &lt;strong&gt;Table Access Method&lt;/strong&gt; can use a
different struct to store what it needs for its specific implementation, but it
will then also use a custom resource manager to generate specific WAL records.&lt;/p&gt;

&lt;p&gt;Here’s the &lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/include/access/htup.h&quot;&gt;definition of a
HeapTuple&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HeapTupleData&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;uint32&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;t_len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;		&lt;span class=&quot;cm&quot;&gt;/* length of *t_data */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ItemPointerData&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t_self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;		&lt;span class=&quot;cm&quot;&gt;/* SelfItemPointer */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Oid&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;t_tableOid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;	&lt;span class=&quot;cm&quot;&gt;/* table the tuple came from */&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#define FIELDNO_HEAPTUPLEDATA_DATA 3
&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;HeapTupleHeader&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t_data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;		&lt;span class=&quot;cm&quot;&gt;/* -&amp;gt; tuple header and data */&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HeapTupleData&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It starts with some metadata, which isn’t stored on disk but generated or
retrieved from somewhere else when the struct is read from disk.  Indeed, there
wouldn’t be much value in storing the relation’s oid for each tuple on disk.  The
length of the tuple is stored on disk, as it’s a necessary piece of
information, and is retrieved from the associated &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ItemId&lt;/code&gt; that we saw just
before.&lt;/p&gt;

&lt;p&gt;After that follows the “real” data, which is what is stored in the &lt;strong&gt;item&lt;/strong&gt;
part of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Page&lt;/code&gt;.  It’s again split in 2 parts: the tuple header, which I
will cover a bit later, and the tuple data.&lt;/p&gt;

&lt;p&gt;The tuple data is the physical on-disk representation of the tuple.  It was
designed to be as space efficient as possible, so accessing individual fields
is a bit complex and CPU intensive.  Let’s see the most important parts of this
design.  First, the tuple data is &lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/include/access/htup_details.h&quot;&gt;defined like
this&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HeapTupleHeaderData&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
	&lt;span class=&quot;cm&quot;&gt;/* ^ - 23 bytes - ^ */&lt;/span&gt;

&lt;span class=&quot;cp&quot;&gt;#define FIELDNO_HEAPTUPLEHEADERDATA_BITS 5
&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;bits8&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;t_bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FLEXIBLE_ARRAY_MEMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;	&lt;span class=&quot;cm&quot;&gt;/* bitmap of NULLs */&lt;/span&gt;

	&lt;span class=&quot;cm&quot;&gt;/* MORE DATA FOLLOWS AT END OF STRUCT */&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You probably know or heard that in postgres, NULL attributes don’t use any
storage.  Indeed, if an attribute is NULL there won’t be anything in the “data
section”, and the bit for its attribute number in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t_bits&lt;/code&gt; bitmap will be
cleared (a set bit means that the attribute is not NULL).&lt;/p&gt;

&lt;p&gt;Then, a lot of data types have a variable size (which is internally referred to as
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;varlena&lt;/code&gt;).  So, to save space, postgres doesn’t store the offset of each
attribute in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HeapTuple&lt;/code&gt; and just stores them next to each other
(according to the datatype alignment rules) in a big chunk of memory.&lt;/p&gt;

&lt;p&gt;This is indeed efficient, but unless your tuple only contains non-null
fixed-size attributes, the only way to access a specific attribute is to read
all the previous ones, skipping the NULL attributes and computing the position
of the next one by reading the length of each variable-length value.  This process is called
&lt;strong&gt;tuple deforming&lt;/strong&gt;: it takes a tuple as input and outputs two arrays, one with
the datums and one with the null flags, both indexed by the attribute
number (0 based).  The opposite operation (transforming an array of datums and an
array of nulls into a tuple) is unsurprisingly called &lt;strong&gt;tuple forming&lt;/strong&gt;.  If you
want to read a bit more about those operations, the underlying functions are
called &lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/access/common/heaptuple.c&quot;&gt;heap_deform_tuple() and
heap_form_tuple()&lt;/a&gt;.&lt;/p&gt;
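
&lt;p&gt;To give an idea of what deforming looks like, here is a deliberately simplified toy version.  It assumes that every column is a 4-byte integer, so there is no varlena or alignment handling at all, which is very far from what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heap_deform_tuple()&lt;/code&gt; really has to do.  The function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;toy_deform()&lt;/code&gt; is hypothetical.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy deforming loop, assuming that every column is a 4-byte integer.
 * "bits" is the NULL bitmap (or NULL when the tuple has no NULL at all) and
 * "data" points to the first stored attribute.
 */
static void
toy_deform(const uint8_t *bits, const char *data, int natts,
		   int32_t *values, bool *isnull)
{
	size_t		off = 0;		/* position of the next stored attribute */

	assert(data != NULL && values != NULL && isnull != NULL);

	for (int i = 0; i < natts; i++)
	{
		/* A cleared bit means NULL: nothing is stored, so don't advance. */
		if (bits != NULL && !(bits[i >> 3] & (1 << (i & 0x07))))
		{
			values[i] = 0;
			isnull[i] = true;
			continue;
		}

		memcpy(&values[i], data + off, sizeof(int32_t));
		isnull[i] = false;
		off += sizeof(int32_t);
	}
}
```

&lt;p&gt;Even in this toy version, the position of an attribute depends on whether each of the previous ones is NULL, which is why a given attribute can’t be accessed directly.&lt;/p&gt;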

&lt;p&gt;Note that tuple deforming is one of the operations that can be
&lt;a href=&quot;https://www.postgresql.org/docs/current/jit.html&quot;&gt;JITted&lt;/a&gt;, and there are some
optimisations on the tuple deforming operation.  Postgres supports “partial”
deforming and will avoid deforming the full tuple when possible, stopping at
the last attribute that the query is referencing, and will cache the offset of
the latest attribute that has been deformed.  But that can only help to some
extent, so it’s always a good idea to mark columns as NOT NULL when possible,
put all the columns with fixed-length attributes at the beginning of the tuples
(with the NOT NULL ones first), ideally grouped by alignment size to avoid wasting a
few bytes of padding, and put the most frequently accessed columns of variable length
datatype next.  All of that will help speed up tuple deforming as much as
possible.&lt;/p&gt;
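
&lt;p&gt;The effect of column ordering on alignment padding can be estimated with a few lines of code.  This is only a sketch: it considers fixed-length, non-NULL columns, assumes that the alignment of each column is equal to its size (true for types like boolean, integer and bigint on common platforms), and ignores the tuple header.  The helper names are hypothetical.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

/* Round "off" up to the next multiple of "align" (a power of two). */
static size_t
align_up(size_t off, size_t align)
{
	return (off + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: bytes used by the data part of a tuple made of
 * fixed-length, non-NULL columns, given the size of each column.  For
 * simplicity the alignment of a column is assumed to be equal to its size.
 */
static size_t
tuple_data_size(const size_t *col_sizes, int ncols)
{
	size_t		off = 0;

	assert(col_sizes != NULL);
	for (int i = 0; i < ncols; i++)
		off = align_up(off, col_sizes[i]) + col_sizes[i];
	return off;
}
```

&lt;p&gt;With column sizes of 1, 8, 1 and 4 bytes (boolean, bigint, boolean, integer) this gives 24 bytes, while the same columns ordered as 8, 4, 1 and 1 bytes only need 14 bytes, before any final padding of the whole tuple.&lt;/p&gt;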

&lt;h4 id=&quot;tuple-header&quot;&gt;Tuple header&lt;/h4&gt;

&lt;p&gt;The first part of the stored data is an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xl_heap_header&lt;/code&gt; struct.  It’s just a
shorter version of the real tuple header that only contains some part of it, the
rest of the header being available elsewhere in the WAL record or just not
needed otherwise.  Doing it this way can save a few bytes for each insert in
the WAL, which is always a good thing.  Its definition is:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xl_heap_header&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;uint16&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;t_infomask2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;uint16&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;t_infomask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;t_hoff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xl_heap_header&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;t_infomask2&lt;/em&gt; and &lt;em&gt;t_infomask&lt;/em&gt; are two bitmaps that contain information about
the tuple.  You may have heard about &lt;a href=&quot;https://wiki.postgresql.org/wiki/Hint_Bits&quot;&gt;hint
bits&lt;/a&gt;: those two fields contain
the tuple-level hint bits.&lt;/p&gt;

&lt;p&gt;Let’s look at their details in
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/include/access/htup_details.h&quot;&gt;htup_details.h&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HeapTupleHeaderData&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
	&lt;span class=&quot;cm&quot;&gt;/* Fields below here must match MinimalTupleData! */&lt;/span&gt;

&lt;span class=&quot;cp&quot;&gt;#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK2 2
&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;uint16&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;t_infomask2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;	&lt;span class=&quot;cm&quot;&gt;/* number of attributes + various flags */&lt;/span&gt;

&lt;span class=&quot;cp&quot;&gt;#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK 3
&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;uint16&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;t_infomask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;		&lt;span class=&quot;cm&quot;&gt;/* various flag bits, see below */&lt;/span&gt;

&lt;span class=&quot;cp&quot;&gt;#define FIELDNO_HEAPTUPLEHEADERDATA_HOFF 4
&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;t_hoff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;			&lt;span class=&quot;cm&quot;&gt;/* sizeof header incl. bitmap, padding */&lt;/span&gt;

	&lt;span class=&quot;cm&quot;&gt;/* ^ - 23 bytes - ^ */&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

 &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;information&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stored&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t_infomask2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
 &lt;span class=&quot;err&quot;&gt;*/&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#define HEAP_NATTS_MASK			0x07FF	&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* 11 bits for number of attributes */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* bits 0x1800 are available */&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#define HEAP_KEYS_UPDATED		0x2000	&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* tuple was updated and key cols
										 * modified, or tuple deleted */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
#define HEAP_HOT_UPDATED		0x4000	&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* tuple was HOT-updated */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
#define HEAP_ONLY_TUPLE			0x8000	&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* this is heap-only tuple */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#define HEAP2_XACT_MASK			0xE000	&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* visibility-related bits */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;information&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stored&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t_infomask&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;
 &lt;span class=&quot;err&quot;&gt;*/&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#define HEAP_HASNULL			0x0001	&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* has null attribute(s) */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
#define HEAP_HASVARWIDTH		0x0002	&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* has variable-width attribute(s) */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#define HEAP_XMIN_COMMITTED		0x0100	&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* t_xmin committed */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
#define HEAP_XMIN_INVALID		0x0200	&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* t_xmin invalid/aborted */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
#define HEAP_XMIN_FROZEN		(HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
#define HEAP_XMAX_COMMITTED		0x0400	&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* t_xmax committed */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
#define HEAP_XMAX_INVALID		0x0800	&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* t_xmax invalid/aborted */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can see a few bits useful for the &lt;strong&gt;tuple deforming&lt;/strong&gt;.  For instance, we
see that 11 bits of &lt;em&gt;t_infomask2&lt;/em&gt; are used to store the actual number of
attributes stored in this tuple.  Adding a new column in a table doesn’t always
require a full table rewrite, and in that case those bits are critical to know
when to stop looking for additional attributes when accessing tuples stored
before the column was added.  There’s also information on whether the tuple
contains any NULL or variable-length datatype attribute.  The rest of the hint
bits are a clever use of the available space to handle various SQL operations,
MVCC rules, HOT updates and other low level optimisations.&lt;/p&gt;
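
&lt;p&gt;Extracting the bits that matter for deforming is straightforward.  The sketch below reuses the mask values from the excerpt above; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;deform_hints&lt;/code&gt; struct and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read_deform_hints()&lt;/code&gt; function are hypothetical names for this standalone example.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the htup_details.h excerpt above. */
#define HEAP_NATTS_MASK		0x07FF	/* 11 bits for number of attributes */
#define HEAP_HASNULL		0x0001	/* has null attribute(s) */
#define HEAP_HASVARWIDTH	0x0002	/* has variable-width attribute(s) */

static_assert(HEAP_NATTS_MASK == (1 << 11) - 1,
			  "the attribute count uses the 11 low bits of t_infomask2");

/* Hypothetical summary of what deforming needs from the two bitmaps. */
typedef struct deform_hints
{
	int			natts;			/* attributes actually stored in the tuple */
	bool		has_nulls;		/* a NULL bitmap is present */
	bool		has_varwidth;	/* at least one variable-length attribute */
} deform_hints;

static deform_hints
read_deform_hints(uint16_t t_infomask2, uint16_t t_infomask)
{
	deform_hints h;

	h.natts = t_infomask2 & HEAP_NATTS_MASK;
	h.has_nulls = (t_infomask & HEAP_HASNULL) != 0;
	h.has_varwidth = (t_infomask & HEAP_HASVARWIDTH) != 0;
	return h;
}
```

&lt;p&gt;The flags stored in the high bits of &lt;em&gt;t_infomask2&lt;/em&gt;, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HEAP_ONLY_TUPLE&lt;/code&gt;, are masked out and don’t affect the attribute count.&lt;/p&gt;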

&lt;h3 id=&quot;tuple-descriptors&quot;&gt;Tuple descriptors&lt;/h3&gt;

&lt;p&gt;Now that we covered some internals of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HeapTuple&lt;/code&gt;, it seems much easier to
reach our goal: transform the INSERT WAL records into plain SQL statements.  We
know that we just have to &lt;em&gt;deform&lt;/em&gt; the tuples to retrieve the values and the
NULL attributes, and generating the SQL statements around them isn’t hard.  But here
comes the second reason why we need a proper data directory to do so, and why
the absence of DDL is important.&lt;/p&gt;

&lt;p&gt;As you probably guessed by now, one critical piece of information needed for
the &lt;em&gt;tuple deforming&lt;/em&gt; operation is the table structure declaration.  Indeed,
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HeapTuple&lt;/code&gt; is just a big chunk of memory, and without the list of columns,
data types, and the types’ details, it’s impossible to interpret it.  If your
data model doesn’t change too much it’s probably possible to do without and instead
generate some kind of mapping manually based on what you know about the history
of the instance.  Be careful if you go this way: any discrepancy between the
original and generated data types can lead to bogus output in the best case, or
crash your whole instance in the worst case.  But in my case I had the guarantee that no DDL
happened since the incident, and the other data directory was available, so I could
just rely on it.&lt;/p&gt;

&lt;p&gt;Postgres handles the table structure declaration using another struct, called
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TupleDesc&lt;/code&gt;, for &lt;em&gt;tuple descriptor&lt;/em&gt;.  Its definition is:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TupleDescData&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;	     &lt;span class=&quot;n&quot;&gt;natts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;	&lt;span class=&quot;cm&quot;&gt;/* number of attributes in the tuple */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Oid&lt;/span&gt;	     &lt;span class=&quot;n&quot;&gt;tdtypeid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;	&lt;span class=&quot;cm&quot;&gt;/* composite type ID for tuple type */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;int32&lt;/span&gt;	     &lt;span class=&quot;n&quot;&gt;tdtypmod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;	&lt;span class=&quot;cm&quot;&gt;/* typmod for tuple type */&lt;/span&gt;
	&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;	     &lt;span class=&quot;n&quot;&gt;tdrefcount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* reference count, or -1 if not counting */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;TupleConstr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;constr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;	&lt;span class=&quot;cm&quot;&gt;/* constraints, or NULL if none */&lt;/span&gt;
	&lt;span class=&quot;cm&quot;&gt;/* attrs[N] is the description of Attribute Number N+1 */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;FormData_pg_attribute&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attrs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FLEXIBLE_ARRAY_MEMBER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TupleDescData&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In our case the most interesting members are the number of attributes (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;natts&lt;/code&gt;)
and the array of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_attribute&lt;/code&gt; records (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;attrs&lt;/code&gt;).  Those are also useful for
the SQL generation part, as we can retrieve the columns from it.  Note also
that postgres will generate a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TupleDesc&lt;/code&gt; automatically when you internally
open a relation.&lt;/p&gt;

&lt;p&gt;Let’s recapitulate.  We have the record data, the filename contains the
physical file location information that we can use to retrieve the actual
relation, we know how to get the tuple descriptor for this relation and we can
use it to deform the tuple and get the values from it.  We have &lt;em&gt;almost&lt;/em&gt;
everything we need to generate the SQL queries.&lt;/p&gt;

&lt;p&gt;The only remaining detail is that the values we get from the tuple deforming
operation are in their physical representation, and we need to emit their
textual representation.  Again, that’s not a problem as each data type has a
dedicated function for that, called the &lt;strong&gt;type output function&lt;/strong&gt;, referenced in
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_type.typoutput&lt;/code&gt;.&lt;/p&gt;
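
&lt;p&gt;As a quick illustration, the output function of a type can be looked up
directly from SQL.  This is only a sketch and not part of the extension, and
the two type names are arbitrary examples:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- Look up the type output function of two built-in types
SELECT typname, typoutput
  FROM pg_type
  WHERE typname IN ('int4', 'text')
  ORDER BY typname;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On a stock installation this should report &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int4out&lt;/code&gt; for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int4&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;textout&lt;/code&gt; for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;text&lt;/code&gt;.&lt;/p&gt;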

&lt;h3 id=&quot;extracting-sql-from-the-insert-records&quot;&gt;Extracting SQL from the INSERT records&lt;/h3&gt;

&lt;p&gt;Now it’s time for the fun part where we just need to put everything together to
finish the project!&lt;/p&gt;

&lt;p&gt;I chose to write it as an extension to be able to add and remove it easily from
a production server.  I also chose to minimize the amount of C code and rely on
plpgsql functions when possible.  It’s faster to write and plpgsql is also way
safer.&lt;/p&gt;

&lt;p&gt;I only wrote a single &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_decode_record()&lt;/code&gt; C function, that takes as input a
record as a bytea, the tablespace oid and the relation filenode and emits the
underlying SQL query.  I wrote an extra &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_decode_all_records()&lt;/code&gt; function in
plpgsql that uses existing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_ls_dir()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_read_binary_file()&lt;/code&gt; to
retrieve the files and records, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;split_part()&lt;/code&gt; to extract the metadata from
the filename.&lt;/p&gt;

&lt;p&gt;I’m &lt;a href=&quot;/assets/patch/pg_decode_record.tgz&quot;&gt;attaching the resulting extension to this
article&lt;/a&gt; so you can see the whole
implementation and adapt it if needed, and will just quickly describe the main
parts here as we already covered the underlying elements.  I’m also only
showing here a simplified version to avoid too many implementation details.&lt;/p&gt;

&lt;p&gt;First, I look for a matching relation oid in the pg_class catalog for the given
tablespace and relfilenode, open the found relation with the weakest lock
possible, make a copy of the tuple descriptor and start generating the SQL
query with the qualified relation name.  As in any normal application, you need to
make sure that the identifiers are properly quoted to generate working queries:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;PGDLLEXPORT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Datum&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;pg_decode_record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PG_FUNCTION_ARGS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bytea&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PG_GETARG_BYTEA_PP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Oid&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;spcOid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PG_GETARG_OID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Oid&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;relNumber&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PG_GETARG_OID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;cm&quot;&gt;/* Get the relation oid from the tablespace oid and relfilenode */&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;relid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_spc_relnumber_relid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;spcOid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relNumber&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;relation&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table_open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AccessShareLock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tupdesc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CreateTupleDescCopy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RelationGetDescr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;

    &lt;span class=&quot;cm&quot;&gt;/* Start generating the SQL query */&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;initStringInfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;appendStringInfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;INSERT INTO %s.%s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    		 &lt;span class=&quot;n&quot;&gt;quote_identifier&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_namespace_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RelationGetNamespace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))),&lt;/span&gt;
    		 &lt;span class=&quot;n&quot;&gt;quote_identifier&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RelationGetRelationName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
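
&lt;p&gt;The effect of identifier quoting can be checked from SQL with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;quote_ident()&lt;/code&gt;, the SQL-level counterpart of the C &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;quote_identifier()&lt;/code&gt;
function, or with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;%I&lt;/code&gt; specifier of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format()&lt;/code&gt;.  This is only a sketch that is
not taken from the extension, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;My Table&lt;/code&gt; identifier is made up for the
example:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- A plain lower-case identifier is returned as-is, while an identifier
-- containing upper-case letters or spaces gets wrapped in double quotes
SELECT quote_ident('decode_record') AS plain,
       quote_ident('My Table') AS needs_quotes,
       format('INSERT INTO %I.%I', 'public', 'My Table') AS query_prefix;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Only the second identifier should come back quoted, as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;My Table&quot;&lt;/code&gt;, which is
what keeps the generated query valid for any relation name.&lt;/p&gt;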

&lt;p&gt;The next part extracts the data from the record and generates a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HeapTuple&lt;/code&gt; with
just enough information to be correctly deformed:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;cm&quot;&gt;/* mimic heap_xlog_insert */&lt;/span&gt;
    &lt;span class=&quot;cm&quot;&gt;/* record may have a short varlena header, so use the _ANY macros */&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VARDATA_ANY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;datalen&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VARSIZE_ANY_EXHDR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;htup&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tbuf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;htup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_hoff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xlhdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_hoff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;cm&quot;&gt;/* build a fake tuple with the bare minimum to deform it */&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;HeapTuple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;palloc0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;HEAPTUPLESIZE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VARSIZE_ANY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;htup&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_len&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VARSIZE_ANY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ItemPointerSetInvalid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_tableOid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For the next step, we just need to allocate the two arrays needed for the
deforming and call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heap_deform_tuple()&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;palloc0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Datum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tupdesc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;natts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;isnull&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;palloc0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tupdesc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;natts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;heap_deform_tuple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tupdesc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isnull&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now that we have all the elements, we just need to iterate over the list of
columns in the tuple descriptor, output a NULL if needed, otherwise find the
type output function, call it for our value, and output it in the query after
escaping it:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    &lt;span class=&quot;cm&quot;&gt;/* append the values */&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;appendStringInfoString&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot; VALUES (&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tupdesc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;natts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    	&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;	   &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    	&lt;span class=&quot;n&quot;&gt;Oid&lt;/span&gt;			&lt;span class=&quot;n&quot;&gt;typoutput&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    	&lt;span class=&quot;n&quot;&gt;bool&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;typisvarlena&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    	&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    		&lt;span class=&quot;n&quot;&gt;appendStringInfoString&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;, &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    	&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isnull&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    	&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    		&lt;span class=&quot;n&quot;&gt;appendStringInfoString&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;NULL&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    		&lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    	&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    	&lt;span class=&quot;n&quot;&gt;getTypeOutputInfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TupleDescAttr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tupdesc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;atttypid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    					  &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;typoutput&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;typisvarlena&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    	&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OidOutputFunctionCall&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;typoutput&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;
    	&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;quote_literal_cstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    	&lt;span class=&quot;n&quot;&gt;appendStringInfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;%s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    	&lt;span class=&quot;n&quot;&gt;pfree&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;appendStringInfoString&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;);&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
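
&lt;p&gt;The escaping step can also be tried from SQL with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;quote_literal()&lt;/code&gt;, which
applies the same kind of literal quoting as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;quote_literal_cstr()&lt;/code&gt; call
above.  Again this is only a sketch with made-up values, not something taken
from the extension:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- The value is wrapped in single quotes and any embedded single quote
-- is doubled, so the result can be pasted as-is in a VALUES clause
SELECT quote_literal('simple test') AS plain,
       quote_literal('it''s a test') AS embedded_quote;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The second column should come back as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;'it''s a test'&lt;/code&gt;, with the embedded quote
doubled.&lt;/p&gt;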

&lt;p&gt;Once done, we just need to properly close the relation and return the generated
query to the caller:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;	&lt;span class=&quot;n&quot;&gt;table_close&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NoLock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

	&lt;span class=&quot;n&quot;&gt;PG_RETURN_TEXT_P&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cstring_to_text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And that’s all you need for the basic scenario!  The real implementation has a
bit more code for various other cases, like &lt;strong&gt;very basic&lt;/strong&gt; TOAST table
support, but is still unlikely to correctly handle any weird corner cases that
can happen in the wild.&lt;/p&gt;

&lt;h3 id=&quot;basic-usage&quot;&gt;Basic usage&lt;/h3&gt;

&lt;p&gt;We can finally see the result of all the hard work in this article and the
previous one!  I will be using a simple scenario: first saving the current
WAL position to only keep the records generated afterwards, then inserting a
few rows, and finally removing all the data from the table (without changing its
relfilenode) to make sure that we don’t read anything from the table itself.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- Get the current WAL location&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_current_wal_lsn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;pg_current_wal_lsn&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;--------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;46349&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;E80&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXTENSION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_decode_record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXTENSION&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;storage&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;external&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'simple test'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Force a full-page write&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CHECKPOINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CHECKPOINT&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'full-page write'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'a bit big '&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;||&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string_agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;' '&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generate_series&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'way bigger '&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;||&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string_agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;' '&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generate_series&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;120&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Check the heap table size and underlying TOAST table size&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regclass&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pg_relation_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;reltoastrelid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regclass&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pg_relation_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reltoastrelid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'decode_record'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;reltoastrelid&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------------+----------------+-------------------------+----------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8192&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_toast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pg_toast_66731&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8192&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Make sure we remove all records and physically empty the tables&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VACUUM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;VACUUM&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regclass&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pg_relation_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;reltoastrelid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regclass&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pg_relation_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reltoastrelid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'decode_record'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;reltoastrelid&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------------+----------------+-------------------------+----------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_toast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pg_toast_66737&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Ok, we should have a few records generated in the WAL corresponding to data we
definitely lost in the table.  Let’s extract the INSERT records using the
custom &lt;em&gt;pg_waldump&lt;/em&gt; we created in the previous article:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ mkdir -p /tmp/pg_decode_record
$ pg_waldump --start &quot;F/46349E80&quot; --save-records /tmp/pg_decode_record
[...]
$ ls -l /tmp/pg_decode_record
0000000F-46367520.1663.16384.66743.0_main
0000000F-46367660.1663.16384.66743.0_main
0000000F-46367738.1663.16384.66743.0_main
0000000F-46367868.1663.16384.66746.0_main
0000000F-46368130.1663.16384.66746.0_main
0000000F-46368300.1663.16384.66743.0_main
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
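
&lt;p&gt;The file names carry the record LSN and the physical location of the
relation.  As a minimal sketch, assuming the layout is
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LSN.tablespace.database.relfilenode.N_fork&lt;/code&gt; and that the last number is a
block number (that part is a guess based on the listing above), they could be
parsed like this.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SavedRecordName&lt;/code&gt; struct and the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parse_saved_record_name()&lt;/code&gt; function are hypothetical names, not part of the
extension:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

/* Hypothetical container for the fields encoded in a saved record file name */
typedef struct SavedRecordName
{
    unsigned int lsn_hi;       /* high 32 bits of the record LSN */
    unsigned int lsn_lo;       /* low 32 bits of the record LSN */
    unsigned int spc_oid;      /* tablespace oid */
    unsigned int db_oid;       /* database oid */
    unsigned int relfilenode;  /* relation filenode */
    unsigned int blkno;        /* ASSUMPTION: block number */
    char         fork[16];     /* fork name, e.g. &quot;main&quot; */
} SavedRecordName;

/* Returns 1 if all 7 fields could be parsed, 0 otherwise */
static int
parse_saved_record_name(const char *name, SavedRecordName *out)
{
    int nfields;

    nfields = sscanf(name, &quot;%8X-%8X.%u.%u.%u.%u_%15s&quot;,
                     &amp;amp;out-&amp;gt;lsn_hi, &amp;amp;out-&amp;gt;lsn_lo,
                     &amp;amp;out-&amp;gt;spc_oid, &amp;amp;out-&amp;gt;db_oid,
                     &amp;amp;out-&amp;gt;relfilenode, &amp;amp;out-&amp;gt;blkno, out-&amp;gt;fork);

    return nfields == 7;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With the first file of the listing, this gives LSN F/46367520, tablespace
1663, database 16384 and relfilenode 66743, which is enough to group the files
per relation before decoding them.&lt;/p&gt;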

&lt;p&gt;You might wonder why there are 6 records extracted while we only inserted 4
rows.  That’s because the last record was big enough to be TOASTed using 2
chunks, and as far as the WAL is concerned that’s 3 separate INSERTs in 2
different tables.  Let’s see that in detail using the extension to decode the
records (truncating the output as some rows are quite big):&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;substr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;95&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_decode_all_records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'/tmp/pg_decode_record'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
                                          &lt;span class=&quot;n&quot;&gt;substr&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'simple test'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'2'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'full-page write'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'3'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'a bit big 0.5356172842583808 0.3...'&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_toast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pg_toast_66810&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'66815'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;x7761792062696767657220302e...'&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_toast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pg_toast_66810&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'66815'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;x3337383137353120302e303439...'&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;decode_record&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VALUES&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'4'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;cm&quot;&gt;/* toast pointer 66815 */&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(note: I slightly edited the output to make it smaller and have correct syntax
highlighting, the real extension will emit the real table name in a comment in
case of INSERT in a TOAST table)&lt;/p&gt;

&lt;p&gt;We see the first normal records properly decoded, whether they’re in a
full-page image or not.  The last record is indeed split into 3 different
INSERTs, 2 in the TOAST table and 1 in the heap table.&lt;/p&gt;

&lt;p&gt;As I mentioned earlier I only added &lt;strong&gt;very minimal&lt;/strong&gt; support for TOAST tables,
as I didn’t have any information about the customer tables and whether they
would hit that case or not, or how often.  The last insert isn’t a valid
statement as the 2nd value is missing, but we can manually extract the value
from the INSERT statements in the TOAST table and therefore fix the normal
INSERT.  For instance, using the first few bytes that we can see in the first
chunk:&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;encode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;E&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\\&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;x7761792062696767657220302e'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'escape'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RECORD&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;---------&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;encode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;way&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bigger&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The data is there, it just needs a bit of manual processing to get it.&lt;/p&gt;
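
&lt;p&gt;That manual processing can also be done outside of SQL.  Here is a minimal
sketch, assuming the chunk data is uncompressed and emitted as a hexadecimal
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bytea&lt;/code&gt; literal like in the output above.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hex_digit()&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decode_bytea_hex()&lt;/code&gt; helpers are hypothetical and not part of the extension:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Value of one hexadecimal digit, or -1 if the character isn't one */
static int
hex_digit(char c)
{
    if (c &amp;gt;= '0' &amp;amp;&amp;amp; c &amp;lt;= '9')
        return c - '0';
    if (c &amp;gt;= 'a' &amp;amp;&amp;amp; c &amp;lt;= 'f')
        return c - 'a' + 10;
    if (c &amp;gt;= 'A' &amp;amp;&amp;amp; c &amp;lt;= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Decode a hexadecimal string, with an optional leading \x, into out.
 * Returns the number of decoded bytes, or -1 if the input is malformed or
 * doesn't fit in out_size bytes.
 */
static long
decode_bytea_hex(const char *hex, unsigned char *out, size_t out_size)
{
    size_t len;
    size_t i;

    if (hex[0] == '\\' &amp;amp;&amp;amp; hex[1] == 'x')
        hex += 2;

    len = strlen(hex);
    if (len % 2 != 0 || len / 2 &amp;gt; out_size)
        return -1;

    for (i = 0; i &amp;lt; len / 2; i++)
    {
        int hi = hex_digit(hex[2 * i]);
        int lo = hex_digit(hex[2 * i + 1]);

        if (hi &amp;lt; 0 || lo &amp;lt; 0)
            return -1;
        out[i] = (unsigned char) ((hi &amp;lt;&amp;lt; 4) | lo);
    }

    return (long) (len / 2);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To rebuild the full value you would decode every chunk sharing the same
chunk_id (the first value of the TOAST INSERTs) and append them in the order
given by the second value, the chunk sequence number.&lt;/p&gt;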

&lt;p&gt;To be totally fair, I also cheated a bit in that example by making sure that
the data will be TOASTed but not compressed, so it’s very easy to manually
retrieve the raw value from the extra INSERTs in the TOAST tables.  It wouldn’t
be very hard to have all of that working transparently, but I simply didn’t
have the need.  If you’re interested in that, I’d recommend looking at the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;detoast_attr()&lt;/code&gt; function in
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/access/common/detoast.c&quot;&gt;src/backend/access/common/detoast.c&lt;/a&gt;
and all underlying code to see how you can manually decompress data.  You would
then only need to store the detoasted (and potentially decompressed) value
referenced by the toast’s chunk_id locally, and emit it in the query instead of
the currently emitted comment.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;I hope you enjoyed those two articles and learned a bit about the WAL
infrastructure and the way pages and tuples work internally.&lt;/p&gt;

&lt;p&gt;If you missed it in the article, &lt;a href=&quot;/assets/patch/pg_decode_record.tgz&quot;&gt;here is the link for the full
extension&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I want to emphasize again that all the code I showed here is only a quick proof
of concept designed for one narrow use case, and it should be used
with care.  My goal here wasn’t to show state of the art code but rather to show
one possible way to quickly come up with a plan to salvage data in case of a
production incident.  If you’re unfortunately confronted with a
similar problem, or some other major accident, I hope you will find some
valuable resources and a starting point to come up with your own dedicated
solution!&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2023/12/20/extract-sql-from-wal-part2.html&quot;&gt;Extracting SQL from WAL? (part 2)&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on December 20, 2023.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Extracting SQL from WAL? (part 1)]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2023/12/06/extract-sql-from-wal.html" />
  <id>https://rjuju.github.io/postgresql/2023/12/06/extract-sql-from-wal</id>
  <published>2023-12-06T03:04:10+00:00</published>
  <updated>2023-12-06T03:04:10+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;Is it actually possible to extract SQL commands from WAL generated in “replica”
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wal_level&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;The answer is usually no, the “logical” &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wal_level&lt;/code&gt; exists for a reason after
all, and you shouldn’t expect some kind of miracle here.&lt;/p&gt;

&lt;p&gt;But in this series of articles you will see that if some conditions are met
you can still manage to extract some information, and how to do it.  This first
article focuses on the WAL records and how to extract the ones you want, while
the next one will show how to try to extract the information contained in those
records.&lt;/p&gt;

&lt;h3 id=&quot;some-context&quot;&gt;Some context&lt;/h3&gt;

&lt;p&gt;This article is based on some work I did a few months ago to help a customer
recover some data after an incident.  It’s not a perfect solution and mostly a
set of quick hacks I did to come up with something able to retrieve data in a
few hours of work only, but I hope sharing details about it and some
methodology can be helpful if you ever get in a similar situation.  You will
probably need to adapt it to your needs, with yet other hacks, but it should
give you a good start.  It can otherwise be of some interest if you want to
know a bit more about the WAL records internals and some associated
infrastructure.&lt;/p&gt;

&lt;h3 id=&quot;the-incident&quot;&gt;The incident&lt;/h3&gt;

&lt;p&gt;Due to a series of unfortunate events, one of their HA clusters ended in a
split-brain situation for some time before being reinitialised, which
entirely removed one of the data directories.  After that, only the WALs that
were generated on that instance were available, those being in “replica”
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wal_level&lt;/code&gt;, and nothing else.&lt;/p&gt;

&lt;p&gt;One possibility to try to recover the data would be to restore a physical backup,
if any, replay archived WALs until the last transaction before the removed node
is promoted (assuming those are still available) and then replay the WALs
generated on that newly promoted node.  Once there you still need to look at
each row of each table of each database and compare it to yet another instance
restored from the same backup to approximately the same time as this one.
That’s clearly not ideal as it will likely require many days or even weeks of
tedious hard work to do so, and will consume a lot of resources along the way.
Is there a way to do better?&lt;/p&gt;

&lt;p&gt;After a quick discussion, it turned out that there were a few elements that
made some recovery from the WALs themselves possible (more on why later):&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;One of the data directories was still available&lt;/li&gt;
  &lt;li&gt;The customer guaranteed that no DDL happened since the incident&lt;/li&gt;
  &lt;li&gt;Only INSERTs happened during the split-brain&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;wals--physical-replication&quot;&gt;WALs &amp;amp; Physical replication&lt;/h3&gt;

&lt;p&gt;As you probably know, postgres physical replication works by sending an exact
copy of the modified binary raw data to the various standby servers, in a
continuous stream of WAL records.  As a consequence, those records don’t really
know much about the database objects they reference, and nothing about the SQL
queries that generated them.  So what do they really contain?  Let’s see what’s
inside the WAL records generated for an INSERT into a normal heap relation.&lt;/p&gt;

&lt;h4 id=&quot;wal-records&quot;&gt;WAL records&lt;/h4&gt;

&lt;p&gt;First of all, you have to know that the WAL records are split into &lt;strong&gt;Resource
Managers&lt;/strong&gt; (declared in
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/include/access/rmgrlist.h&quot;&gt;src/include/access/rmgrlist.h&lt;/a&gt;),
each being responsible for a specific part of postgres (heap tables, indexes,
vacuum…).  They’re identified by a numeric identifier and often referred to as
a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rmid&lt;/code&gt;, for &lt;em&gt;resource manager identifier&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Each of those resource managers can handle various operations, which are
internally called &lt;strong&gt;opcodes&lt;/strong&gt;.  Here we’re interested in the WAL records
generated while operating on standard heap tables, and especially during
INSERTs.  This resource manager is a bit particular as it’s split into 2
different &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rmid&lt;/code&gt;: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RM_HEAP_ID&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RM_HEAP2_ID&lt;/code&gt;.  This is only an
implementation detail, as each resource manager can only handle a limited
number of opcodes; everything is the same otherwise.&lt;/p&gt;

&lt;p&gt;If you’re curious, here’s the definition of the main WAL record in the &lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/include/access/xlogrecord.h&quot;&gt;source
code&lt;/a&gt;
and a few details on the exact layout in the files:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cm&quot;&gt;/*
 * The overall layout of an XLOG record is:
 *		Fixed-size header (XLogRecord struct)
 *		XLogRecordBlockHeader struct
 *		XLogRecordBlockHeader struct
 *		...
 *		XLogRecordDataHeader[Short|Long] struct
 *		block data
 *		block data
 *		...
 *		main data
 * [...]
 */&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;XLogRecord&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;uint32&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;xl_tot_len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;		&lt;span class=&quot;cm&quot;&gt;/* total len of entire record */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;TransactionId&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xl_xid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;		&lt;span class=&quot;cm&quot;&gt;/* xact id */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;XLogRecPtr&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;xl_prev&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;		&lt;span class=&quot;cm&quot;&gt;/* ptr to previous record in log */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;xl_info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;		&lt;span class=&quot;cm&quot;&gt;/* flag bits, see below */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;RmgrId&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;xl_rmid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;		&lt;span class=&quot;cm&quot;&gt;/* resource manager for this record */&lt;/span&gt;
	&lt;span class=&quot;cm&quot;&gt;/* 2 bytes of padding here, initialize to zero */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;pg_crc32c&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;xl_crc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;			&lt;span class=&quot;cm&quot;&gt;/* CRC for this record */&lt;/span&gt;

	&lt;span class=&quot;cm&quot;&gt;/* XLogRecordBlockHeaders and XLogRecordDataHeader follow, no padding */&lt;/span&gt;

&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;XLogRecord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and a block data header:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
&lt;span class=&quot;cm&quot;&gt;/*
 * Header info for block data appended to an XLOG record.
 *
 * 'data_length' is the length of the rmgr-specific payload data associated
 * with this block. It does not include the possible full page image, nor
 * XLogRecordBlockHeader struct itself.
 *
 * Note that we don't attempt to align the XLogRecordBlockHeader struct!
 * So, the struct must be copied to aligned local storage before use.
 */&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;XLogRecordBlockHeader&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;				&lt;span class=&quot;cm&quot;&gt;/* block reference ID */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;fork_flags&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;		&lt;span class=&quot;cm&quot;&gt;/* fork within the relation, and flags */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;uint16&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;data_length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;	&lt;span class=&quot;cm&quot;&gt;/* number of payload bytes (not including page
								 * image) */&lt;/span&gt;

	&lt;span class=&quot;cm&quot;&gt;/* If BKPBLOCK_HAS_IMAGE, an XLogRecordBlockImageHeader struct follows */&lt;/span&gt;
	&lt;span class=&quot;cm&quot;&gt;/* If BKPBLOCK_SAME_REL is not set, a RelFileLocator follows */&lt;/span&gt;
	&lt;span class=&quot;cm&quot;&gt;/* BlockNumber follows */&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;XLogRecordBlockHeader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
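
&lt;p&gt;To make the block header a bit more concrete, here is a minimal sketch of how
its 4 bytes could be decoded from a raw buffer.  The flag values are copied
from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xlogrecord.h&lt;/code&gt;, while the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BlockHeaderInfo&lt;/code&gt; struct and the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decode_block_header()&lt;/code&gt; function are hypothetical names used for illustration.
It also assumes that the WAL is read on a machine with the same byte order as
the one that generated it:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Layout of fork_flags: fork number in the low bits, flags in the high bits */
#define BKPBLOCK_FORK_MASK  0x0F
#define BKPBLOCK_HAS_IMAGE  0x10    /* block data is a full-page image */
#define BKPBLOCK_HAS_DATA   0x20
#define BKPBLOCK_WILL_INIT  0x40    /* redo will re-init the page */
#define BKPBLOCK_SAME_REL   0x80    /* RelFileLocator omitted, same as previous */

/* Hypothetical decoded view of an XLogRecordBlockHeader */
typedef struct BlockHeaderInfo
{
    uint8_t  id;            /* block reference ID */
    uint8_t  fork;          /* fork number within the relation */
    uint16_t data_length;   /* payload bytes, not including the page image */
    int      has_image;     /* an XLogRecordBlockImageHeader follows */
    int      has_locator;   /* a RelFileLocator follows */
} BlockHeaderInfo;

static void
decode_block_header(const unsigned char *p, BlockHeaderInfo *out)
{
    out-&amp;gt;id = p[0];
    out-&amp;gt;fork = p[1] &amp;amp; BKPBLOCK_FORK_MASK;
    /* the header isn't aligned in the record, so copy instead of casting */
    memcpy(&amp;amp;out-&amp;gt;data_length, p + 2, sizeof(uint16_t));
    out-&amp;gt;has_image = (p[1] &amp;amp; BKPBLOCK_HAS_IMAGE) != 0;
    out-&amp;gt;has_locator = (p[1] &amp;amp; BKPBLOCK_SAME_REL) == 0;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;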

&lt;p&gt;Everything here is very generic as it’s used by all the resource managers.  One
important bit though is the mention of a &lt;strong&gt;RelFileLocator&lt;/strong&gt; after the block
header if the record contains information about a different relation from the
previous block, whatever it was (which is the meaning of BKPBLOCK_SAME_REL).
This is of course important information for us.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RelFileLocator&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Oid&lt;/span&gt;			&lt;span class=&quot;n&quot;&gt;spcOid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;			&lt;span class=&quot;cm&quot;&gt;/* tablespace */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Oid&lt;/span&gt;			&lt;span class=&quot;n&quot;&gt;dbOid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;			&lt;span class=&quot;cm&quot;&gt;/* database */&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;RelFileNumber&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relNumber&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;	&lt;span class=&quot;cm&quot;&gt;/* relation */&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RelFileLocator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But here’s a first reason why you need a proper data directory to do anything
with the WALs: this doesn’t contain the schema name and table name, or even the
table oid, but the &lt;strong&gt;tablespace oid, database oid and relfilenode&lt;/strong&gt;, which is
what the WAL actually needs to identify a physical relation file (which is
itself split into multiple files, the exact
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/storage/smgr/README&quot;&gt;fork&lt;/a&gt;
and segment are deduced using other information).  So if any table rewrite
happened since the WAL records were generated (e.g. a VACUUM FULL), you
won’t be able to identify which relation a record is about, unless of course
you find a way to map the current relfilenode to the one before the table
rewrite.&lt;/p&gt;
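
&lt;p&gt;As an illustration of how physical that information is, here is a minimal
sketch that builds the path, relative to the data directory, of the first
segment of the main fork from those three values.  It only handles the two
built-in tablespaces (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_default&lt;/code&gt;, oid 1663, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_global&lt;/code&gt;, oid 1664), and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;main_fork_path()&lt;/code&gt; is a hypothetical helper, not a postgres function:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

#define DEFAULTTABLESPACE_OID 1663  /* pg_default */
#define GLOBALTABLESPACE_OID  1664  /* pg_global */

/*
 * Write the relative path of the first segment of the main fork in buf.
 * Returns 0 on success, -1 if the tablespace isn't handled or if buf is too
 * small.
 */
static int
main_fork_path(unsigned int spc_oid, unsigned int db_oid,
               unsigned int relfilenode, char *buf, size_t size)
{
    int n;

    if (spc_oid == DEFAULTTABLESPACE_OID)
        n = snprintf(buf, size, &quot;base/%u/%u&quot;, db_oid, relfilenode);
    else if (spc_oid == GLOBALTABLESPACE_OID)
        n = snprintf(buf, size, &quot;global/%u&quot;, relfilenode);
    else
        return -1;      /* user tablespaces live under pg_tblspc/ */

    return (n &amp;gt; 0 &amp;amp;&amp;amp; (size_t) n &amp;lt; size) ? 0 : -1;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For tablespace 1663, database 16384 and relfilenode 66743 this gives
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;base/16384/66743&lt;/code&gt;.  Nothing in there tells you the table name; only the
catalogs of a matching data directory can.&lt;/p&gt;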

&lt;h4 id=&quot;heap-insert-wal-records&quot;&gt;Heap INSERT WAL records&lt;/h4&gt;

&lt;p&gt;Now that we saw a bit of the general WAL structures, let’s focus on the data
specific to an INSERT.  If you’re not really familiar with the internals, one
easy way to locate the code related to a specific command is to look at the
functions associated with a resource manager.  Let’s look at the &lt;strong&gt;RM_HEAP_ID&lt;/strong&gt;
information in
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/include/access/rmgrlist.h&quot;&gt;src/include/access/rmgrlist.h&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cm&quot;&gt;/* symbol name, textual name, redo, desc, identify, startup, cleanup, mask, decode */&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;PG_RMGR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RM_HEAP_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Heap&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heap_redo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heap_desc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heap_identify&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heap_mask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heap_decode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here we have the names of the actual functions responsible for many operations
(the exact list will vary depending on the postgres major version, I’m using
the list in postgres 17 here).&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;redo&lt;/strong&gt; function is the one that applies an RM_HEAP_ID
record, the &lt;strong&gt;desc&lt;/strong&gt; function is the one that emits the info you see in
pg_waldump, the &lt;strong&gt;identify&lt;/strong&gt; function returns a string describing the opcode
and so on.  Let’s look at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heap_identify()&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;heap_identify&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

	&lt;span class=&quot;k&quot;&gt;switch&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;~&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XLR_INFO_MASK&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;XLOG_HEAP_INSERT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;INSERT&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
			&lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

	&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
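
&lt;p&gt;One detail is worth keeping in mind when matching on this opcode.  The low 4 bits of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;info&lt;/code&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XLR_INFO_MASK&lt;/code&gt;) are reserved for the generic WAL code, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heap_insert()&lt;/code&gt; can also set the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XLOG_HEAP_INIT_PAGE&lt;/code&gt; flag when the tuple is the first one on a newly initialized page.  The following is only a minimal, self-contained sketch of a check that accepts both forms.  The constants are copied by hand from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xlogrecord.h&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heapam_xlog.h&lt;/code&gt; and should be checked against your postgres version, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_heap_insert()&lt;/code&gt; is a hypothetical helper name, not something provided by postgres.&lt;/p&gt;

<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Values copied by hand from xlogrecord.h and heapam_xlog.h (assumption:
 * verify them against the postgres version you are working with).
 */
#define XLR_INFO_MASK			0x0F	/* bits reserved by the generic WAL code */
#define XLOG_HEAP_INSERT		0x00
#define XLOG_HEAP_OPMASK		0x70	/* bits holding the heap opcode */
#define XLOG_HEAP_INIT_PAGE		0x80	/* extra flag: the page is (re)initialized */

static_assert((XLOG_HEAP_OPMASK & XLR_INFO_MASK) == 0,
			  "heap opcodes must not overlap the bits reserved by XLogInsert");

/*
 * Hypothetical helper: returns true if the info byte of an RM_HEAP_ID record
 * describes an INSERT, with or without the XLOG_HEAP_INIT_PAGE flag.
 */
static bool
is_heap_insert(uint8_t info)
{
	/* drop the bits owned by the generic WAL code, as heap_identify() does */
	uint8_t		rmgr_info = (uint8_t) (info & ~XLR_INFO_MASK);

	/* keep only the opcode bits, ignoring XLOG_HEAP_INIT_PAGE */
	return (rmgr_info & XLOG_HEAP_OPMASK) == XLOG_HEAP_INSERT;
}
```
]]>

&lt;p&gt;With such a check, an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;info&lt;/code&gt; value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XLOG_HEAP_INSERT&lt;/code&gt; and one of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XLOG_HEAP_INSERT | XLOG_HEAP_INIT_PAGE&lt;/code&gt; are both accepted, while any other heap opcode is rejected.&lt;/p&gt;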

&lt;p&gt;We now know that the opcode we’re interested in is &lt;strong&gt;XLOG_HEAP_INSERT&lt;/strong&gt;.  A
quick &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git grep&lt;/code&gt; in the tree will lead you to
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam.c&quot;&gt;src/backend/access/heap/heapam.c&lt;/a&gt;,
more precisely the &lt;strong&gt;heap_insert&lt;/strong&gt; function.  The interesting bit is located in
the “XLOG stuff” block.  I will show here an extract focusing on the bit we
will need:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;heap_insert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Relation&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HeapTuple&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tup&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CommandId&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
			&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BulkInsertState&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bistate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
	&lt;span class=&quot;cm&quot;&gt;/* XLOG stuff */&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RelationNeedsWAL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relation&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;xl_heap_insert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xlrec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;xl_heap_header&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xlhdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;XLogRecPtr&lt;/span&gt;	&lt;span class=&quot;n&quot;&gt;recptr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;Page&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;page&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BufferGetPage&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;XLOG_HEAP_INSERT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;			&lt;span class=&quot;n&quot;&gt;bufflags&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;xlrec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;offnum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ItemPointerGetOffsetNumber&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;heaptup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;xlrec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;flags&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;XLogBeginInsert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;XLogRegisterData&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlrec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SizeOfHeapInsert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

		&lt;span class=&quot;n&quot;&gt;xlhdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_infomask2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heaptup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_infomask2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;xlhdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_infomask&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heaptup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_infomask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;xlhdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_hoff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heaptup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_hoff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

		&lt;span class=&quot;cm&quot;&gt;/*
		 * note we mark xlhdr as belonging to buffer; if XLogInsert decides to
		 * write the whole page to the xlog, we don't need to store
		 * xl_heap_header in the xlog.
		 */&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;XLogRegisterBuffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;REGBUF_STANDARD&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bufflags&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;XLogRegisterBufData&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlhdr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SizeOfHeapHeader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
		&lt;span class=&quot;cm&quot;&gt;/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;XLogRegisterBufData&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
							&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heaptup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SizeofHeapTupleHeader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
							&lt;span class=&quot;n&quot;&gt;heaptup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;t_len&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SizeofHeapTupleHeader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;recptr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;XLogInsert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RM_HEAP_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

		&lt;span class=&quot;n&quot;&gt;PageSetLSN&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;page&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;recptr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As expected, this function inserts an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RM_HEAP_ID&lt;/code&gt; record
with an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XLOG_HEAP_INSERT&lt;/code&gt; opcode.  Besides the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xl_heap_insert&lt;/code&gt; struct itself, there are 2 data
parts associated with this record: a reduced header of the tuple that’s being
inserted (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xl_heap_header&lt;/code&gt;) and the tuple itself.&lt;/p&gt;
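
&lt;p&gt;To make that layout more concrete, here is a minimal, self-contained sketch of how the data registered for block 0 could be split back into those two parts.  It does not include the postgres headers: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;demo_heap_header&lt;/code&gt; is a local copy of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xl_heap_header&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;demo_split_block_data()&lt;/code&gt; is a hypothetical helper name used only for this illustration.&lt;/p&gt;

<![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of postgres' xl_heap_header (assumption: see heapam_xlog.h) */
typedef struct demo_heap_header
{
	uint16_t	t_infomask2;
	uint16_t	t_infomask;
	uint8_t		t_hoff;
} demo_heap_header;

/* Same idea as SizeOfHeapHeader: the struct size without trailing padding */
#define DEMO_SIZE_OF_HEAP_HEADER \
	(offsetof(demo_heap_header, t_hoff) + sizeof(uint8_t))

/*
 * Split the data attached to block 0 into the small header and the tuple
 * body.  Returns a pointer to the tuple body inside "data" and stores its
 * length in *body_len, or returns NULL if the buffer is too short.
 */
static const char *
demo_split_block_data(const char *data, size_t len,
					  demo_heap_header *hdr, size_t *body_len)
{
	assert(data != NULL && hdr != NULL && body_len != NULL);

	if (len < DEMO_SIZE_OF_HEAP_HEADER)
		return NULL;

	/* memcpy rather than a cast: the buffer may not be suitably aligned */
	memcpy(hdr, data, DEMO_SIZE_OF_HEAP_HEADER);

	*body_len = len - DEMO_SIZE_OF_HEAP_HEADER;
	return data + DEMO_SIZE_OF_HEAP_HEADER;
}
```
]]>

&lt;p&gt;The tuple body returned here is what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heap_insert()&lt;/code&gt; registered last: everything in the tuple located after the first &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SizeofHeapTupleHeader&lt;/code&gt; bytes, that is the null bitmap, the optional padding and the data.&lt;/p&gt;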

&lt;p&gt;That’s great!  At this point we know how to identify what relation an INSERT is
about and the content of that INSERT.  Let’s see how to filter those records
from the WALs.&lt;/p&gt;

&lt;h3 id=&quot;extracting-and-filtering-wal-records&quot;&gt;Extracting and filtering WAL records&lt;/h3&gt;

&lt;p&gt;Parsing the postgres WALs isn’t that complicated but still requires knowing
quite a bit more than what I showed here.  Writing such code is possible, but
wait, don’t we already have a tool shipped with postgres which is designed
to do exactly that?  Yes, there sure is: it’s
&lt;a href=&quot;https://github.com/postgres/postgres/tree/master/src/bin/pg_waldump&quot;&gt;pg_waldump&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Rather than writing something similar, couldn’t we simply teach pg_waldump to
filter the records we’re interested in and save them somewhere so that we can
later process them and generate SQL queries?  This way we can also benefit
from all the options in pg_waldump, like specifying the starting and/or ending LSN
or filtering on a specific resource manager, without needing to worry about most
of the WAL implementation details, focusing only on the few functions
provided by postgres that are necessary for our needs.  Let’s see how to implement that.&lt;/p&gt;

&lt;p&gt;The main source file is
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/bin/pg_waldump/pg_waldump.c&quot;&gt;src/bin/pg_waldump/pg_waldump.c&lt;/a&gt;.
Skipping most of the unrelated code, we can see that there’s a main loop that
takes care of reading each record one by one, optionally filtering them and then
doing something with them depending on how the tool was executed.  I will again
show an extract to focus on the most relevant part only:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;	&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(;;)&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
		&lt;span class=&quot;cm&quot;&gt;/* try to read the next record */&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;XLogReadRecord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlogreader_state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;errormsg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
		&lt;span class=&quot;cm&quot;&gt;/* apply all specified filters */&lt;/span&gt;
		&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter_by_rmgr_enabled&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
			&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter_by_rmgr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xl_rmid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
			&lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;

		&lt;span class=&quot;cm&quot;&gt;/* perform any per-record work */&lt;/span&gt;
		&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;quiet&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
			&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stats&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
			&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
				&lt;span class=&quot;n&quot;&gt;XLogRecStoreStats&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stats&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xlogreader_state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
				&lt;span class=&quot;n&quot;&gt;stats&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;endptr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xlogreader_state&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EndRecPtr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
			&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
			&lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;
				&lt;span class=&quot;n&quot;&gt;XLogDumpDisplayRecord&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xlogreader_state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
		&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

		&lt;span class=&quot;cm&quot;&gt;/* save full pages if requested */&lt;/span&gt;
		&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;save_fullpage_path&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;XLogRecordSaveFPWs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlogreader_state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;save_fullpage_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

		&lt;span class=&quot;cm&quot;&gt;/* check whether we printed enough */&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;already_displayed_records&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
		&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stop_after_records&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;already_displayed_records&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stop_after_records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
			&lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
	&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That’s quite simple: pg_waldump reads the records one by one until it needs to
stop, ignores the records that the user asked to discard and then takes action
on the remaining ones.  We can see that there’s already an option to save
full-page images, so it definitely looks like we could just add something similar
there, but for all records.&lt;/p&gt;

&lt;p&gt;First, we will need to provide a way to identify the relation the INSERT is
about.  That’s the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RelFileLocator&lt;/code&gt;, and we already know that it can be found
just after the XLogRecordBlockHeader.  Postgres provides a function to retrieve
this information, and a bit more, named
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/access/transam/xlogreader.c&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XLogRecGetBlockTagExtended()&lt;/code&gt;&lt;/a&gt;.
Here is its description:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cm&quot;&gt;/*
 * Returns information about the block that a block reference refers to,
 * optionally including the buffer that the block may already be in.
 *
 * If the WAL record contains a block reference with the given ID, *rlocator,
 * *forknum, *blknum and *prefetch_buffer are filled in (if not NULL), and
 * returns true.  Otherwise returns false.
 */&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;bool&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;XLogRecGetBlockTagExtended&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XLogReaderState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;block_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
						   &lt;span class=&quot;n&quot;&gt;RelFileLocator&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rlocator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
						   &lt;span class=&quot;n&quot;&gt;ForkNumber&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;forknum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
						   &lt;span class=&quot;n&quot;&gt;BlockNumber&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;blknum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
						   &lt;span class=&quot;n&quot;&gt;Buffer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prefetch_buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We need to provide the record - pg_waldump already retrieves it for us - and
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;block_id&lt;/code&gt;.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;block_id&lt;/code&gt;, or block reference, is simply an index in the
array of block data that the WAL record contains.  If you look a bit above in this
article, you will see that we already know that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heap_insert()&lt;/code&gt; only uses a
hardcoded &lt;strong&gt;0&lt;/strong&gt; block_id: this is the first argument in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XLogRegisterBuffer()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XLogRegisterBufData()&lt;/code&gt; function calls.&lt;/p&gt;

&lt;p&gt;Next we need to retrieve the actual WAL record data: the tuple header and the
tuple itself.  This one is a bit trickier, as that data can be found either in
a simple WAL record or in a full-page image.  We need to check for a simple
WAL record first.  The associated function is
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/access/transam/xlogreader.c&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XLogRecGetBlockData()&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cm&quot;&gt;/*
 * Returns the data associated with a block reference, or NULL if there is
 * no data (e.g. because a full-page image was taken instead). The returned
 * pointer points to a MAXALIGNed buffer.
 */&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;XLogRecGetBlockData&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XLogReaderState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;block_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As noted in the comment, if the function returns NULL (and sets len to &lt;strong&gt;0&lt;/strong&gt;)
then the data may be in a full-page image instead (or the data could be missing
entirely).  If that’s the case we need to retrieve the full-page image, and
then locate the tuple the INSERT was about and extract it in the same format as
a simple WAL record.&lt;/p&gt;

&lt;p&gt;Postgres provides a function to extract the full-page image:
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/access/transam/xlogreader.c&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RestoreBlockImage()&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cm&quot;&gt;/*
 * Restore a full-page image from a backup block attached to an XLOG record.
 *
 * Returns true if a full-page image is restored, and false on failure with
 * an error to be consumed by the caller.
 */&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;bool&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;RestoreBlockImage&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;XLogReaderState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;block_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;page&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which is straightforward to use: just provide the record and the block
identifier and you get the full-page image if found.  However, there’s no
function available to extract a tuple from a full-page image.  Indeed, postgres
can simply overwrite the whole block with the full-page image, as it contains
the latest version of the block at the time it was generated, but in our case
we definitely don’t want to emit an INSERT statement for every already existing
tuple in the block!&lt;/p&gt;

&lt;p&gt;Fortunately, even when we get a full-page image, our record still contains a
&lt;em&gt;main data area&lt;/em&gt;.  If you look back at the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heap_insert()&lt;/code&gt; function, that’s
the data registered by the call to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XLogRegisterData()&lt;/code&gt;, and as you can see it contains an
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xl_heap_insert&lt;/code&gt; struct.  The first member of this struct, &lt;strong&gt;offnum&lt;/strong&gt;, is
actually the position of the tuple in the page, which is exactly what we need!&lt;/p&gt;

&lt;p&gt;With all of that, it’s just a matter of accessing the tuple header and tuple at
the correct place among all the tuples present in the page, and saving them the
same way they would be saved if this were a simple WAL record.  If you’re wondering how
exactly it should be done, you can always look at how postgres itself does it
when it needs to return a specific tuple and adapt that code to your needs.  The
functions responsible for that are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heapgetpage()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heapgettup()&lt;/code&gt;, located
in the
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam.c&quot;&gt;src/backend/access/heap/heapam.c&lt;/a&gt;
file we already mentioned.&lt;/p&gt;
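
&lt;p&gt;As an illustration of that lookup, here is a minimal, self-contained sketch of how a tuple could be located in a restored page from its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;offnum&lt;/code&gt;.  It deliberately avoids the postgres headers, so several things are assumptions to verify against your version: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;demo_item_id&lt;/code&gt; is a local copy of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ItemIdData&lt;/code&gt;, 12 and 24 are the offset of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pd_lower&lt;/code&gt; and the size of the fixed page header in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageHeaderData&lt;/code&gt;, 8192 is the default block size, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;demo_page_get_tuple()&lt;/code&gt; is a hypothetical helper name.  Real code would rather rely on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageGetItemId()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PageGetItem()&lt;/code&gt; from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;storage/bufpage.h&lt;/code&gt;.&lt;/p&gt;

<![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLCKSZ				8192	/* default postgres block size */
#define DEMO_PD_LOWER_OFFSET	12		/* offsetof(PageHeaderData, pd_lower) */
#define DEMO_PAGE_HEADER_SIZE	24		/* SizeOfPageHeaderData */
#define DEMO_LP_NORMAL			1		/* line pointer is used and has storage */

/* Local copy of postgres' ItemIdData, the line pointer (storage/itemid.h) */
typedef struct demo_item_id
{
	unsigned	lp_off:15,		/* offset of the tuple from the page start */
				lp_flags:2,		/* state of the line pointer */
				lp_len:15;		/* byte length of the tuple */
} demo_item_id;

/*
 * Return a pointer to the tuple stored at "offnum" (1-based, like
 * xl_heap_insert.offnum) and store its length in *len.  Returns NULL if the
 * offset is out of range or the line pointer doesn't point to a tuple.
 */
static const char *
demo_page_get_tuple(const char *page, uint16_t offnum, size_t *len)
{
	uint16_t	pd_lower;
	size_t		max_offnum;
	demo_item_id lp;

	assert(page != NULL && len != NULL);

	/* pd_lower marks the end of the line pointer array */
	memcpy(&pd_lower, page + DEMO_PD_LOWER_OFFSET, sizeof(pd_lower));
	if (pd_lower < DEMO_PAGE_HEADER_SIZE || pd_lower > DEMO_BLCKSZ)
		return NULL;

	max_offnum = (pd_lower - DEMO_PAGE_HEADER_SIZE) / sizeof(demo_item_id);
	if (offnum < 1 || offnum > max_offnum)
		return NULL;

	/* copy the line pointer out rather than casting, to avoid alignment issues */
	memcpy(&lp, page + DEMO_PAGE_HEADER_SIZE + (offnum - 1) * sizeof(lp),
		   sizeof(lp));
	if (lp.lp_flags != DEMO_LP_NORMAL || lp.lp_len == 0 ||
		lp.lp_off + lp.lp_len > DEMO_BLCKSZ)
		return NULL;

	*len = lp.lp_len;
	return page + lp.lp_off;
}
```
]]>

&lt;p&gt;Note that the returned pointer refers to the complete on-disk tuple, starting with its full tuple header, while a simple WAL record only carries an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xl_heap_header&lt;/code&gt; followed by the part of the tuple located after &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SizeofHeapTupleHeader&lt;/code&gt; bytes, as seen in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;heap_insert()&lt;/code&gt;.  Saving both cases in the same format therefore means rebuilding that small header from the tuple’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t_infomask2&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t_infomask&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t_hoff&lt;/code&gt; fields and skipping the fixed-size part of the tuple header.&lt;/p&gt;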

&lt;p&gt;We now have the information about the physical file location and the record
itself that we will need to transmit to another program to decode it.  The best
way to do that is to simply save the record as-is in a binary file, and use the
file name to transmit the metadata.  I chose the following pattern to name the
produced files:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;LSN.TABLESPACE_OID.DATABASE_OID.RELFILENODE.FORKNAME
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It will be trivial for the consumer to parse it and extract the required
metadata.  One thing to note is that I don’t put the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rmid&lt;/code&gt; or the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;opcode&lt;/code&gt;
here, as I’m only emitting the one kind of record I’m interested in and discarding everything
else.  If that’s not your case you should definitely remember to add those to
the filename pattern.&lt;/p&gt;
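
&lt;p&gt;For illustration, here is a minimal, self-contained sketch of how such a file name could be produced and parsed back.  The pattern above doesn’t say how the LSN is written, so its encoding here is purely an assumption: since the usual &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;%X/%X&lt;/code&gt; display form contains a slash, which cannot appear in a file name, the sketch writes the two 32-bit halves as 8 hexadecimal digits each, separated by a dash.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;demo_record_name&lt;/code&gt; struct and both helper functions are hypothetical names and are not taken from the patch.&lt;/p&gt;

<![CDATA[
```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the metadata carried by the file name */
typedef struct demo_record_name
{
	uint64_t	lsn;
	uint32_t	spc_oid;		/* tablespace oid */
	uint32_t	db_oid;			/* database oid */
	uint32_t	relfilenode;
	char		forkname[8];	/* main, fsm, vm or init */
} demo_record_name;

/* Write LSN.TABLESPACE_OID.DATABASE_OID.RELFILENODE.FORKNAME into buf */
static int
demo_format_name(char *buf, size_t buflen, const demo_record_name *rec)
{
	assert(buf != NULL && rec != NULL);

	/* assumption: LSN encoded as HIGH-LOW, 8 hexadecimal digits each */
	return snprintf(buf, buflen,
					"%08" PRIX32 "-%08" PRIX32 ".%" PRIu32 ".%" PRIu32 ".%" PRIu32 ".%s",
					(uint32_t) (rec->lsn >> 32), (uint32_t) rec->lsn,
					rec->spc_oid, rec->db_oid, rec->relfilenode,
					rec->forkname);
}

/* Parse a name produced by demo_format_name(); returns 0 on success, -1 otherwise */
static int
demo_parse_name(const char *name, demo_record_name *rec)
{
	uint32_t	hi;
	uint32_t	lo;

	assert(name != NULL && rec != NULL);

	/* %7s leaves room for the terminating NUL in forkname[8] */
	if (sscanf(name,
			   "%8" SCNx32 "-%8" SCNx32 ".%" SCNu32 ".%" SCNu32 ".%" SCNu32 ".%7s",
			   &hi, &lo, &rec->spc_oid, &rec->db_oid, &rec->relfilenode,
			   rec->forkname) != 6)
		return -1;

	rec->lsn = ((uint64_t) hi << 32) | lo;
	return 0;
}
```
]]>

&lt;p&gt;With this encoding, a record at LSN 1/1A2B3C4 in the main fork of relfilenode 16385, in database 16384 and tablespace 1663, would be saved as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;00000001-01A2B3C4.1663.16384.16385.main&lt;/code&gt;.  Zero-padding the LSN also keeps the files sorted in WAL order in a plain directory listing.&lt;/p&gt;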

&lt;p&gt;Since this requires a bit of code to implement, I won’t detail it here but you
can find the full result in the patch for pg_waldump that I’m attaching to
this article, which implements this as a new &lt;strong&gt;--save-records&lt;/strong&gt; option.&lt;/p&gt;

&lt;p&gt;To conclude, let me also remind you that a compiled version of pg_waldump will
only work for a single major postgres version.  In my case, I had to work with
postgres 11, so you can &lt;a href=&quot;/assets/patch/0001-Add-a-save-records-PATH-option-to-pg_waldump_pg11.patch&quot;&gt;find the patch for this version
here&lt;/a&gt;,
but if needed I also rebased it against the current commit on the master branch,
which &lt;a href=&quot;/assets/patch/0001-Add-a-save-records-PATH-option-to-pg_waldump_pg17.patch&quot;&gt;can be found
here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h3&gt;

&lt;p&gt;This is the end of this first article.  We saw some details on the postgres WAL
infrastructure, with a full example for the case of a plain INSERT on a heap
table.  We also learned where to look to find where other WAL records are
generated and to see more details about the implementation.&lt;/p&gt;

&lt;p&gt;We also checked how pg_waldump works and how to adapt it to our needs,
with a provided complete patch for both &lt;a href=&quot;/assets/patch/0001-Add-a-save-records-PATH-option-to-pg_waldump_pg11.patch&quot;&gt;postgres
11&lt;/a&gt;
and &lt;a href=&quot;/assets/patch/0001-Add-a-save-records-PATH-option-to-pg_waldump_pg17.patch&quot;&gt;the current dev version (postgres
17)&lt;/a&gt;.
Again, I’d like to remind you that all this work is only at a proof-of-concept
stage: it’s definitely not polished and I’m sure there are many problems that
would need to be fixed.  One obvious example of such a problem is that we’re
saving all the INSERTs we find in the logs but we don’t check whether the transaction
they’re in eventually committed.  It would be possible to fix that but it would
require additional code, so as is it’s up to the users to double check that as
needed.  Overall it was enough to recover the needed data so I didn’t pursue
any more work on it.&lt;/p&gt;

&lt;p&gt;In the next article we will see some usage of this new &lt;strong&gt;--save-records&lt;/strong&gt;
option, and also how to read those records and decode them to generate plain
INSERT queries.  Stay tuned!&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2023/12/06/extract-sql-from-wal.html&quot;&gt;Extracting SQL from WAL? (part 1)&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on December 06, 2023.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Queryid reporting in plpgsql_check]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2020/11/17/queryid-reporting-in-plpgsql_check.html" />
  <id>https://rjuju.github.io/postgresql/2020/11/17/queryid-reporting-in-plpgsql_check</id>
  <published>2020-11-17T02:42:33+00:00</published>
  <updated>2020-11-17T02:42:33+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;plpgsql_check version 1.14.0 was just released and brings some improvements for
performance diagnostics.&lt;/p&gt;

&lt;p&gt;Thanks &lt;strong&gt;a lot&lt;/strong&gt; to &lt;a href=&quot;http://okbob.blogspot.com/&quot;&gt;Pavel Stěhule&lt;/a&gt; for the awesome
plpgsql_check extension and for his help implementing the queryid reporting in
v1.14!&lt;/p&gt;

&lt;h3 id=&quot;plpgsql_check-static-code-analysis-and-more&quot;&gt;plpgsql_check: static code analysis and more&lt;/h3&gt;

&lt;p&gt;PostgreSQL supports procedural code for many languages, the most popular one
probably being plpgsql.&lt;/p&gt;

&lt;p&gt;Even if that language allows you to write raw SQL statements, any function
written in that language is still a black box as far as PostgreSQL is
concerned, which means that PostgreSQL won’t perform many checks to verify
code quality, catch typos or detect any other problem related to code development.  That’s
where the &lt;a href=&quot;https://github.com/okbob/plpgsql_check&quot;&gt;plpgsql_check extension&lt;/a&gt; comes
into play.&lt;/p&gt;

&lt;p&gt;If you write any plpgsql code, this extension will be your best friend, as it
brings so many cool features.  The major feature is static code analysis, which
can detect many bugs, security issues such as SQL injection, and even possible performance
issues, for instance by detecting implicit casts that could prevent PostgreSQL from using
indexes, and much more.&lt;/p&gt;

&lt;p&gt;It also brings a simple yet very useful &lt;strong&gt;code profiler&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 id=&quot;how-to-track-down-performance-issue-in-plpgsql-code&quot;&gt;How to track down performance issues in plpgsql code?&lt;/h3&gt;

&lt;p&gt;As I mentioned above, plpgsql code is a black box as far as PostgreSQL is
concerned.  The direct consequence is that the performance diagnostic
possibilities are quite limited.&lt;/p&gt;

&lt;p&gt;Using core PostgreSQL, the only option is using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_stat_user_functions&lt;/code&gt; (which
requires &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;track_functions&lt;/code&gt; to be set to &lt;strong&gt;pl&lt;/strong&gt; or &lt;strong&gt;all&lt;/strong&gt;).  It’ll show the
number of times each function has been called, and how long the execution took
including and excluding nested functions.  Unfortunately, this view can only
help you track down &lt;strong&gt;which&lt;/strong&gt; function is slow, but not &lt;strong&gt;why&lt;/strong&gt;, as you
don’t get any per-instruction metric.&lt;/p&gt;
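
&lt;p&gt;As a minimal sketch of that approach, assuming a role that is allowed to
change &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;track_functions&lt;/code&gt; and some plpgsql functions that have already been
called, the view could be queried as follows.  The ordering and the limit are
arbitrary choices made for the illustration, and the counters may only show up
once the calling transaction has finished:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Track procedural-language functions only ('all' also tracks SQL and C functions).
SET track_functions = 'pl';

-- total_time includes nested function calls, self_time excludes them.
-- Both are reported in milliseconds.
SELECT schemaname, funcname, calls, total_time, self_time
  FROM pg_stat_user_functions
 ORDER BY total_time DESC
 LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;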

&lt;p&gt;You can somewhat work around that limitation using the contrib extension
&lt;a href=&quot;https://www.postgresql.org/docs/current/pgstatstatements.html&quot;&gt;pg_stat_statements&lt;/a&gt;.
This extension is one of the most popular ones as far as performance diagnostics
are concerned, and gives you a lot of data on query performance (including
&lt;a href=&quot;/postgresql/2020/04/04/new-in-pg13-monitoring-query-planner.html&quot;&gt;planning counters&lt;/a&gt; and &lt;a href=&quot;/postgresql/2020/04/07/new-in-pg13-WAL-monitoring.html&quot;&gt;WAL counters&lt;/a&gt; since PostgreSQL 13).&lt;/p&gt;

&lt;p&gt;The only problem is that it can be quite tricky to match pg_stat_statements
entries with your plpgsql code, as there’s no way to directly identify which
queries are run inside your plpgsql code.&lt;/p&gt;

&lt;h3 id=&quot;plpgsql_check-code-profiler&quot;&gt;plpgsql_check code profiler&lt;/h3&gt;

&lt;p&gt;Another alternative is to use a plpgsql code profiler.  There are multiple
extensions that bring this feature, and I personally chose
&lt;a href=&quot;https://github.com/okbob/plpgsql_check&quot;&gt;plpgsql_check&lt;/a&gt;, as it perfectly suited
my needs: simple to set up and use, all the performance information I needed and
the possibility to use it either on a per-connection basis or globally by
configuring the extension in &lt;strong&gt;shared_preload_libraries&lt;/strong&gt;.  Thanks to this
profiler, you can finally get performance metrics at the statement level
&lt;strong&gt;inside plpgsql code&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;total execution time, that is the cumulated execution time for all the
statements in the source code line&lt;/li&gt;
  &lt;li&gt;average execution time, that is the total execution time divided by the
number of statements in the source code line&lt;/li&gt;
  &lt;li&gt;maximum execution time, per statement&lt;/li&gt;
  &lt;li&gt;number of rows processed, per statement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this information, it becomes quite easy to track down the slow part of
your functions.  Here’s a simplistic example:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lineno&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cmds_on_row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;total_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;avg_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;source&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plpgsql_profiler_function_tb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'pltest()'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;lineno&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cmds_on_row&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;total_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;avg_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;max_time&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;                        &lt;span class=&quot;k&quot;&gt;source&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;--------+-------------+------------+----------+------------------+-------------------------------------------------------&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DECLARE&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;num&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;_tbl&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'pg_class'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;085&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;085&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;085&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BEGIN&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;504&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;504&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;504&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;k&quot;&gt;drop&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;exists&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;meh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;       &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;81&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;81&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;81&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;meh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;362&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;362&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;362&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;k&quot;&gt;EXECUTE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'SELECT COUNT(*) FROM '&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_tbl&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;84&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;42&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;349&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;491&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;k&quot;&gt;delete&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;meh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PERFORM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_sleep&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
     &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;        &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;              &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;k&quot;&gt;RETURN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
     &lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this example, we can see immediately that the slowdown comes from source
code line n°9, which has a total execution time of 1s.  Using the &lt;strong&gt;max_time&lt;/strong&gt;
field, we see that it’s because of the 2nd statement.  As we also have the
source code available in the view, we can immediately see the problematic
query, which here is a simple call to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_sleep(1)&lt;/code&gt;.&lt;/p&gt;
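
&lt;p&gt;To reproduce this kind of output, the setup could look like the following
sketch.  The function body is reconstructed from the source column shown above,
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bigint&lt;/code&gt; return type is an assumption, and the profiler is enabled for the
current connection only, following the settings described in the plpgsql_check
documentation:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Assumes the extension packages are installed on the server.
CREATE EXTENSION IF NOT EXISTS plpgsql_check;

-- Per-connection setup: load the library and enable the profiler.
LOAD 'plpgsql_check';
SET plpgsql_check.profiler = on;

-- Body taken from the profiler output; RETURNS bigint is an assumption.
CREATE OR REPLACE FUNCTION pltest() RETURNS bigint AS $$
DECLARE
    num bigint;
    _tbl text = 'pg_class';
BEGIN
    drop table if exists meh;
    CREATE TABLE meh(id integer);
    EXECUTE 'SELECT COUNT(*) FROM ' || _tbl INTO num;
    delete from meh; PERFORM pg_sleep(1);
    RETURN num;
END;
$$ LANGUAGE plpgsql;

-- Run the function once so that the profiler has something to report.
SELECT pltest();&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;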

&lt;p&gt;So far so good.  But with a less naive example the cause of slow execution might
be less obvious, and it could be handy to rely on all the available extensions
to get more information:
&lt;a href=&quot;https://www.postgresql.org/docs/current/pgstatstatements.html&quot;&gt;pg_stat_statements&lt;/a&gt;
for general counters,
&lt;a href=&quot;https://github.com/powa-team/pg_stat_kcache&quot;&gt;pg_stat_kcache&lt;/a&gt; for CPU and disk
usage counters,
&lt;a href=&quot;https://github.com/postgrespro/pg_wait_sampling&quot;&gt;pg_wait_sampling&lt;/a&gt; for wait
events and so on.&lt;/p&gt;

&lt;p&gt;But how to match the plpgsql statement with entries in those extensions?&lt;/p&gt;

&lt;h3 id=&quot;exposing-queryid-in-plpgql_check-profiler&quot;&gt;Exposing queryid in plpgsql_check profiler&lt;/h3&gt;

&lt;p&gt;Indeed, those extensions identify queries using a &lt;strong&gt;query identifier&lt;/strong&gt;,
computed by &lt;strong&gt;pg_stat_statements&lt;/strong&gt;.  You could try to manually find the related
entry using the query text stored by &lt;strong&gt;pg_stat_statements&lt;/strong&gt;, but it may not
always be possible.  What if the query is dynamic SQL or uses unqualified
names?&lt;/p&gt;

&lt;p&gt;The solution here is quite simple: since the plpgsql_check profiler already shows
per-statement information, it can also report the statement’s underlying queryid.&lt;/p&gt;

&lt;p&gt;This is now available with version 1.14.0.  Using the previous naive example,
here’s what we now see:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lineno&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryids&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;source&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plpgsql_profiler_function_tb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'pltest()'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;lineno&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;max_time&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;                 &lt;span class=&quot;n&quot;&gt;queryids&lt;/span&gt;                  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;                        &lt;span class=&quot;k&quot;&gt;source&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;--------+------------------+-------------------------------------------+-------------------------------------------------------&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;                                    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;                                    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DECLARE&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;                                    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;num&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;                                    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;_tbl&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'pg_class'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;085&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;                                    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BEGIN&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;504&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;                                    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;k&quot;&gt;drop&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;exists&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;meh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;81&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;                                    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;meh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;362&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7484655548452190292&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;                    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;k&quot;&gt;EXECUTE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'SELECT COUNT(*) FROM '&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_tbl&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;349&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;491&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8162364748417812595&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6729783856403017864&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;k&quot;&gt;delete&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;meh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PERFORM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_sleep&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
     &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;              &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;                                    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;k&quot;&gt;RETURN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
     &lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;                                    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;You’re now only a JOIN away from matching your plpgsql profile data with the
data from your favorite extensions!&lt;/p&gt;
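
&lt;p&gt;Such a JOIN could look like the sketch below.  It assumes that
pg_stat_statements is installed in the same database, that
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_stat_statements.track&lt;/code&gt; is set to &lt;strong&gt;all&lt;/strong&gt; so that statements nested inside
functions are tracked, and a PostgreSQL 13 or later server (on older versions the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;total_exec_time&lt;/code&gt; column is named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;total_time&lt;/code&gt;):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- queryids is an array with one element per statement on the source line,
-- so it is unnested before joining.  NULL entries simply don't match.
SELECT p.lineno, q.queryid, s.calls, s.total_exec_time, s.query
  FROM plpgsql_profiler_function_tb('pltest()') AS p
 CROSS JOIN LATERAL unnest(p.queryids) AS q(queryid)
  JOIN pg_stat_statements AS s ON s.queryid = q.queryid
  -- pg_stat_statements keeps one entry per user and database:
  -- restrict the result to the current database.
  JOIN pg_database AS d ON d.oid = s.dbid
 WHERE d.datname = current_database()
 ORDER BY p.lineno;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The same pattern should apply to any other extension that exposes a queryid
column.&lt;/p&gt;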

&lt;h3 id=&quot;limitations&quot;&gt;Limitations&lt;/h3&gt;

&lt;p&gt;There are unfortunately some limitations.&lt;/p&gt;

&lt;p&gt;Due to the pg_stat_statements implementation, the queryid for DDL queries is
not exposed outside the extension, so plpgsql_check can’t retrieve it.&lt;/p&gt;

&lt;p&gt;When using dynamic SQL, there might be &lt;strong&gt;many&lt;/strong&gt; queries involved:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the query text itself will be generated using SQL statement(s)&lt;/li&gt;
  &lt;li&gt;the parameters, if any, will also be resolved running SQL statement(s)&lt;/li&gt;
  &lt;li&gt;if the query text depends on some parameters, you can end up with multiple
different top level queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;plpgsql_check will only report the top level query identifier, and if multiple
different queries are generated only the query identifier of the first one will
be reported.&lt;/p&gt;

&lt;p&gt;Even with those limitations I still hope that this new feature will be helpful.&lt;/p&gt;

&lt;h3 id=&quot;whats-next&quot;&gt;What’s next?&lt;/h3&gt;

&lt;p&gt;Due to the current plpgsql implementation, when a dynamic SQL statement is executed
the query identifier is not visible outside plpgsql itself.  It means that
retrieving the query identifier in that case is a bit costly, as plpgsql_check
has to do some of the work that plpgsql is doing:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;generate the final query string&lt;/li&gt;
  &lt;li&gt;parse the query string&lt;/li&gt;
  &lt;li&gt;call the parse analysis step (this is where the query identifier is
generated)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of course the query itself won’t be executed or even planned, but those extra
steps might add non-negligible overhead, especially when the dynamic SQL is
executing very short OLTP-style queries.&lt;/p&gt;

&lt;p&gt;So plpgsql should be modified to be able to report the query identifier of all
statements, whether static or dynamic, so external modules can access the
information easily and without any additional overhead.  Ideally, this could
also be available in plpgsql code using a &lt;strong&gt;GET [ CURRENT ] DIAGNOSTICS&lt;/strong&gt;
command, so users can also use it as they need.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2020/11/17/queryid-reporting-in-plpgsql_check.html&quot;&gt;Queryid reporting in plpgsql_check&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on November 17, 2020.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[New in pg13: WAL monitoring]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2020/04/07/new-in-pg13-WAL-monitoring.html" />
  <id>https://rjuju.github.io/postgresql/2020/04/07/new-in-pg13-WAL-monitoring</id>
  <published>2020-04-07T15:46:15+00:00</published>
  <updated>2020-04-07T15:46:15+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;Write-Ahead Logging (WAL) is a critical part of PostgreSQL that ensures data
durability.  While there are multiple &lt;a href=&quot;https://www.postgresql.org/docs/current/runtime-config-wal.html&quot;&gt;configuration
parameters&lt;/a&gt;, there was
no easy way to monitor WAL activity, or what is generating it.&lt;/p&gt;

&lt;h3 id=&quot;new-infrastructure-to-track-wal-activity&quot;&gt;New infrastructure to track WAL activity&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;commit df3b181499b40523bd6244a4e5eb554acb9020ce
Author: Amit Kapila &amp;lt;akapila@postgresql.org&amp;gt;
Date:   Sat Apr 4 10:02:08 2020 +0530

    Add infrastructure to track WAL usage.

    This allows gathering the WAL generation statistics for each statement
    execution.  The three statistics that we collect are the number of WAL
    records, the number of full page writes and the amount of WAL bytes
    generated.

    This helps the users who have write-intensive workload to see the impact
    of I/O due to WAL.  This further enables us to see approximately what
    percentage of overall WAL is due to full page writes.

    In the future, we can extend this functionality to allow us to compute the
    the exact amount of WAL data due to full page writes.

    This patch in itself is just an infrastructure to compute WAL usage data.
    The upcoming patches will expose this data via explain, auto_explain,
    pg_stat_statements and verbose (auto)vacuum output.

    Author: Kirill Bychik, Julien Rouhaud
    Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila
    Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With this new infrastructure, each backend will track various information about
WAL generation: the number of WAL records, the size of WAL generated and the
number of full page images generated.  It also makes sure that parallel
queries, both DML and utility statements (for now only CREATE INDEX and VACUUM)
are correctly handled.&lt;/p&gt;

&lt;h3 id=&quot;per-query-wal-activity-with-pg_stat_statements&quot;&gt;Per-query WAL activity with pg_stat_statements&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;commit 6b466bf5f2bea0c89fab54eef696bcfc7ecdafd7
Author: Amit Kapila &amp;lt;akapila@postgresql.org&amp;gt;
Date:   Sun Apr 5 07:34:04 2020 +0530

    Allow pg_stat_statements to track WAL usage statistics.

    This commit adds three new columns in pg_stat_statements output to
    display WAL usage statistics added by commit df3b181499.

    This commit doesn't bump the version of pg_stat_statements as the
    same is done for this release in commit 17e0328224.

    Author: Kirill Bychik and Julien Rouhaud
    Reviewed-by: Julien Rouhaud, Fujii Masao, Dilip Kumar and Amit Kapila
    Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This basically exposes the new WAL activity information mentioned above in
pg_stat_statements, so per (user, database, normalized query).  Here is an
example:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CHECKPOINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CHECKPOINT&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wal_records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wal_bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wal_num_fpw&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_stat_statements&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIKE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'UPDATE%'&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIKE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'DELETE%'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;                &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wal_records&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wal_bytes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wal_num_fpw&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------------------------+-------------+-----------+-------------&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;DELETE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;       &lt;span class=&quot;mi&quot;&gt;155&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;        &lt;span class=&quot;mi&quot;&gt;69&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;I simply inserted a row, updated it and deleted it.  Now, looking specifically
at the UPDATE and the DELETE, the numbers can be surprising.&lt;/p&gt;

&lt;p&gt;When updating a row, we indeed expect a single WAL record and some WAL bytes
for the new row version, with some overhead due to internal implementation.&lt;/p&gt;

&lt;p&gt;Now, if you’re familiar with PostgreSQL MVCC implementation, you should know
that doing a DELETE should only write a transaction id in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xmax&lt;/code&gt; field
(&lt;a href=&quot;https://www.postgresql.org/docs/current/storage-page-layout.html&quot;&gt;this documentation
page&lt;/a&gt; is a
good introduction on that subject).  So why does writing a 4B field (the size of the
recorded &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xmax&lt;/code&gt; field), even with some overhead, generate more than twice the
amount of WAL that was required to update a full row?  That’s because the
DELETE caused a &lt;a href=&quot;https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-FULL-PAGE-WRITES&quot;&gt;full page
write&lt;/a&gt;.
This is a side effect of performing a &lt;strong&gt;CHECKPOINT&lt;/strong&gt; before the DELETE.  To
guarantee data consistency (and if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;full_page_writes&lt;/code&gt; parameter isn’t
deactivated), any block modified for the first time after a &lt;strong&gt;CHECKPOINT&lt;/strong&gt;
completion will be fully logged, rather than logging only the delta.&lt;/p&gt;
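
&lt;p&gt;A rough way to spot statements affected by this, using only the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wal_records&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wal_bytes&lt;/code&gt; columns shown above, is to look at the average
size of the WAL records each statement generates.  This is only a sketch, and
the alias and the LIMIT are arbitrary choices:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Statements generating the most WAL, with their average record size.
-- An average far above the size of the modified rows suggests that
-- full page writes account for most of the volume.
SELECT query, calls, wal_records, wal_bytes,
       round(wal_bytes / nullif(wal_records, 0)) AS avg_bytes_per_record
  FROM pg_stat_statements
  ORDER BY wal_bytes DESC
  LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;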

&lt;p&gt;You’ll also note that the full page write didn’t generate 8kB of data as you
might expect.  This isn’t because of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wal_compression&lt;/code&gt;, as I didn’t activate it, but
because the page is almost empty.  Indeed, as an optimization, any “hole” in
a page (the unused space between &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pd_lower&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pd_upper&lt;/code&gt;), as long as it’s a
standard page, can be safely skipped in the WAL.  If
you’re curious, this is done in the &lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/access/transam/xloginsert.c&quot;&gt;XLogRecordAssemble()
function&lt;/a&gt;.
Here’s the relevant extract:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;XLogRecData&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;XLogRecordAssemble&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RmgrId&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rmid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;uint8&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
				   &lt;span class=&quot;n&quot;&gt;XLogRecPtr&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RedoRecPtr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;doPageWrites&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
				   &lt;span class=&quot;n&quot;&gt;XLogRecPtr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fpw_lsn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;num_fpw&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
		&lt;span class=&quot;cm&quot;&gt;/*
		 * If needs_backup is true or WAL checking is enabled for current
		 * resource manager, log a full-page write for the current block.
		 */&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;include_image&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;needs_backup&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;XLR_CHECK_CONSISTENCY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

		&lt;span class=&quot;n&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;include_image&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
		&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;Page&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;page&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regbuf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;page&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;uint16&lt;/span&gt;		&lt;span class=&quot;n&quot;&gt;compressed_len&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

			&lt;span class=&quot;cm&quot;&gt;/*
			 * The page needs to be backed up, so calculate its hole length
			 * and offset.
			 */&lt;/span&gt;
			&lt;span class=&quot;n&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regbuf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;flags&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;REGBUF_STANDARD&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
			&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
				&lt;span class=&quot;cm&quot;&gt;/* Assume we can omit data between pd_lower and pd_upper */&lt;/span&gt;
				&lt;span class=&quot;n&quot;&gt;uint16&lt;/span&gt;		&lt;span class=&quot;k&quot;&gt;lower&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PageHeader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;page&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pd_lower&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
				&lt;span class=&quot;n&quot;&gt;uint16&lt;/span&gt;		&lt;span class=&quot;k&quot;&gt;upper&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PageHeader&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;page&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pd_upper&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

				&lt;span class=&quot;n&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;lower&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SizeOfPageHeaderData&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
					&lt;span class=&quot;k&quot;&gt;upper&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lower&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
					&lt;span class=&quot;k&quot;&gt;upper&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BLCKSZ&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
				&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
					&lt;span class=&quot;n&quot;&gt;bimg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hole_offset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lower&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
					&lt;span class=&quot;n&quot;&gt;cbimg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hole_length&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;upper&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lower&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
				&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
				&lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;
				&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
					&lt;span class=&quot;cm&quot;&gt;/* No &quot;hole&quot; to remove */&lt;/span&gt;
					&lt;span class=&quot;n&quot;&gt;bimg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hole_offset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
					&lt;span class=&quot;n&quot;&gt;cbimg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hole_length&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
				&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
			&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
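
&lt;p&gt;To get an idea of how large that hole is for a given block, the pageinspect
extension can help.  The following is only a sketch: it assumes that
pageinspect can be installed, that you are connected as a superuser, and that
block 0 of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;t1&lt;/code&gt; table used above is the one of interest.  The computed
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;image_size&lt;/code&gt; only approximates the size of the page image, as it ignores the
WAL record headers:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;CREATE EXTENSION IF NOT EXISTS pageinspect;

-- lower and upper are pd_lower and pd_upper from the page header
SELECT h.lower, h.upper,
       h.upper - h.lower AS hole_length,
       h.pagesize - (h.upper - h.lower) AS image_size
  FROM page_header(get_raw_page('t1', 0)) AS h;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;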

&lt;h3 id=&quot;wal-activity-in-explain-and-auto_explain&quot;&gt;WAL activity in EXPLAIN (and auto_explain)&lt;/h3&gt;

&lt;p&gt;A new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WAL&lt;/code&gt; option is available in the &lt;strong&gt;EXPLAIN&lt;/strong&gt; command, and similarly a new
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;auto_explain.log_wal&lt;/code&gt; parameter for &lt;strong&gt;auto_explain&lt;/strong&gt;, to display the same counters.  In
TEXT mode, only the non-zero counters are shown, similarly to other counters.
For instance:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WAL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;COSTS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OFF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                           &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;----------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;Update&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;181&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;181&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;WAL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;68&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;074&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;080&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Planning&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;274&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Execution&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;381&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
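
&lt;p&gt;For auto_explain, a minimal session-level setup could look like the
following.  This is a sketch that assumes you are allowed to load the module
in your session (typically as a superuser), and the zero threshold is only
there to log every statement while testing:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;  -- log every statement
SET auto_explain.log_analyze = on;      -- needed for log_wal to have any effect
SET auto_explain.log_wal = on;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;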

&lt;h3 id=&quot;wal-activity-in-autovacuum-logs&quot;&gt;WAL activity in autovacuum logs&lt;/h3&gt;

&lt;p&gt;And finally, if an autovacuum is logging its activity (when reaching the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;log_autovacuum_min_duration&lt;/code&gt; threshold), the same information will be logged.
For instance, after inserting 100k records in the same table, deleting half of
them and running a &lt;strong&gt;CHECKPOINT&lt;/strong&gt;, here’s the output I get:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;LOG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;automatic&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;vacuum&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;of&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;rjuju.public.t1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;index&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scans&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;pages&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;removed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;443&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;remain&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;skipped&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;due&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pins&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;skipped&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;frozen&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;tuples&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;removed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50001&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;remain&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;are&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dead&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;but&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;yet&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;removable&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oldest&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xmin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;496&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;buffer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;usage&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;912&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;misses&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;448&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dirtied&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;avg&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;read&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;084&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MB&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;avg&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;write&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;12&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;485&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MB&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;system&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;usage&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CPU&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;17&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;system&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elapsed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;WAL&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;usage&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1330&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;445&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;full&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;page&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;writes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2197104&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;This new log output is, in my opinion, especially important when it
comes to &lt;a href=&quot;https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND&quot;&gt;anti-wraparound / FREEZE
vacuum&lt;/a&gt;.
Indeed, by nature an anti-wraparound VACUUM is more likely to touch blocks that
haven’t been modified for a long time, as it targets tuples that have been visible for
more than 200M transactions (by default).  Even though it only sets a flag
bit to mark the tuple as frozen, if that block wasn’t modified since the last
&lt;strong&gt;CHECKPOINT&lt;/strong&gt;, this bit will be amplified to a &lt;strong&gt;full page image&lt;/strong&gt;, which is
way more data.&lt;/p&gt;

&lt;p&gt;With this new feature, it’s now possible to really monitor the WAL
generation, which will help to better tune your instances!&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2020/04/07/new-in-pg13-WAL-monitoring.html&quot;&gt;New in pg13: WAL monitoring&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on April 07, 2020.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[New in pg13: Monitoring the query planner]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2020/04/04/new-in-pg13-monitoring-query-planner.html" />
  <id>https://rjuju.github.io/postgresql/2020/04/04/new-in-pg13-monitoring-query-planner</id>
  <published>2020-04-04T12:06:15+00:00</published>
  <updated>2020-04-04T12:06:15+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;Depending on your workload, the planning time can represent a significant part
of the overall query processing time.  This is especially important in OLTP
workloads, but OLAP queries with numerous tables being joined and an aggressive
configuration on the JOIN order search can also lead to high planning time.&lt;/p&gt;

&lt;h3 id=&quot;planning-counters-in-pg_stat_statements&quot;&gt;Planning counters in pg_stat_statements&lt;/h3&gt;

&lt;p&gt;Previously, pg_stat_statements only kept track of the execution part
of query processing: the number of executions and the cumulated time, but also the
minimum, maximum, mean and standard deviation.  With PostgreSQL 13,
you’ll also have those metrics for the planning part!&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;commit 17e03282241c6ac58a714eb0c3b6a8018cf6167a
Author: Fujii Masao &amp;lt;fujii@postgresql.org&amp;gt;
Date:   Thu Apr 2 11:20:19 2020 +0900

    Allow pg_stat_statements to track planning statistics.

    This commit makes pg_stat_statements support new GUC
    pg_stat_statements.track_planning. If this option is enabled,
    pg_stat_statements tracks the planning statistics of the statements,
    e.g., the number of times the statement was planned, the total time
    spent planning the statement, etc. This feature is useful to check
    the statements that it takes a long time to plan. Previously since
    pg_stat_statements tracked only the execution statistics, we could
    not use that for the purpose.

    The planning and execution statistics are stored at the end of
    each phase separately. So there are not always one-to-one relationship
    between them. For example, if the statement is successfully planned
    but fails in the execution phase, only its planning statistics are stored.
    This may cause the users to be able to see different pg_stat_statements
    results from the previous version. To avoid this,
    pg_stat_statements.track_planning needs to be disabled.

    This commit bumps the version of pg_stat_statements to 1.8
    since it changes the definition of pg_stat_statements function.

    Author: Julien Rouhaud, Pascal Legrand, Thomas Munro, Fujii Masao
    Reviewed-by: Sergei Kornilov, Tomas Vondra, Yoshikazu Imai, Haribabu Kommi, Tom Lane
    Discussion: https://postgr.es/m/CAHGQGwFx_=DO-Gu-MfPW3VQ4qC7TfVdH2zHmvZfrGv6fQ3D-Tw@mail.gmail.com
    Discussion: https://postgr.es/m/CAEepm=0e59Y_6Q_YXYCTHZkqOc6H2pJ54C_Xe=VFu50Aqqp_sA@mail.gmail.com
    Discussion: https://postgr.es/m/DB6PR0301MB21352F6210E3B11934B0DCC790B00@DB6PR0301MB2135.eurprd03.prod.outlook.com
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
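
&lt;p&gt;As a minimal sketch of how this could be used, assuming pg_stat_statements is
already listed in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared_preload_libraries&lt;/code&gt;, that the extension is installed in
version 1.8 or later, and that you are connected as a superuser, the following
enables the option and lists the statements with the highest cumulated planning
time (the limit of 10 rows is an arbitrary choice for this illustration):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Assumption: pg_stat_statements is preloaded, so the setting is known
ALTER SYSTEM SET pg_stat_statements.track_planning = on;
SELECT pg_reload_conf();

-- Optional: start from empty counters
SELECT pg_stat_statements_reset();

-- After running your workload, list the most expensive statements to plan
SELECT query, plans, total_plan_time, mean_plan_time, calls, total_exec_time
  FROM pg_stat_statements
  ORDER BY total_plan_time DESC
  LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;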

&lt;p&gt;Keep in mind that even a simple query can have a surprisingly high planning
time.  One frequent cause was the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get_actual_variable_range()&lt;/code&gt;
function, which is called when the planner wants to know the minimum
and maximum values of a specific field.  This function detects whether a suitable
index exists, and if there’s one it uses it to get the wanted values.  However, when
there were a lot of uncommitted values at the end of the index range, it could
take a significant amount of time to get a visible value.  While this problem
was fixed long ago (see &lt;a href=&quot;https://github.com/postgres/postgres/commit/fccebe421d0c410e6378fb281419442c84759213&quot;&gt;this
commit&lt;/a&gt;
and &lt;a href=&quot;https://github.com/postgres/postgres/commit/3ca930fc39ccf987c1c22fd04a1e7463b5dd0dfd&quot;&gt;this other
commit&lt;/a&gt;
for more details), there are still some cases where the planning time is higher
than what you’d expect, so having an easy way to monitor the planning
metrics is worthwhile.&lt;/p&gt;

&lt;p&gt;This feature can also help you find out how much you’re benefiting from the &lt;a href=&quot;https://www.postgresql.org/docs/current/sql-prepare.html&quot;&gt;generic
plan feature&lt;/a&gt;, and
how much of a difference it makes.&lt;/p&gt;

&lt;p&gt;Let’s look at a simple example showing the effect of generic plans with prepared
statements:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;PREPARE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;PREPARE&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXECUTE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------&lt;/span&gt;
   &lt;span class=&quot;mi&quot;&gt;387&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;p&quot;&gt;[...&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;more&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;times&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...]&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plans&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;total_plan_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;total_plan_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plans&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;avg_plan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;calls&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;total_exec_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;total_exec_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;calls&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;avg_exec&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_stat_statements&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ILIKE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'%SELECT count(*) FROM pg_class%'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RECORD&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;---+--------------------------------------------&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;PREPARE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;plans&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;total_plan_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;119496&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;avg_plan&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;119496&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;calls&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;total_exec_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4918280000000004&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;avg_exec&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5819713333333334&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;While the query was executed 6 times, it was actually planned only once (since
there’s no parameter, a generic plan is always used).  While the execution time
is on average slightly more than half a millisecond, a single planning was
almost &lt;strong&gt;4 times&lt;/strong&gt; more expensive.  By avoiding 5 planning passes, postgres saved
about &lt;strong&gt;10ms&lt;/strong&gt;.&lt;/p&gt;
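
&lt;p&gt;The same arithmetic can be sketched directly in SQL.  This is only a rough
estimate, under the assumption that every avoided planning would have cost the
average planning time recorded so far; the aliases &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plans_avoided&lt;/code&gt; and
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;est_saved_ms&lt;/code&gt; are names chosen for this illustration:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;SELECT query, calls, plans,
       -- executions that did not require a new planning
       calls - plans AS plans_avoided,
       -- NULLIF avoids a division by zero for entries that were never planned
       (calls - plans) * (total_plan_time / NULLIF(plans, 0)) AS est_saved_ms
  FROM pg_stat_statements
  WHERE query ILIKE '%SELECT count(*) FROM pg_class%';&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;With the figures above (6 calls, 1 plan, 2.119496 ms of planning), this gives
5 * 2.119496, so about 10.6 ms.  Note that a statement that was planned but failed
during execution has more plans than calls, in which case the estimate is
negative and not meaningful.&lt;/p&gt;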

&lt;h3 id=&quot;planning-buffers-in-explain&quot;&gt;Planning buffers in EXPLAIN&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;commit ce77abe63cfc85fb0bc236deb2cc34ae35cb5324
Author: Fujii Masao &amp;lt;fujii@postgresql.org&amp;gt;
Date:   Sat Apr 4 03:13:17 2020 +0900

    Include information on buffer usage during planning phase, in EXPLAIN output, take two.

    When BUFFERS option is enabled, EXPLAIN command includes the information
    on buffer usage during each plan node, in its output. In addition to that,
    this commit makes EXPLAIN command include also the information on
    buffer usage during planning phase, in its output. This feature makes it
    easier to discern the cases where lots of buffer access happen during
    planning.

    This commit revives the original commit ed7a509571 that was reverted by
    commit 19db23bcbd. The original commit had to be reverted because
    it caused the regression test failure on the buildfarm members prion and
    dory. But since commit c0885c4c30 got rid of the caues of the test failure,
    the original commit can be safely introduced again.

    Author: Julien Rouhaud, slightly revised by Fujii Masao
    Reviewed-by: Justin Pryzby
    Discussion: https://postgr.es/m/16109-26a1a88651e90608@postgresql.org
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Following the same idea, EXPLAIN will now also display the buffer usage of the
planning phase if the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BUFFERS&lt;/code&gt; option is used.  If you try that on a fresh new connection, before
any catalog cache is populated, you could be surprised by how many buffers
are accessed even for a simple query:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BUFFERS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;COSTS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OFF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                                               &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------------------------------------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;028&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;410&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;388&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Buffers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;shared&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;13&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Planning&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;157&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Buffers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;shared&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;118&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Execution&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;257&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BUFFERS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;COSTS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OFF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                            &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;035&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;413&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;388&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Buffers&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;shared&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;13&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Planning&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;393&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Execution&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;670&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;We can see here that populating the cache (relation, columns, datatypes…)
accessed 118 blocks, and that’s probably a significant part of the 5 extra ms we
saw in the first EXPLAIN output.  The second run, with the cache already
populated, reports no planning buffers at all.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2020/04/04/new-in-pg13-monitoring-query-planner.html&quot;&gt;New in pg13: Monitoring the query planner&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on April 04, 2020.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Planner selectivity estimation error statistics with pg_qualstats 2]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2020/02/28/pg_qualstats-2-selectivity-error.html" />
  <id>https://rjuju.github.io/postgresql/2020/02/28/pg_qualstats-2-selectivity-error</id>
  <published>2020-02-28T12:37:04+00:00</published>
  <updated>2020-02-28T12:37:04+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;Selectivity estimation errors are one of the main causes of bad query plans.  It’s
quite straightforward to compute those estimation errors using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;EXPLAIN
(ANALYZE)&lt;/code&gt;, either manually or with the help of
&lt;a href=&quot;https://explain.depesz.com/&quot;&gt;explain.depesz.com&lt;/a&gt; (or other similar tools),
but until now there was no tool available to get this information
automatically and globally.  Version 2 of pg_qualstats fixes that, thanks a
lot to &lt;a href=&quot;https://twitter.com/obartunov&quot;&gt;Oleg Bartunov&lt;/a&gt; for the original idea!&lt;/p&gt;

&lt;p&gt;Note: If you don’t know the pg_qualstats extension, you may want to read &lt;a href=&quot;/postgresql/2020/01/06/pg_qualstats-2-global-index-advisor.html&quot;&gt;my last
article about it&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;the-problem&quot;&gt;The problem&lt;/h3&gt;

&lt;p&gt;There can be many causes of that issue: outdated statistics, complex
predicates, non-uniform data…  But whatever the reason is, if the optimizer
doesn’t have an accurate idea of how much data each predicate will filter, the
result is the same: a bad query plan, which can lead to longer query execution.&lt;/p&gt;

&lt;p&gt;To illustrate the problem, I’ll use a simple test case, deliberately built
to fool the optimizer.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
             &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt;
             &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generate_series&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50000&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VACUUM&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;VACUUM&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                             &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;--------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12500&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;Rows&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Removed&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50000&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Planning&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;553&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Execution&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;38&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;062&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Here postgres thinks that the query will emit 12500 tuples, while in reality
none will be emitted.  If you’re wondering how postgres came up with that
number, the explanation is simple.  When multiple clauses are AND-ed, they are
treated as independent (except for overlapping range predicates, which can be
merged), and if no extended statistics
are available (see below for more about it), postgres will simply multiply the
selectivities of the clauses: here each clause matches half of the 50000 rows,
so the estimate is 0.5 * 0.5 * 50000 = 12500.  This multiplication is done in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;clauselist_selectivity_simple&lt;/code&gt;, in
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/clausesel.c&quot;&gt;src/backend/optimizer/path/clausesel.c&lt;/a&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;n&quot;&gt;Selectivity&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;clauselist_selectivity_simple&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PlannerInfo&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;root&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;List&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clauses&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;varRelid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;JoinType&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jointype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;SpecialJoinInfo&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sjinfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;Bitmapset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;estimatedclauses&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;Selectivity&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
  &lt;span class=&quot;cm&quot;&gt;/*
   * Anything that doesn't look like a potential rangequery clause gets
   * multiplied into s1 and forgotten. Anything that does gets inserted into
   * an rqlist entry.
   */&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;listidx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;foreach&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clauses&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
    &lt;span class=&quot;cm&quot;&gt;/* Always compute the selectivity using clause_selectivity */&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;s2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clause_selectivity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;root&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clause&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;varRelid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jointype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sjinfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
        &lt;span class=&quot;cm&quot;&gt;/*
         * If it's not a &quot;&amp;lt;&quot;/&quot;&amp;lt;=&quot;/&quot;&amp;gt;&quot;/&quot;&amp;gt;=&quot; operator, just merge the
         * selectivity in generically.  But if it's the right oprrest,
         * add the clause to rqlist for later processing.
         */&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;switch&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_oprrest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;expr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;opno&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
          &lt;span class=&quot;nl&quot;&gt;default:&lt;/span&gt;
            &lt;span class=&quot;cm&quot;&gt;/* Just merge the selectivity in generically */&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;s1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
          &lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this case, each predicate will independently filter approximately 50% of the
table, as we can see in the &lt;strong&gt;pg_stats view&lt;/strong&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tablename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;most_common_vals&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;most_common_freqs&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_stats&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tablename&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'pgqs'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;tablename&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;most_common_vals&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;most_common_freqs&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-----------+---------+------------------+-------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;            &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50116664&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;49883333&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;            &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50116664&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;49883333&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
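
&lt;p&gt;As a quick sanity check, the multiplication can be reproduced by hand from the frequencies above.  This is only a minimal sketch: the variable names are illustrative, and the table size of 50000 rows is an assumption (it matches the row counts reported by the plans in this article).&lt;/p&gt;

```python
# Frequencies copied from the pg_stats output above.
freq_val1_is_0 = 0.50116664  # most_common_freqs entry for val1 = 0
freq_val2_is_0 = 0.49883333  # most_common_freqs entry for val2 = 0

# Assumption: the table holds 50000 rows.
table_rows = 50000

# The clauses are treated as independent, so selectivities are multiplied.
combined_selectivity = freq_val1_is_0 * freq_val2_is_0
estimated_rows = round(combined_selectivity * table_rows)

print(f"combined selectivity: {combined_selectivity:.4f}")
print(f"estimated rows: {estimated_rows}")
```

&lt;p&gt;The result rounds to 12500 rows, the estimate discussed above.&lt;/p&gt;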

&lt;p&gt;So when using both clauses, the estimate is 25% of the table, since postgres
doesn’t know &lt;strong&gt;by default&lt;/strong&gt; that the two predicates are mutually exclusive.
Continuing with this artificial test case, let’s see what happens if we add a
&lt;em&gt;join&lt;/em&gt; on top of it, for instance by joining the table to itself on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;val1&lt;/code&gt;
column only.  For clarity, I’ll use &lt;strong&gt;t1&lt;/strong&gt; for the table on which I’m applying
the mutually exclusive predicates, and &lt;strong&gt;t2&lt;/strong&gt; for the joined table:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                                     &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-----------------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Nested&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Loop&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;313475000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;25078&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;25000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
         &lt;span class=&quot;k&quot;&gt;Rows&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Removed&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;25000&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Materialize&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12500&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;25000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
         &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;12500&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
               &lt;span class=&quot;k&quot;&gt;Rows&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Removed&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50000&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Planning&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;943&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Execution&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;86&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;757&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Postgres thinks that this join will emit &lt;strong&gt;313 million rows&lt;/strong&gt;, while obviously
no rows will be emitted.  And this is a good example of how bad assumptions can
lead to an inefficient plan.&lt;/p&gt;
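
&lt;p&gt;The 313 million figure is not arbitrary: it equals the product of the two scan estimates shown in the plan.  The sketch below only checks that arithmetic.  Reading it as &quot;every estimated row of one side is paired with every estimated row of the other&quot; is an interpretation of the numbers, not something the plan states, and the variable names are illustrative.&lt;/p&gt;

```python
# Planner row estimates read from the EXPLAIN ANALYZE output above.
estimated_rows_t2 = 25078  # Seq Scan on pgqs t2, filter: val1 = 0
estimated_rows_t1 = 12500  # Seq Scan on pgqs t1, filter: val1 = 0 AND val2 = 0

# Product of the two input estimates.
estimated_join_rows = estimated_rows_t2 * estimated_rows_t1

print(estimated_join_rows)
```

&lt;p&gt;So the wrong 12500 estimate on &lt;strong&gt;t1&lt;/strong&gt; is directly multiplied into the join estimate.&lt;/p&gt;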

&lt;p&gt;Here Postgres can deduce that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;val1 = 0&lt;/code&gt; predicate can be applied to
&lt;strong&gt;t2&lt;/strong&gt;.  So how should it join two relations, one that should emit 25000 tuples and
the other that should emit 12500 tuples, with no index available?  A nested
loop is not a bad choice, as neither relation is really big.  As no index is
available, postgres also chooses to &lt;strong&gt;materialize&lt;/strong&gt; the inner relation, meaning
storing it in memory, to make it more efficient.  As it tries to limit memory
consumption as much as possible, the smallest relation is materialized, and
that’s the mistake here.&lt;/p&gt;

&lt;p&gt;Indeed, postgres will read the whole table twice: once to get every row
matching the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;val1 = 0&lt;/code&gt; predicate for the outer relation, and once to
find all rows to be materialized.  If the opposite had been done, as would
probably have been the case with more realistic estimates, the table would only
have been read once.&lt;/p&gt;

&lt;p&gt;In this case, as the dataset isn’t big and quite artificial, a better plan
wouldn’t drastically change the execution time.  But keep in mind that in
real production environments, it could mean choosing a nested loop assuming
that there’ll be only a couple of rows to loop on, while in reality the backend
will spend minutes or even hours looping over millions of rows, when another
plan would have been orders of magnitude quicker.&lt;/p&gt;

&lt;h3 id=&quot;detecting-the-problem&quot;&gt;Detecting the problem&lt;/h3&gt;

&lt;p&gt;pg_qualstats 2 will now compute the selectivity estimation error, both as a
ratio and as a raw number, and will keep track, for each predicate, of the minimum,
maximum and mean values, along with the standard deviation.  It is now quite simple
to detect problematic quals!&lt;/p&gt;

&lt;p&gt;After executing the last query, here’s what the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_qualstats&lt;/code&gt; view will
return:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;opno&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regoper&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;qualid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;qualnodeid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;mean_err_estimate_ratio&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean_ratio&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean_err_estimate_num&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean_num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;constvalue&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_qualstats&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lrelid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_attribute&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;attrelid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;attnum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lattnum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;relname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;opno&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;qualid&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;qualnodeid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean_ratio&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean_num&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;constvalue&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------+---------+------+------------+------------+------------+----------+------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3161070364&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00393542&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;       &lt;span class=&quot;mi&quot;&gt;98&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3864967567&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3161070364&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;12500&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;12500&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3864967567&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3065200358&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;12500&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;12500&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p class=&quot;notice&quot;&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qualid&lt;/code&gt; is an identifier shared by quals that are AND-ed together, NULL
otherwise, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;qualnodeid&lt;/code&gt; is a per-qual only identifier.&lt;/p&gt;

&lt;p&gt;We see here that when used alone, the qual &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pgqs.val1 = ?&lt;/code&gt; doesn’t show any
selectivity estimate problem, as the ratio (&lt;em&gt;mean_ratio&lt;/em&gt;) is very close to
&lt;strong&gt;1&lt;/strong&gt; and the raw number (&lt;em&gt;mean_num&lt;/em&gt;) is quite low.  On the other hand, when
combined with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AND pgqs.val2 = ?&lt;/code&gt;, pg_qualstats reports a significant estimate
error.  That’s a very strong sign that those columns are functionally
dependent.&lt;/p&gt;
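
&lt;p&gt;To make the two metrics more concrete, here is a minimal sketch of how such an error could be derived from an estimated and an actual row count.  &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;estimate_error&lt;/code&gt; is a hypothetical helper, not pg_qualstats code.  It assumes that the ratio is the larger count divided by the smaller one and that a count of zero is clamped to 1, which is consistent with the values reported for the AND-ed quals above.&lt;/p&gt;

```python
def estimate_error(estimated_rows, actual_rows):
    """Hypothetical helper, not the pg_qualstats implementation."""
    # Assumption: a zero row count is clamped to 1 to avoid dividing by zero.
    est = max(estimated_rows, 1)
    act = max(actual_rows, 1)
    # Ratio: how many times off the estimate is, whatever the direction.
    ratio = max(est, act) / min(est, act)
    # Raw number: absolute difference between estimate and reality.
    raw = abs(estimated_rows - actual_rows)
    return ratio, raw


# Scan of t1 with both quals AND-ed: 12500 rows estimated, 0 returned.
anded = estimate_error(12500, 0)
# Scan of t2 with val1 = 0 alone: 25078 rows estimated, 25000 returned.
alone = estimate_error(25078, 25000)

print(anded)
print(alone)
```

&lt;p&gt;The view reports means over all executions, so its figures need not match a single execution exactly.&lt;/p&gt;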

&lt;p&gt;If for example a qual alone shows issues, it could be a sign of outdated
statistics, or that the sample size isn’t big enough.&lt;/p&gt;

&lt;p&gt;Also, if you have the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_stat_statements&lt;/code&gt; extension installed, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_qualstats&lt;/code&gt; will
give you the &lt;em&gt;query identifier&lt;/em&gt; for each predicate.  With that and a bit of
SQL, you can for instance find the queries with a long average execution time
which contain quals for which the selectivity estimation is off by a factor of 10 or more.&lt;/p&gt;

&lt;h3 id=&quot;interlude-extended-statistics&quot;&gt;Interlude: Extended statistics&lt;/h3&gt;

&lt;p&gt;If you’re wondering how to solve the issue I just explained, the solution is
very easy, assuming that you know that’s the root issue, since &lt;strong&gt;extended statistics&lt;/strong&gt;
were introduced in PostgreSQL 10.  &lt;a href=&quot;https://www.postgresql.org/docs/current/sql-createstatistics.html&quot;&gt;Create extended
statistics&lt;/a&gt;
on the related columns, perform an ANALYZE and you’re done!&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;STATISTICS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs_stats&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;STATISTICS&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;order&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                             &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Nested&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Loop&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;25002&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
         &lt;span class=&quot;k&quot;&gt;Rows&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Removed&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50000&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t2&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;([...]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;25002&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;never&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;executed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Planning&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;559&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Execution&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;39&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;471&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;If you want more details on extended statistics, I recommend looking at the
slides from &lt;a href=&quot;https://blog.pgaddict.com/&quot;&gt;Tomas Vondra&lt;/a&gt;’s &lt;a href=&quot;https://www.postgresql.eu/events/pgconfeu2018/sessions/session/2083/slides/130/create-statistics-what-is-it.pdf&quot;&gt;excellent talk on
this
subject&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;going-further&quot;&gt;Going further&lt;/h3&gt;

&lt;p&gt;Tracking the quals of every single query executed is of course quite expensive,
and would significantly impact the performance of any non-data-warehouse
workload.  That’s why &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_qualstats&lt;/code&gt; has an option,
&lt;strong&gt;pg_qualstats.sample_rate&lt;/strong&gt;, to sample the queries that will be processed.
This setting is by default set to &lt;strong&gt;1 / max_connections&lt;/strong&gt;, which will make the
overhead quite negligible, but don’t be surprised if you don’t see any qual
reported after running a few queries!&lt;/p&gt;
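
&lt;p&gt;As a minimal sketch, assuming pg_qualstats is already loaded and that your
role is allowed to change this setting for its own session, you could check the
current value and temporarily sample every query while testing (the value
&lt;strong&gt;1&lt;/strong&gt; is only an illustration, not a recommendation for production):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Display the sample rate currently in effect for this session
SHOW pg_qualstats.sample_rate;

-- Sample every query of this session, so that quals show up immediately
SET pg_qualstats.sample_rate = 1;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;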

&lt;p&gt;But if you’re instead only interested in the quals that have a bad selectivity
estimation, for instance to detect this class of problem rather than missing
indexes, there are two new options available for that:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;pg_qualstats.min_err_estimate_ratio&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;pg_qualstats.min_err_estimate_num&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those options are cumulative and can be changed at any time, and will limit the
quals that pg_qualstats will store to the ones that have a selectivity
estimate ratio and/or raw number higher than what you ask.  Although those
options will help to reduce the performance overhead, they of course can be
combined with &lt;strong&gt;pg_qualstats.sample_rate&lt;/strong&gt; if needed.&lt;/p&gt;
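
&lt;p&gt;For illustration only, here is how those two options could be set for the
current session.  The thresholds below are arbitrary values chosen for the
example, and this assumes that your role is allowed to change them:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Arbitrary example thresholds: adapt them to your own workload
SET pg_qualstats.min_err_estimate_ratio = 10;
SET pg_qualstats.min_err_estimate_num = 1000;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;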

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;After &lt;a href=&quot;/postgresql/2020/01/06/pg_qualstats-2-global-index-advisor.html&quot;&gt;introducing the new global index advisor&lt;/a&gt;, this article presented
a class of problems that DBAs frequently see, and how to detect and
solve them.&lt;/p&gt;

&lt;p&gt;I believe that those two new features in pg_qualstats will greatly help
PostgreSQL database administration.  Also, external tools that aim to solve
related issues, such as
&lt;a href=&quot;https://github.com/ossc-db/pg_plan_advsr&quot;&gt;pg_plan_advsr&lt;/a&gt; or
&lt;a href=&quot;https://github.com/postgrespro/aqo&quot;&gt;AQO&lt;/a&gt; could also benefit from
pg_qualstats, as they could directly get the exact data they need to be able
to perform analysis and optimize the queries!&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2020/02/28/pg_qualstats-2-selectivity-error.html&quot;&gt;Planner selectivity estimation error statistics with pg_qualstats 2&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on February 28, 2020.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[New in pg13: New leader_pid column in pg_stat_activity]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2020/02/06/new-in-pg13-leader_pid.html" />
  <id>https://rjuju.github.io/postgresql/2020/02/06/new-in-pg13-leader_pid</id>
  <published>2020-02-06T12:59:53+00:00</published>
  <updated>2020-02-06T12:59:53+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;h3 id=&quot;new-leader_pid-column-in-pg_stat_activity-view&quot;&gt;New leader_pid column in pg_stat_activity view&lt;/h3&gt;

&lt;p&gt;Surprisingly, ever since parallel query was introduced in PostgreSQL 9.6, it
has been impossible to know which backend a parallel worker was related to.  So, as
&lt;a href=&quot;https://twitter.com/g_lelarge/status/1209486212190343168&quot;&gt;Guillaume pointed
out&lt;/a&gt;, it makes it
quite difficult to build simple tools that can sample the wait events related
to all processes involved in a query.  A simple solution to that problem is to
export the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lock group leader&lt;/code&gt; information available in the backend at the SQL
level:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;commit b025f32e0b5d7668daec9bfa957edf3599f4baa8
Author: Michael Paquier &amp;lt;michael@paquier.xyz&amp;gt;
Date:   Thu Feb 6 09:18:06 2020 +0900

Add leader_pid to pg_stat_activity

This new field tracks the PID of the group leader used with parallel
query.  For parallel workers and the leader, the value is set to the
PID of the group leader.  So, for the group leader, the value is the
same as its own PID.  Note that this reflects what PGPROC stores in
shared memory, so as leader_pid is NULL if a backend has never been
involved in parallel query.  If the backend is using parallel query or
has used it at least once, the value is set until the backend exits.

Author: Julien Rouhaud
Reviewed-by: Sergei Kornilov, Guillaume Lelarge, Michael Paquier, Tomas
Vondra
Discussion: https://postgr.es/m/CAOBaU_Yy5bt0vTPZ2_LUM6cUcGeqmYNoJ8-Rgto+c2+w3defYA@mail.gmail.com
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With this change, you can now easily find all processes involved in a parallel
query.  For instance:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;leader_pid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;array_agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;leader_pid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;members&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_stat_activity&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;leader_pid&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;IS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;leader_pid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;leader_pid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;members&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------+------------+---------------&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;select&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;31630&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32269&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32268&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Be careful: as mentioned in the commit message, if the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;leader_pid&lt;/code&gt; is the
same as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pid&lt;/code&gt;, it doesn’t necessarily mean that the backend is currently
performing a parallel query, as once set this field is never reset.  Also, to
avoid extra overhead, no additional lock is held while outputting the data.
It means that each row is processed independently.  So, while quite unlikely,
you can in some circumstances get inconsistent data, such as a parallel worker
pointing to a pid that has already disconnected.&lt;/p&gt;
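
&lt;p&gt;As a possible way to work around those caveats, here is a sketch that only
looks at the processes that are parallel workers right now, relying on the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;backend_type&lt;/code&gt; column rather than on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;leader_pid&lt;/code&gt; alone, and that reports their
current wait events next to the query of their leader.  The aliases used are
of course arbitrary:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- One row per active parallel worker, with the query of its leader.
-- The inner join silently drops a worker whose leader row is already gone.
SELECT w.pid AS worker_pid,
       w.leader_pid,
       w.wait_event_type,
       w.wait_event,
       l.query AS leader_query
FROM pg_stat_activity AS w
JOIN pg_stat_activity AS l ON l.pid = w.leader_pid
WHERE w.backend_type = 'parallel worker';&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;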

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2020/02/06/new-in-pg13-leader_pid.html&quot;&gt;New in pg13: New leader_pid column in pg_stat_activity&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on February 06, 2020.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[pg qualstats 2: Global index advisor]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2020/01/06/pg_qualstats-2-global-index-advisor.html" />
  <id>https://rjuju.github.io/postgresql/2020/01/06/pg_qualstats-2-global-index-advisor</id>
  <published>2020-01-06T12:23:29+00:00</published>
  <updated>2020-01-06T12:23:29+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
&lt;p&gt;Coming up with good index suggestions can be a complex task.  It requires
knowledge of both application queries and database specificities.  Over the
years multiple projects tried to solve this problem, one of them being &lt;a href=&quot;https://powa.readthedocs.io/&quot;&gt;PoWA
version 3&lt;/a&gt;, with the help of the
&lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/stats_extensions/pg_qualstats.html&quot;&gt;pg_qualstats
extension&lt;/a&gt;.
It can give pretty good index suggestions, but it requires installing and
configuring PoWA, while some users only wanted the global index advisor.
For such cases, and for simplicity, the algorithm used in PoWA is now available in
pg_qualstats version 2 without requiring any additional component.&lt;/p&gt;

&lt;p&gt;EDIT: The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_qualstats_index_advisor()&lt;/code&gt; function has been changed to return
&lt;strong&gt;json&lt;/strong&gt; rather than &lt;strong&gt;jsonb&lt;/strong&gt;, so that the compatibility with PostgreSQL 9.3
is maintained.  The query examples are therefore also modified to use
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;json_array_elements()&lt;/code&gt; rather than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jsonb_array_elements()&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;what-is-pg_qualstats&quot;&gt;What is pg_qualstats&lt;/h3&gt;

&lt;p&gt;A simple way to explain what pg_qualstats is would be to say that it’s like
&lt;a href=&quot;https://www.postgresql.org/docs/current/pgstatstatements.html&quot;&gt;pg_stat_statements&lt;/a&gt;
working at the predicate level.&lt;/p&gt;

&lt;p&gt;The extension will save useful statistics for &lt;strong&gt;WHERE&lt;/strong&gt; and &lt;strong&gt;JOIN&lt;/strong&gt; clauses:
which table and column a predicate refers to, number of times the predicate has
been used, number of executions of the underlying operator, whether it’s a
predicate from an index scan or not, selectivity, constant values used and much
more.&lt;/p&gt;

&lt;p&gt;You can deduce many things from such information.  For instance, if you examine
the predicates that contain references to different tables, you can find which
tables are joined together, and how selective those join conditions are.&lt;/p&gt;
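
&lt;p&gt;As a rough sketch of that idea, the query below lists the pairs of tables that
appear on each side of a predicate.  The column names (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lrelid&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rrelid&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;occurences&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;execution_count&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nbfiltered&lt;/code&gt;) are the ones I expect in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_qualstats&lt;/code&gt; view, so please check them against the version you have
installed before relying on it:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Predicates referencing two different tables are join conditions.
-- Self-joins are excluded by the last condition of the WHERE clause.
SELECT lrelid::regclass AS left_table,
       rrelid::regclass AS right_table,
       sum(occurences) AS occurences,
       sum(execution_count) AS execution_count,
       sum(nbfiltered) AS nbfiltered
FROM pg_qualstats
WHERE lrelid IS NOT NULL
  AND rrelid IS NOT NULL
  AND lrelid &amp;lt;&amp;gt; rrelid
GROUP BY lrelid, rrelid
ORDER BY sum(execution_count) DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;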

&lt;h3 id=&quot;global-suggestion&quot;&gt;Global suggestion?&lt;/h3&gt;

&lt;p&gt;As I mentioned, the global index advisor added in pg_qualstats 2 uses the same
approach as the one in PoWA, so the explanation here will describe both tools.
The only difference is that with PoWA you’ll likely get a better suggestion, as
more predicates will be available, and you can also choose for which time
interval you want to detect missing indexes.&lt;/p&gt;

&lt;p&gt;The important thing here is that the suggestion is performed &lt;strong&gt;globally&lt;/strong&gt;,
considering all interesting predicates at the same time.  This approach
differs from all the other approaches I have seen, which only consider a single
query at a time.  I believe that a global approach is better, as it makes it
possible to reduce the total number of indexes, maximizing the usefulness of
multi-column indexes.&lt;/p&gt;

&lt;h3 id=&quot;how-global-suggestion-is-done&quot;&gt;How global suggestion is done&lt;/h3&gt;

&lt;p&gt;The first step is to gather all predicates that could benefit from a new index.
This is easy to get with pg_qualstats: by filtering the predicates coming from
sequential scans, executed many times, that filter many rows (both in number of
rows and in percentage), you get a perfect list of predicates that likely miss
an index (or alternatively the list of poorly written queries in certain
cases).  For instance, let’s consider an application which uses those 4
predicates:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/global_advisor_1_quals.png&quot;&gt;&lt;img src=&quot;/images/global_advisor_1_quals.png&quot; alt=&quot;List of all predicates
found&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we build the full set of paths, linking each AND-ed predicate to the
other, also possibly AND-ed, predicates that it contains.  Using the same 4
predicates, we would get those paths:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/global_advisor_2_graphs.png&quot;&gt;&lt;img src=&quot;/images/global_advisor_2_graphs.png&quot; alt=&quot;Build all possible paths of
predicates&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once all the paths are built, we just need to get the best path to find out the
best index to suggest.  The scoring is for now done by giving a weight to each
node of each path corresponding to the number of simple predicates it contains
and summing the weight for each path.  This is very simple and favors
a smaller number of indexes that optimize as many queries as possible.  With our
simple example, we get:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/global_advisor_3_weighted.png&quot;&gt;&lt;img src=&quot;/images/global_advisor_3_weighted.png&quot; alt=&quot;Weight all paths and choose the highest
score&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
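
&lt;p&gt;To make the scoring more concrete, here is a small worked example.  The paths
and their nodes are hypothetical, chosen only for this illustration, and this
is a sketch of the weighting rule described above rather than the actual
implementation used by pg_qualstats:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Hypothetical paths: each array element is the weight of one node, that is
-- the number of simple predicates AND-ed together in that node.
WITH paths (path, node_weights) AS (
    VALUES
        ('(id) -&amp;gt; (id AND ts) -&amp;gt; (id AND ts AND val)', ARRAY[1, 2, 3]),
        ('(id) -&amp;gt; (id AND ts)',                        ARRAY[1, 2]),
        ('(val)',                                      ARRAY[1])
)
-- The score of a path is the sum of the weights of its nodes
SELECT path,
       (SELECT sum(w) FROM unnest(node_weights) AS w) AS score
FROM paths
ORDER BY score DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;With those made-up paths, the longest one gets a score of 1 + 2 + 3 = 6 and
would therefore be the one chosen.&lt;/p&gt;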

&lt;p&gt;Of course, other scoring approaches could be used to take into account other
parameters and give possibly better suggestions, for instance by combining the
number of executions or the predicate selectivity.  If the read/write ratio for
each table is known (this is available using
&lt;a href=&quot;https://github.com/powa-team/powa-archivist&quot;&gt;powa-archivist&lt;/a&gt;), it would also
be possible to adapt the scoring method to limit index suggestions for
write-mostly tables.  With this algorithm, all of that could be added quite
easily.&lt;/p&gt;

&lt;p&gt;Once the best path is found, we can generate an index DDL!  As the order of the
columns can be important, this is done by getting the columns for each node
in ascending weight order.  In our example, we would generate this index:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INDEX&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Once an index is found, we simply remove the contained predicates from the
global list of predicates and start again from scratch until there are no
predicates left.&lt;/p&gt;

&lt;h3 id=&quot;additional-details-and-caveat&quot;&gt;Additional details and caveat&lt;/h3&gt;

&lt;p&gt;Of course, this is a simplified version of the suggestion algorithm.  Some
other information is required.  For instance, the list of predicates is
actually expanded with &lt;a href=&quot;https://www.postgresql.org/docs/current/indexes-opclass.html&quot;&gt;operator classes and access
method&lt;/a&gt; depending
on the column types and operator, to make sure that the suggested indexes are
valid.  If multiple index methods are found for a best path, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;btree&lt;/code&gt; will be
chosen in priority.&lt;/p&gt;

&lt;p&gt;This brings another consideration: this approach is mostly designed for
&lt;strong&gt;btree&lt;/strong&gt; indexes, for which the column order is critical.  Some other access
methods don’t require a specific column order, and for those it could be
possible to get better index suggestions if the column order wasn’t
considered.&lt;/p&gt;

&lt;p&gt;Another important point is that the operator classes and access methods are not
hardcoded but retrieved at execution time using the local catalogs.  Therefore,
you can get different (and possibly better) results if you make sure that
optional operator classes are present when using the index advisor.  This could
be the &lt;strong&gt;btree_gist&lt;/strong&gt; or &lt;strong&gt;btree_gin&lt;/strong&gt; extensions, but also other access methods.
It’s also possible that some type / operator combinations don’t have any
associated access method recorded in the catalogs.  In this case, those
predicates are returned separately as a list of unoptimizable predicates, that
should be manually analyzed.&lt;/p&gt;
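
&lt;p&gt;As an illustration, assuming the contrib modules are available on your server,
you could install those optional operator classes and then check in the
catalogs which access methods can index a given type.  The
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;integer&lt;/code&gt; type is only used as an example here:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Optional operator classes for the gist and gin access methods
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- List the operator classes (and their access method) known for a type
SELECT am.amname, opc.opcname, opc.opcdefault
FROM pg_opclass AS opc
JOIN pg_am AS am ON am.oid = opc.opcmethod
WHERE opc.opcintype = 'integer'::regtype
ORDER BY am.amname, opc.opcname;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;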

&lt;p&gt;Finally, as pg_qualstats isn’t considering expression predicates, this advisor
can’t suggest indexes on expressions, for instance if you’re using fulltext
search.&lt;/p&gt;

&lt;h3 id=&quot;usage-example&quot;&gt;Usage example&lt;/h3&gt;

&lt;p&gt;A simple function is provided, with optional parameters, that
returns a json value:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;REPLACE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_qualstats_index_advisor&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;min_filter&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;min_selectivity&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;forbidden_am&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'{}'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The parameter names are self-explanatory:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;min_filter&lt;/code&gt;: how many tuples a predicate should filter on average to be
considered for the global optimization, by default &lt;strong&gt;1000&lt;/strong&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;min_selectivity&lt;/code&gt;: how selective a predicate should be on average to be
considered for the global optimization, by default &lt;strong&gt;30%&lt;/strong&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;forbidden_am&lt;/code&gt;: list of access methods to ignore.  None by default,
although for PostgreSQL 9.6 and prior &lt;strong&gt;hash indexes will internally be
discarded&lt;/strong&gt;, as those are only safe since version 10.&lt;/li&gt;
&lt;/ul&gt;
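
&lt;p&gt;As all the parameters have a default value, you can override only the ones you
need using the named notation.  The values below are arbitrary and only meant
as an illustration:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Only consider predicates filtering at least 500 tuples on average, keep the
-- default selectivity threshold, and never suggest hash indexes.
SELECT pg_qualstats_index_advisor(min_filter =&amp;gt; 500,
                                  forbidden_am =&amp;gt; ARRAY['hash']);&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;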

&lt;p&gt;Using pg_qualstats regression tests, let’s see a simple example:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'a'&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generate_series&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adv&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id2&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id3&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'line '&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generate_series&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_qualstats_reset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'meh'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'meh'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'meh'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'meh'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'meh'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ILIKE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'moh'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgqs&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;And here’s what the function returns:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;json_array_elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;pg_qualstats_index_advisor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;min_filter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'indexes'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COLLATE&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;C&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                               &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;nv&quot;&gt;&quot;CREATE INDEX ON public.adv USING btree (id1)&quot;&lt;/span&gt;
 &lt;span class=&quot;nv&quot;&gt;&quot;CREATE INDEX ON public.adv USING btree (val, id1, id2, id3)&quot;&lt;/span&gt;
 &lt;span class=&quot;nv&quot;&gt;&quot;CREATE INDEX ON public.pgqs USING btree (id)&quot;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;json_array_elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;pg_qualstats_index_advisor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;min_filter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'unoptimised'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COLLATE&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;C&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;v&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-----------------&lt;/span&gt;
 &lt;span class=&quot;nv&quot;&gt;&quot;adv.val ~~* ?&quot;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/powa-team/pg_qualstats/&quot;&gt;Version 2 of pg_qualstats&lt;/a&gt; is
not released yet, but feel free to test it and &lt;a href=&quot;https://github.com/powa-team/pg_qualstats/issues&quot;&gt;report any issues you may
find&lt;/a&gt;!&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2020/01/06/pg_qualstats-2-global-index-advisor.html&quot;&gt;pg qualstats 2: Global index advisor&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on January 06, 2020.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[PoWA 4: New powa-collector daemon]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2019/12/10/powa-4-new-powa-collector.html" />
  <id>https://rjuju.github.io/postgresql/2019/12/10/powa-4-new-powa-collector</id>
  <published>2019-12-10T18:54:17+00:00</published>
  <updated>2019-12-10T18:54:17+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;This article is part of the &lt;a href=&quot;http://powa.readthedocs.io/&quot;&gt;PoWA 4 beta&lt;/a&gt; series,
and describes the new &lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-collector/index.html&quot;&gt;powa-collector
daemon&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;new-powa-collector-daemon&quot;&gt;New &lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-collector/index.html&quot;&gt;powa-collector daemon&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;This daemon replaces the previous &lt;em&gt;background worker&lt;/em&gt; when using the &lt;a href=&quot;https://powa.readthedocs.io/en/latest/remote_setup.html&quot;&gt;new
remote mode&lt;/a&gt;.  It’s a
simple daemon written in Python, which takes care of all the steps required to
perform &lt;em&gt;remote snapshots&lt;/em&gt;.  It’s &lt;a href=&quot;https://pypi.org/project/powa-collector/&quot;&gt;available on
PyPI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As I explained in my &lt;a href=&quot;/postgresql/2019/05/17/powa-4-with-remote-mode-beta-is-available.html&quot;&gt;previous article introducing PoWA 4&lt;/a&gt;, this daemon is
required for a remote mode setup, with this architecture in mind:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/powa_4_remote.svg&quot;&gt;&lt;img src=&quot;/images/powa_4_remote.svg&quot; alt=&quot;PoWA 4 remote architecture&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Its configuration is very simple.  All you need to do is copy and rename the
provided &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa-collector.conf.sample&lt;/code&gt; file, and adapt the &lt;a href=&quot;https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING&quot;&gt;connection
URI&lt;/a&gt;
to describe how to connect to your dedicated &lt;em&gt;repository server&lt;/em&gt;, and you’re
done.&lt;/p&gt;

&lt;p&gt;A typical configuration looks like this:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-conf&quot; data-lang=&quot;conf&quot;&gt;{
    &lt;span class=&quot;s2&quot;&gt;&quot;repository&quot;&lt;/span&gt;: {
        &lt;span class=&quot;s2&quot;&gt;&quot;dsn&quot;&lt;/span&gt;: &lt;span class=&quot;s2&quot;&gt;&quot;postgresql://powa_user@server_dns:5432/powa&quot;&lt;/span&gt;
    },
    &lt;span class=&quot;s2&quot;&gt;&quot;debug&quot;&lt;/span&gt;: &lt;span class=&quot;n&quot;&gt;true&lt;/span&gt;
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The list of &lt;em&gt;remote servers&lt;/em&gt;, their configuration and everything else it needs
will be automatically retrieved from the &lt;em&gt;repository server&lt;/em&gt; you just
configured.  When started, it’ll spawn one dedicated thread per declared
&lt;em&gt;remote server&lt;/em&gt;, and maintain a &lt;strong&gt;persistent connection&lt;/strong&gt; to the configured
&lt;strong&gt;powa database&lt;/strong&gt; on this &lt;em&gt;remote server&lt;/em&gt;.  Each thread will perform a &lt;em&gt;remote
snapshot&lt;/em&gt;, exporting the data to the &lt;em&gt;repository server&lt;/em&gt; using the new &lt;em&gt;source
functions&lt;/em&gt;.  Each thread will open and close a connection to the &lt;em&gt;repository
server&lt;/em&gt; when performing the &lt;em&gt;remote snapshot&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This daemon obviously needs to be able to connect to all the declared &lt;em&gt;remote
servers&lt;/em&gt; and the &lt;em&gt;repository server&lt;/em&gt;.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_servers&lt;/code&gt; table, which stores
the list of &lt;em&gt;remote servers&lt;/em&gt;, has fields to store the username and password used to
connect to the &lt;em&gt;remote server&lt;/em&gt;.  Storing a password in plain text in this table
is a heresy as far as security is concerned.  So, as mentioned in the
&lt;a href=&quot;https://powa.readthedocs.io/en/latest/security.html#connection-on-remote-servers&quot;&gt;PoWA security
documentation&lt;/a&gt;,
you can store a NULL password and &lt;a href=&quot;https://www.postgresql.org/docs/current/auth-methods.html&quot;&gt;instead use any of the authentication methods
that libpq supports&lt;/a&gt;
(.pgpass file, certificate…).  That’s strongly recommended for any non-toy
setup.&lt;/p&gt;

&lt;p&gt;The persistent connection to the &lt;em&gt;repository server&lt;/em&gt; is used to monitor the
daemon:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;to check that the daemon is up and running&lt;/li&gt;
  &lt;li&gt;to communicate through the UI using a &lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-collector/protocol.html&quot;&gt;simple protocol&lt;/a&gt;
to perform various actions (reload the configuration, check the status of a
&lt;em&gt;remote server&lt;/em&gt; thread…)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that you can also ask the daemon to reload its configuration by sending a
SIGHUP to the daemon process.  A reload is required after any change to the
list of remote servers (adding or removing a &lt;em&gt;remote server&lt;/em&gt;, or
updating a setting for an existing one).&lt;/p&gt;

&lt;p&gt;Also note that by choice,
&lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-collector/index.html&quot;&gt;powa-collector&lt;/a&gt;
will not perform &lt;em&gt;local snapshots&lt;/em&gt;.  If you want to use PoWA for the
&lt;em&gt;repository server&lt;/em&gt;, you need to enable the original &lt;em&gt;background worker&lt;/em&gt;.&lt;/p&gt;

&lt;h5 id=&quot;new-configuration-page&quot;&gt;New configuration page&lt;/h5&gt;

&lt;p&gt;The configuration page has been updated to show all the needed information: the
background worker status, the &lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-collector/index.html&quot;&gt;powa-collector
daemon&lt;/a&gt;
status (including all of its dedicated threads) and the list of registered
&lt;em&gt;remote servers&lt;/em&gt;.  Here’s an example of the new root configuration page:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/powa_4_configuration_page.png&quot;&gt;&lt;img src=&quot;/images/powa_4_configuration_page.png&quot; alt=&quot;New configuration
page&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the &lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-collector/index.html&quot;&gt;powa-collector
daemon&lt;/a&gt;
is used, each remote server status will be retrieved using the communication
protocol.  If the collector encountered any errors (connecting to a &lt;em&gt;remote
server&lt;/em&gt;, during a &lt;em&gt;snapshot&lt;/em&gt; or anything else), they’ll also be displayed here.
Note that such errors are also displayed at the top of every page of the UI,
so that you can’t miss them.&lt;/p&gt;

&lt;p&gt;Also, the configuration section now has a hierarchy, and you’ll be able to see
the list of extensions and the current PostgreSQL configuration for the
&lt;strong&gt;local&lt;/strong&gt; or &lt;strong&gt;remote servers&lt;/strong&gt; by clicking on the server of your choice!&lt;/p&gt;

&lt;p&gt;There’s also a new &lt;strong&gt;Reload collector&lt;/strong&gt; button on the header panel, which, as
expected, will ask the collector to reload its configuration.  That can be
useful if you registered new servers and you don’t have access to the server
where the collector is running.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;This is the last article introducing the new version of PoWA.  It’s still in
beta, so feel free to test it, &lt;a href=&quot;https://powa.readthedocs.io/en/latest/support.html#support&quot;&gt;report any issue you may
find&lt;/a&gt; or give any
other feedback!&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2019/12/10/powa-4-new-powa-collector.html&quot;&gt;PoWA 4: New powa-collector daemon&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on December 10, 2019.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[PoWA 4: changes in powa-archivist!]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2019/06/05/powa-4-new-in-powa-archivist.html" />
  <id>https://rjuju.github.io/postgresql/2019/06/05/powa-4-new-in-powa-archivist</id>
  <published>2019-06-05T14:26:17+00:00</published>
  <updated>2019-06-05T14:26:17+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;This article is part of the &lt;a href=&quot;http://powa.readthedocs.io/&quot;&gt;PoWA 4 beta&lt;/a&gt; series,
and describes the changes done in
&lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-archivist/index.html&quot;&gt;powa-archivist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For more information about this v4, you can consult the &lt;a href=&quot;/postgresql/2019/05/17/powa-4-with-remote-mode-beta-is-available.html&quot;&gt;general introduction
article&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;quick-overview&quot;&gt;Quick overview&lt;/h3&gt;

&lt;p&gt;First of all, you have to know that there is no upgrade possible from v3 to
v4, so a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DROP EXTENSION powa&lt;/code&gt; is required if you were already using PoWA on
any of your servers.  This is because this v4 involved &lt;strong&gt;a lot&lt;/strong&gt; of changes in
the SQL part of the extension, making it the most significant change in the
PoWA suite for this new version.  Looking at the amount of changes at the time I’m
writing this article, I get:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-diff&quot; data-lang=&quot;diff&quot;&gt; CHANGELOG.md       |   14 +
 powa--4.0.0dev.sql | 2075 +++++++++++++++++++++-------
 powa.c             |   44 +-
 3 files changed, 1629 insertions(+), 504 deletions(-)&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The lack of upgrade shouldn’t be a problem in practice though.  PoWA is a
performance tool, so it’s intended to have data with high precision but with a
very limited history.  If you’re looking for a general monitoring solution
keeping months of counters, PoWA is definitely not the tool you need.&lt;/p&gt;

&lt;h3 id=&quot;configuring-the-list-of-remote-servers&quot;&gt;Configuring the list of &lt;em&gt;remote servers&lt;/em&gt;&lt;/h3&gt;

&lt;p&gt;Concerning the features themselves, the first small change is that
powa-archivist does not require the &lt;a href=&quot;https://www.postgresql.org/docs/current/bgworker.html&quot;&gt;background
worker&lt;/a&gt; to be active
anymore, as it won’t be used for remote setup.  That means that a PostgreSQL
restart is not needed anymore to install PoWA.  Obviously, a restart is still
required if you want to use the local setup, using the background worker, or if
you want to install additional extensions that themselves require a restart.&lt;/p&gt;

&lt;p&gt;Then, as PoWA needs some configuration (frequency of snapshot, data retention
and so on), some new tables are added to be able to configure all of that.  The
new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_servers&lt;/code&gt; table stores the configuration for all the remote instances
whose data should be stored on this instance.  This &lt;em&gt;local PoWA instance&lt;/em&gt; is
called a &lt;strong&gt;repository server&lt;/strong&gt; (that typically should be dedicated to storing
PoWA data), as opposed to &lt;strong&gt;remote instances&lt;/strong&gt;, which are the instances you
want to monitor.  The content of this table is pretty straightforward:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;err&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_servers&lt;/span&gt;
                              &lt;span class=&quot;k&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;public.powa_servers&quot;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;Column&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;k&quot;&gt;Type&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Collation&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Nullable&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;                 &lt;span class=&quot;k&quot;&gt;Default&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------------+----------+-----------+----------+------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;            &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nextval&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'powa_servers_id_seq'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regclass&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;hostname&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;alias&lt;/span&gt;         &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;port&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;username&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;password&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;dbname&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;frequency&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;300&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;powa_coalesce&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;retention&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interval&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'1 day'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interval&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;If you already used PoWA, you should recognize most of the configuration
options, which are now stored here.  The new options are used to describe how to
connect to the &lt;em&gt;remote servers&lt;/em&gt;, and to provide an alias to be displayed in
the UI.&lt;/p&gt;
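
&lt;p&gt;As a minimal sketch, assuming you are connected to the powa database on the
&lt;em&gt;repository server&lt;/em&gt;, the registered servers and their snapshot settings can
be listed with a plain query on the columns shown above:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- One row per registered server, with its connection and snapshot settings.
-- The password column is deliberately left out of the output.
SELECT id, alias, hostname, port, username, dbname,
       frequency, powa_coalesce, retention
  FROM powa_servers
 ORDER BY id;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;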

&lt;p&gt;You also probably noticed a &lt;strong&gt;password&lt;/strong&gt; column here.  Storing a password in
plain text in this table is a heresy as far as security is concerned.  So, as
mentioned in the &lt;a href=&quot;https://powa.readthedocs.io/en/latest/security.html#connection-on-remote-servers&quot;&gt;PoWA security section of the
documentation&lt;/a&gt;,
you can store a NULL password and instead use &lt;a href=&quot;https://www.postgresql.org/docs/current/auth-methods.html&quot;&gt;any of the authentication methods
that libpq supports&lt;/a&gt;
(.pgpass file, certificate…).  That’s strongly recommended for any non-toy
setup.&lt;/p&gt;
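
&lt;p&gt;A quick way to check whether any registered server still has a password stored
in the table could be a query like this one (a sketch relying only on the
columns shown above):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- List the servers for which a password is stored in plain text.
-- The password itself is not selected, only the fact that one exists.
SELECT id, hostname, port, username
  FROM powa_servers
 WHERE password IS NOT NULL
 ORDER BY id;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;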

&lt;p&gt;Another new table, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_snapshot_metas&lt;/code&gt;, stores some
metadata about the snapshots of each &lt;em&gt;remote server&lt;/em&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;                                   &lt;span class=&quot;k&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;public.powa_snapshot_metas&quot;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;Column&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;k&quot;&gt;Type&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Collation&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Nullable&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;                &lt;span class=&quot;k&quot;&gt;Default&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;--------------+--------------------------+-----------+----------+---------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;srvid&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;                  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;coalesce_seq&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;snapts&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'-infinity'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;aggts&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'-infinity'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;purgets&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'-infinity'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;errors&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;That’s basically a counter to track the number of snapshots done, the timestamp
for each kind of event that happened (snapshot, aggregate and purge), and a
text array to store any error happening during the snapshot, that the UI can
display.&lt;/p&gt;
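
&lt;p&gt;For instance, a query along these lines could be used to spot the servers whose
snapshots reported errors.  It’s only a sketch: the join assumes that
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;srvid&lt;/code&gt; refers to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_servers.id&lt;/code&gt;, which the table
definitions above don’t show explicitly.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Servers with recorded snapshot errors, most recent snapshot first.
-- Assumption: powa_snapshot_metas.srvid matches powa_servers.id.
SELECT s.id, s.hostname, m.coalesce_seq, m.snapts, m.errors
  FROM powa_snapshot_metas m
  JOIN powa_servers s ON s.id = m.srvid
 WHERE m.errors IS NOT NULL
 ORDER BY m.snapts DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;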

&lt;h3 id=&quot;sql-api-to-configure-the-remote-servers&quot;&gt;SQL API to configure the &lt;em&gt;remote servers&lt;/em&gt;&lt;/h3&gt;

&lt;p&gt;While those tables are simple, a &lt;a href=&quot;https://powa.readthedocs.io/en/latest/remote_setup.html#configure-powa-and-stats-extensions-on-each-remote-server&quot;&gt;basic SQL API is available to register new
servers and configure
them&lt;/a&gt;.
Basically, 6 functions are available:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_register_server()&lt;/code&gt;, to declare a new &lt;em&gt;remote server&lt;/em&gt;, and the list of
extensions available on it&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_configure_server()&lt;/code&gt; to update any setting for the specified &lt;em&gt;remote
server&lt;/em&gt; (using a JSON where the key is the name of the parameter to change,
and the value is the new value to use)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_deactivate_server()&lt;/code&gt; to disable snapshots on the specified &lt;em&gt;remote
server&lt;/em&gt; (which actually sets the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;frequency&lt;/code&gt; to &lt;strong&gt;-1&lt;/strong&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_delete_and_purge_server()&lt;/code&gt; to remove the specified &lt;em&gt;remote server&lt;/em&gt;
from the list of servers and remove all associated snapshot data&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_activate_extension()&lt;/code&gt;, to declare that a new extension is available
on the specified &lt;em&gt;remote server&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_deactivate_extension()&lt;/code&gt;, to specify that an extension is not available
anymore on the specified &lt;em&gt;remote server&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any action more complicated than this should be performed using plain SQL
queries.  Hopefully, there shouldn’t be many other needs, and the tables are
straightforward so this shouldn’t be a problem.  &lt;a href=&quot;https://github.com/powa-team/powa-archivist/issues&quot;&gt;Feel free to ask for more
functions&lt;/a&gt; if you feel the
need though.  Please also note that the UI doesn’t allow you to call those
functions, as the UI is for now entirely &lt;strong&gt;read only&lt;/strong&gt;.&lt;/p&gt;
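
&lt;p&gt;Here’s a rough sketch of how the functions listed above can be combined.  It
assumes the signatures described in the PoWA remote setup documentation (named
parameters for the registration, then a server id followed by the value for the
other functions), so double check them against the documentation of your
version.  The hostname, alias, password and the server id &lt;strong&gt;1&lt;/strong&gt; are placeholders:&lt;/p&gt;

```sql
-- Register a new remote server (all the values are placeholders)
SELECT powa_register_server(hostname => 'myserver.example.com',
    alias => 'myserver',
    password => 'mypassword',
    extensions => '{pg_stat_kcache,pg_qualstats,pg_wait_sampling}');

-- Change a setting for the server registered with id 1 (assumed id):
-- the JSON key is the parameter name, the JSON value is the new value
SELECT powa_configure_server(1, '{"frequency": 300}');

-- Declare that one more extension is now available on this server
SELECT powa_activate_extension(1, 'pg_track_settings');

-- Stop taking snapshots for this server (sets frequency to -1)
SELECT powa_deactivate_server(1);
```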

&lt;h3 id=&quot;performing-remote-snapshots&quot;&gt;Performing &lt;em&gt;remote snapshots&lt;/em&gt;&lt;/h3&gt;

&lt;p&gt;As metrics are now stored on a different PostgreSQL instance, we had to
extensively change the way &lt;em&gt;snapshots&lt;/em&gt; (retrieving the data from a &lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/stats_extensions/index.html&quot;&gt;stat
extension&lt;/a&gt;
and storing them in PoWA catalog &lt;a href=&quot;/postgresql/2016/09/16/minimizing-tuple-overhead.html&quot;&gt;in a space efficient way&lt;/a&gt;) are performed.&lt;/p&gt;

&lt;p&gt;The list of all stat extensions, or &lt;em&gt;data sources&lt;/em&gt;, that are available on a
&lt;strong&gt;server&lt;/strong&gt; (either &lt;em&gt;remote&lt;/em&gt; or &lt;em&gt;local&lt;/em&gt;) and for which we should perform a
&lt;em&gt;snapshot&lt;/em&gt; is configured in a table called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_functions&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;               &lt;span class=&quot;k&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;public.powa_functions&quot;&lt;/span&gt;
     &lt;span class=&quot;k&quot;&gt;Column&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;Type&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Collation&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Nullable&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Default&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;----------------+---------+-----------+----------+---------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;srvid&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;module&lt;/span&gt;         &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;operation&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;function_name&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;query_source&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;added_manually&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;enabled&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;priority&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;numeric&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;A new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query_source&lt;/code&gt; field is added, which provides the name of a &lt;em&gt;source&lt;/em&gt;
function, required to support remote snapshots of any &lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/stats_extensions/index.html&quot;&gt;stat
extension&lt;/a&gt;.
This function is used to export the counters provided by this extension on a
different server, in a dedicated &lt;em&gt;transient table&lt;/em&gt;.  The &lt;em&gt;snapshot&lt;/em&gt; function
will then perform the &lt;em&gt;snapshot&lt;/em&gt; using those exported data instead of the ones
provided by stat extensions locally when the remote mode is used.  Note that
the counters export and the remote snapshot are done automatically by the
new &lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-collector/index.html&quot;&gt;powa-collector
daemon&lt;/a&gt;,
which I’ll cover in another article.&lt;/p&gt;
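
&lt;p&gt;For instance, assuming a &lt;em&gt;remote server&lt;/em&gt; registered with the id &lt;strong&gt;1&lt;/strong&gt; (a
placeholder), a query along these lines should list the data sources that will
be snapshotted for it, and the &lt;em&gt;source&lt;/em&gt; function used for each of them.  It only
relies on the columns of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_functions&lt;/code&gt; shown above:&lt;/p&gt;

```sql
-- List the enabled data sources for the server with id 1 (assumed id)
SELECT module, operation, function_name, query_source
FROM powa_functions
WHERE srvid = 1
  AND enabled
ORDER BY priority, module;
```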

&lt;p&gt;Here’s an example of how PoWA performs a &lt;em&gt;remote snapshot&lt;/em&gt; of the list of
databases.  As you’ll see, this is very simplistic, meaning that it’s very easy
to add support for a new stat extension.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;transient table&lt;/em&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;   &lt;span class=&quot;n&quot;&gt;Unlogged&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;public.powa_databases_src_tmp&quot;&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;Column&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;Type&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Collation&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Nullable&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Default&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------+---------+-----------+----------+---------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;srvid&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;datname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;For better performance, all the &lt;em&gt;transient tables&lt;/em&gt; are &lt;strong&gt;unlogged&lt;/strong&gt;, as their
content is only needed during a &lt;em&gt;snapshot&lt;/em&gt; and is discarded afterwards.  In this
example the &lt;em&gt;transient table&lt;/em&gt; only stores the identifier of the server the
data belong to, and the oid and name of each database present on the &lt;em&gt;remote server&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;And the &lt;em&gt;source function&lt;/em&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OR&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;REPLACE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;public&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;powa_databases_src&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_srvid&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;OUT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OUT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datname&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;RETURNS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SETOF&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;LANGUAGE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plpgsql&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;BEGIN&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;IF&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_srvid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;THEN&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;RETURN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;datname&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_database&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;ELSE&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;RETURN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;datname&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_databases_src_tmp&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;srvid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_srvid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;END&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;This function simply returns the content of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_database&lt;/code&gt; if local data are
asked (server id &lt;strong&gt;0&lt;/strong&gt; is always the local server), or the content of the
&lt;em&gt;transient table&lt;/em&gt; for the given remote server otherwise.&lt;/p&gt;
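
&lt;p&gt;To illustrate the two code paths, here is how this &lt;em&gt;source function&lt;/em&gt; could be
called by hand.  The server id &lt;strong&gt;1&lt;/strong&gt; is only a placeholder for a registered
&lt;em&gt;remote server&lt;/em&gt;, and the second query will only return rows while the
&lt;em&gt;transient table&lt;/em&gt; is populated:&lt;/p&gt;

```sql
-- Local server (id 0): reads pg_database directly
SELECT oid, datname FROM powa_databases_src(0);

-- Remote server (assumed id 1): reads the rows exported
-- in powa_databases_src_tmp for this server
SELECT oid, datname FROM powa_databases_src(1);
```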

&lt;p&gt;The &lt;em&gt;snapshot function&lt;/em&gt; can then easily do any required work with the data
for the wanted &lt;em&gt;remote server&lt;/em&gt;.  In the case of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_databases_snapshot()&lt;/code&gt;
function, that just means synchronizing the list of databases, and storing the
timestamp of removal if a previously existing database is not found anymore.&lt;/p&gt;
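
&lt;p&gt;The logic described above could be sketched as follows.  This is &lt;strong&gt;not&lt;/strong&gt; the
actual code of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_databases_snapshot()&lt;/code&gt;, only a simplified illustration: the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_databases&lt;/code&gt; table, its columns and its unique constraint are hypothetical,
and the server id &lt;strong&gt;1&lt;/strong&gt; is a placeholder:&lt;/p&gt;

```sql
-- Hypothetical history table, assumed to be defined as
-- my_databases(srvid, oid, datname, dropped) with UNIQUE (srvid, oid)

-- Add newly seen databases and refresh the name of the known ones
INSERT INTO my_databases (srvid, oid, datname)
SELECT 1, s.oid, s.datname
FROM powa_databases_src(1) s
ON CONFLICT (srvid, oid) DO UPDATE SET datname = EXCLUDED.datname;

-- Store the removal timestamp of databases that are not found anymore
UPDATE my_databases d
SET dropped = now()
WHERE d.srvid = 1
  AND d.dropped IS NULL
  AND NOT EXISTS (
      SELECT 1 FROM powa_databases_src(1) s WHERE s.oid = d.oid
  );
```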

&lt;p&gt;For more details, you can consult the &lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-archivist/development.html&quot;&gt;PoWA datasource
integration&lt;/a&gt;
documentation, which was updated for the version 4 specificities.&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2019/06/05/powa-4-new-in-powa-archivist.html&quot;&gt;PoWA 4: changes in powa-archivist!&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on June 05, 2019.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[PoWA 4 brings a remote mode, available in beta!]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2019/05/17/powa-4-with-remote-mode-beta-is-available.html" />
  <id>https://rjuju.github.io/postgresql/2019/05/17/powa-4-with-remote-mode-beta-is-available</id>
  <published>2019-05-17T11:04:17+00:00</published>
  <updated>2019-05-17T11:04:17+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;&lt;a href=&quot;http://powa.readthedocs.io/&quot;&gt;PoWA 4&lt;/a&gt; is available in beta.&lt;/p&gt;

&lt;h3 id=&quot;new-remote-mode&quot;&gt;New remote mode!&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://powa.readthedocs.io/en/latest/remote_setup.html&quot;&gt;new remote mode&lt;/a&gt;
is the biggest feature introduced in PoWA 4, though there have been other
improvements.&lt;/p&gt;

&lt;p&gt;I’ll describe here what this new mode implies and what changed in the
&lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-web/index.html&quot;&gt;UI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you’re interested in more details about the rest of the changes in PoWA 4,
I’ll soon publish other articles for that.&lt;/p&gt;

&lt;p&gt;If you’re in a hurry, feel free to go directly to the &lt;a href=&quot;https://dev-powa.anayrat.info/&quot;&gt;v4 demo of
PoWA&lt;/a&gt;, kindly hosted by &lt;a href=&quot;http://blog.anayrat.info/&quot;&gt;Adrien
Nayrat&lt;/a&gt;.  No credential needed, just click on
“Login”.&lt;/p&gt;

&lt;h3 id=&quot;why-is-a-remote-mode-important&quot;&gt;Why is a remote mode important&lt;/h3&gt;

&lt;p&gt;This feature has probably been the most frequently requested since PoWA was first
released, back in 2014.  And it was requested for good reasons, as a local mode
has some drawbacks.&lt;/p&gt;

&lt;p&gt;First, let’s see what the architecture was up to PoWA 3.  Assume an instance
with 2 databases (db1 and db2), plus &lt;strong&gt;one database dedicated for PoWA&lt;/strong&gt;.  This
dedicated database both hosts the &lt;em&gt;stat extensions&lt;/em&gt; required to get the
live performance data and &lt;strong&gt;stores them&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/powa_4_local.svg&quot;&gt;&lt;img src=&quot;/images/powa_4_local.svg&quot; alt=&quot;Local mode architecture&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A custom &lt;em&gt;&lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-archivist/configuration.html#background-worker-configuration&quot;&gt;background
worker&lt;/a&gt;&lt;/em&gt;
is started by PoWA, which is responsible for taking snapshots and storing them
in the dedicated powa database regularly.  Then, using powa-web, you can see the
activity of any of the &lt;strong&gt;local&lt;/strong&gt; databases by querying the stored data on the
dedicated database, and possibly connect to one of the other local databases
when complete data are needed, for instance when using the index suggestion
tool.&lt;/p&gt;

&lt;p&gt;With version 4, the architecture with a remote setup changes quite a lot:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/powa_4_remote.svg&quot;&gt;&lt;img src=&quot;/images/powa_4_remote.svg&quot; alt=&quot;Remote mode architecture&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that a dedicated powa database is still required, but &lt;strong&gt;only for the
stat extensions&lt;/strong&gt;.  Data are now stored on a different instance.  Then, the
&lt;em&gt;&lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-archivist/configuration.html#background-worker-configuration&quot;&gt;background
worker&lt;/a&gt;&lt;/em&gt;
is replaced by a &lt;strong&gt;&lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-collector/index.html&quot;&gt;new collector
daemon&lt;/a&gt;&lt;/strong&gt;,
which reads the performance data from the &lt;em&gt;remote servers&lt;/em&gt;, and stores them on the
dedicated &lt;em&gt;repository server&lt;/em&gt;.  Powa-web will then be able to display the
activity by connecting to the &lt;em&gt;repository server&lt;/em&gt;, and also to the &lt;strong&gt;remote
server&lt;/strong&gt; when complete data are needed.&lt;/p&gt;

&lt;p&gt;In short, with the new remote mode introduced in this version 4:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;a PostgreSQL restart is not required anymore to install the powa-archivist
extension, as the background worker is not mandatory anymore&lt;/li&gt;
  &lt;li&gt;there is no overhead due to storing and querying data on the same
PostgreSQL server as your production server (there are still some parts of
the UI that require querying the original server, for instance when
showing EXPLAIN plans, but that’s a negligible overhead)&lt;/li&gt;
  &lt;li&gt;it’s now possible to use PoWA on a &lt;strong&gt;hot-standby server&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The UI will therefore now welcome you with an initial page to let you choose
which server stored on the configured database you want to work on:
&lt;a href=&quot;/images/powa_4_all_servers.png&quot;&gt;&lt;img src=&quot;/images/powa_4_all_servers.png&quot; alt=&quot;Servers choice&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main reason it took so much time to bring a remote mode is that it
adds quite some complexity, requiring a major rewrite of the whole PoWA stack.
We also wanted to add more features first, such as the &lt;strong&gt;global index
suggestion&lt;/strong&gt;, with &lt;strong&gt;validation using &lt;a href=&quot;http://hypopg.readthedocs.io/&quot;&gt;hypopg&lt;/a&gt;&lt;/strong&gt;
introduced with &lt;a href=&quot;https://powa.readthedocs.io/en/latest/releases/v3.0.0.html&quot;&gt;PoWA
3&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;changes-in-powa-web&quot;&gt;Changes in &lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-web/index.html&quot;&gt;powa-web&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;The &lt;em&gt;user interface&lt;/em&gt; is the component which probably has the most visible
changes in this version 4.  Here are the most important ones.&lt;/p&gt;

&lt;h5 id=&quot;remote-mode-compatibility&quot;&gt;Remote mode compatibility&lt;/h5&gt;

&lt;p&gt;The biggest change is obviously the support for the &lt;a href=&quot;https://powa.readthedocs.io/en/latest/remote_setup.html&quot;&gt;new remote
mode&lt;/a&gt;.  As a
consequence, the first page shown is now a &lt;strong&gt;server selector&lt;/strong&gt; page, displaying
all registered &lt;em&gt;remote servers&lt;/em&gt;.  After choosing the wanted &lt;em&gt;remote server&lt;/em&gt; (or
&lt;em&gt;local server&lt;/em&gt; if you don’t use the remote mode), all other pages will be
similar to the ones that were available until PoWA 3, but displaying data for a
specific &lt;em&gt;remote server&lt;/em&gt; only, and of course retrieving the data from the
&lt;strong&gt;repository powa database&lt;/strong&gt;, and with some new information I’ll describe just
after.&lt;/p&gt;

&lt;p&gt;Note that as the data is now stored on a dedicated &lt;em&gt;repository server&lt;/em&gt; when
using the remote mode, most of the UI is usable without connecting to the
currently selected &lt;em&gt;remote server&lt;/em&gt;.  However, powa-web still needs to
connect to the &lt;em&gt;remote server&lt;/em&gt; when the original data are needed (for instance,
for index suggestion or when showing &lt;strong&gt;EXPLAIN&lt;/strong&gt; plans).  The &lt;a href=&quot;https://powa.readthedocs.io/en/latest/security.html#connection-on-remote-servers&quot;&gt;same
authentication considerations and
possibilities&lt;/a&gt;
as for the new &lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/powa-collector/index.html&quot;&gt;powa-collector
daemon&lt;/a&gt;
(which will be described in a following article) apply here.&lt;/p&gt;

&lt;h5 id=&quot;pg_track_settings-support&quot;&gt;&lt;a href=&quot;https://github.com/rjuju/pg_track_settings/&quot;&gt;pg_track_settings&lt;/a&gt; support&lt;/h5&gt;

&lt;p&gt;When this extension is properly configured, a new timeline widget will appear,
placed between each graph and its overview, displaying any kind of recorded
change if any was detected in the currently selected time interval.  On the
per-database and per-query pages, this list will be filtered by the selected
database.&lt;/p&gt;

&lt;p&gt;The same timeline will be displayed on every graph of each page, to easily
check if this change had any visible impact using the various graphs.&lt;/p&gt;

&lt;p&gt;Note that details of the changes will be displayed on mouseover. You can also
click on any event on the timeline to make the event stay displayed, and draw a
vertical line on the underlying graph.&lt;/p&gt;

&lt;p&gt;Here’s an example of such detected configuration change in action:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/pg_track_settings_powa4.png&quot;&gt;&lt;img src=&quot;/images/pg_track_settings_powa4.png&quot; alt=&quot;Configuration changes detected&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please also note that you need at least version 2.0.0 of
&lt;a href=&quot;https://github.com/rjuju/pg_track_settings/&quot;&gt;pg_track_settings&lt;/a&gt;, and that the
extension has to be installed &lt;strong&gt;both on the &lt;em&gt;remote servers&lt;/em&gt; and the
&lt;em&gt;repository server&lt;/em&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;h5 id=&quot;new-graphs-available&quot;&gt;New graphs available&lt;/h5&gt;

&lt;p&gt;When
&lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/stats_extensions/pg_stat_kcache.html&quot;&gt;pg_stat_kcache&lt;/a&gt;
is set up, its information was previously only displayed on the per-query page.
It’s now displayed on the per-server and per-database pages too, in two graphs:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;in the &lt;strong&gt;Block Access&lt;/strong&gt; graph, where the &lt;strong&gt;OS cache&lt;/strong&gt; and &lt;strong&gt;disk read&lt;/strong&gt;
metrics will replace the &lt;strong&gt;read&lt;/strong&gt; metric&lt;/li&gt;
  &lt;li&gt;in a new &lt;strong&gt;System Resources&lt;/strong&gt; graph (which is also added in the &lt;em&gt;per-query&lt;/em&gt;
page), showing the &lt;a href=&quot;/postgresql/2018/07/17/pg_stat_kcache-2-1-is-out.html&quot;&gt;metrics added in pg_stat_kcache 2.1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is an example of this new &lt;strong&gt;System Resources&lt;/strong&gt; graph:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/pg_stat_kcache_system_resources_powa4.png&quot;&gt;&lt;img src=&quot;/images/pg_stat_kcache_system_resources_powa4.png&quot; alt=&quot;System resources&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There was also a &lt;strong&gt;Wait Events&lt;/strong&gt; graph (available when &lt;a href=&quot;https://powa.readthedocs.io/en/v4/components/stats_extensions/pg_wait_sampling.html&quot;&gt;pg_wait_sampling
extension&lt;/a&gt;
is set up) only available on the per-query page.  This graph is now available on
the per-server and per-database pages too.&lt;/p&gt;

&lt;h5 id=&quot;metrics-documentation-and-documentation-link&quot;&gt;Metrics documentation and documentation link&lt;/h5&gt;

&lt;p&gt;Some metrics displayed in the user interface were quite self-explanatory, while
others could be a little bit obscure.  Unfortunately, until now there wasn’t any
documentation for any of the metrics.  That’s now fixed, and all graphs have an
&lt;em&gt;information icon&lt;/em&gt;, that will display a description of the metrics used in the
graph on mouseover.  Some graphs will also include a link to the underlying
&lt;a href=&quot;https://powa.readthedocs.io/en/latest/components/stats_extensions/index.html&quot;&gt;stat extension in PoWA
documentation&lt;/a&gt;
for users who want to learn more about them.&lt;/p&gt;

&lt;p&gt;Here’s an example:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/powa_4_metrics_doc.png&quot;&gt;&lt;img src=&quot;/images/powa_4_metrics_doc.png&quot; alt=&quot;Metrics documentation&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5 id=&quot;and-general-bugfixes&quot;&gt;And general bugfixes&lt;/h5&gt;

&lt;p&gt;Some longstanding issues were also reported:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the graph hover box showing metric values had a wrong vertical position&lt;/li&gt;
  &lt;li&gt;the time selection using the graph preview didn’t show a correct preview
after applying the selection&lt;/li&gt;
  &lt;li&gt;errors on hypothetical index creation, and in certain cases their display,
weren’t correctly handled in multiple pages&lt;/li&gt;
  &lt;li&gt;grid filters weren’t reapplied when time selection was changed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have ever been annoyed by any of these, you’ll be glad to know that
they’re now all fixed!&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;This 4th version of PoWA represents a lot of time spent on development, documentation
improvements and testing.  We’re now quite satisfied with it, but we may have
missed some bugs.  If you’re interested in this project, I hope that you’ll
consider testing the beta, and if needed don’t hesitate &lt;a href=&quot;https://powa.readthedocs.io/en/latest/support.html#support&quot;&gt;to report a
bug&lt;/a&gt;!&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2019/05/17/powa-4-with-remote-mode-beta-is-available.html&quot;&gt;PoWA 4 brings a remote mode, available in beta!&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on May 17, 2019.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[New in pg12: Statistics on checksum errors]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2019/04/18/new-in-pg12-statistics-checksums-errors.html" />
  <id>https://rjuju.github.io/postgresql/2019/04/18/new-in-pg12-statistics-checksums-errors</id>
  <published>2019-04-18T11:02:26+00:00</published>
  <updated>2019-04-18T11:02:26+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;h3 id=&quot;data-checksums&quot;&gt;Data checksums&lt;/h3&gt;

&lt;p&gt;Added in &lt;a href=&quot;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=96ef3b8ff1c&quot;&gt;PostgreSQL
9.3&lt;/a&gt;,
&lt;a href=&quot;https://www.postgresql.org/docs/current/app-initdb.html#APP-INITDB-DATA-CHECKSUMS&quot;&gt;data
checksums&lt;/a&gt;
can help to detect data corruption happening on the storage side.&lt;/p&gt;

&lt;p&gt;Checksums are only enabled if the instance was set up using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;initdb
--data-checksums&lt;/code&gt; (which isn’t the default behavior), or if activated
afterwards with the new
&lt;a href=&quot;https://www.postgresql.org/docs/devel/app-pgchecksums.html&quot;&gt;pg_checksums&lt;/a&gt;
tool also &lt;a href=&quot;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ed308d783790&quot;&gt;added in PostgreSQL
12&lt;/a&gt;.&lt;/p&gt;
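
&lt;p&gt;If you’re not sure whether checksums are enabled on a given instance, the
read-only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data_checksums&lt;/code&gt; setting will tell you:&lt;/p&gt;

```sql
-- Reports "on" if data checksums are enabled for this instance, "off" otherwise
SHOW data_checksums;
```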

&lt;p&gt;When enabled, checksums are written each time a block is written to disk, and
verified each time a block is read from disk (or from the operating system
cache).  If the checksum verification fails, an error is reported in the logs.
If the block was read by a backend, the query will obviously fail, but if the
block was read by a
&lt;a href=&quot;https://www.postgresql.org/docs/current/protocol-replication.html#id-1.10.5.9.7.1.8.1.12&quot;&gt;BASE_BACKUP&lt;/a&gt;
operation (such as pg_basebackup), the command will continue its processing.
While data checksums will only catch a subset of possible problems, they still
have some value, especially if you don’t trust your storage reliability.&lt;/p&gt;

&lt;p&gt;Up to PostgreSQL 11, any checksum validation error could only be found by
looking into the logs, which clearly isn’t convenient if you want to monitor
such errors.&lt;/p&gt;

&lt;h3 id=&quot;new-counters-available-in-pg_stat_database&quot;&gt;New counters available in pg_stat_database&lt;/h3&gt;

&lt;p&gt;To make checksum errors easier to monitor, and help users to react as soon as
such a problem occurs, PostgreSQL 12 adds new counters in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_stat_database&lt;/code&gt; view:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;commit 6b9e875f7286d8535bff7955e5aa3602e188e436
Author: Magnus Hagander &amp;lt;magnus@hagander.net&amp;gt;
Date:   Sat Mar 9 10:45:17 2019 -0800

Track block level checksum failures in pg_stat_database

This adds a column that counts how many checksum failures have occurred
on files belonging to a specific database. Both checksum failures
during normal backend processing and those created when a base backup
detects a checksum failure are counted.

Author: Magnus Hagander
Reviewed by: Julien Rouhaud
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt; &lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;commit 77bd49adba4711b4497e7e39a5ec3a9812cbd52a
Author: Magnus Hagander &amp;lt;magnus@hagander.net&amp;gt;
Date:   Fri Apr 12 14:04:50 2019 +0200

    Show shared object statistics in pg_stat_database

    This adds a row to the pg_stat_database view with datoid 0 and datname
    NULL for those objects that are not in a database. This was added
    particularly for checksums, but we were already tracking more satistics
    for these objects, just not returning it.

    Also add a checksum_last_failure column that holds the timestamptz of
    the last checksum failure that occurred in a database (or in a
    non-dataabase file), if any.

    Author: Julien Rouhaud &amp;lt;rjuju123@gmail.com&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt; &lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;commit 252b707bc41cc9bf6c55c18d8cb302a6176b7e48
Author: Magnus Hagander &amp;lt;magnus@hagander.net&amp;gt;
Date:   Wed Apr 17 13:51:48 2019 +0200

    Return NULL for checksum failures if checksums are not enabled

    Returning 0 could falsely indicate that there is no problem. NULL
    correctly indicates that there is no information about potential
    problems.

    Also return 0 as numbackends instead of NULL for shared objects (as no
    connection can be made to a shared object only).

    Author: Julien Rouhaud &amp;lt;rjuju123@gmail.com&amp;gt;
    Reviewed-by: Robert Treat &amp;lt;rob@xzilla.net&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Those counters will reflect checksum validation errors for both backend
activity and
&lt;a href=&quot;https://www.postgresql.org/docs/current/protocol-replication.html#id-1.10.5.9.7.1.8.1.12&quot;&gt;BASE_BACKUP&lt;/a&gt;
activity, per database.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_stat_database&lt;/span&gt;
                        &lt;span class=&quot;k&quot;&gt;View&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;pg_catalog.pg_stat_database&quot;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;Column&lt;/span&gt;         &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;k&quot;&gt;Type&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Collation&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Nullable&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Default&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-----------------------+--------------------------+-----------+----------+---------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;datid&lt;/span&gt;                 &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;                      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;datname&lt;/span&gt;               &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;                     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;checksum_failures&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;checksum_last_failure&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;stats_reset&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;checksum_failures&lt;/code&gt; column will show a cumulative number of errors, and the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;checksum_last_failure&lt;/code&gt; column will show the timestamp of the last checksum
failure on the database (NULL if no error ever happened).&lt;/p&gt;

&lt;p&gt;To avoid any confusion (thanks to Robert Treat for pointing it out), those two
columns will always return NULL if data checksums aren’t enabled, so people
won’t mistakenly think that data checksums are always successfully verified.&lt;/p&gt;

&lt;p&gt;As a side effect, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_stat_database&lt;/code&gt; will also now show available statistics
for shared objects (such as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_database&lt;/code&gt; table for instance), in a new row
with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datid&lt;/code&gt; valued to &lt;strong&gt;0&lt;/strong&gt;, and a &lt;strong&gt;NULL&lt;/strong&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datname&lt;/code&gt;.  Those were always
accumulated, but weren’t displayed in any system view until now.&lt;/p&gt;
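
&lt;p&gt;For monitoring purposes, a minimal sketch of a query built on those two columns could look like the
following.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared objects&lt;/code&gt; label is only an illustrative placeholder for the row with a
NULL &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;datname&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Databases (and the shared objects row) with at least one checksum failure.
-- Both columns are NULL when data checksums aren't enabled, so no row is
-- returned in that case.
SELECT datid,
       coalesce(datname, 'shared objects') AS datname,
       checksum_failures,
       checksum_last_failure
FROM pg_stat_database
WHERE checksum_failures &amp;gt; 0
ORDER BY checksum_last_failure DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;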

&lt;p&gt;&lt;del&gt;A dedicated check is also &lt;a href=&quot;https://github.com/OPMDG/check_pgactivity/issues/226&quot;&gt;already
planned&lt;/a&gt; in
&lt;a href=&quot;https://opm.readthedocs.io/probes/check_pgactivity.html&quot;&gt;check_pgactivity&lt;/a&gt;!&lt;/del&gt;
A dedicated check is also &lt;a href=&quot;https://github.com/OPMDG/check_pgactivity/commit/0e8b516e95e4364470d4e205aebc9fe68bbcfd23&quot;&gt;already
available&lt;/a&gt;
in &lt;a href=&quot;https://opm.readthedocs.io/probes/check_pgactivity.html&quot;&gt;check_pgactivity&lt;/a&gt;!&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2019/04/18/new-in-pg12-statistics-checksums-errors.html&quot;&gt;New in pg12: Statistics on checkums errors&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on April 18, 2019.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[pg_stat_kcache 2.1 is out]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2018/07/17/pg_stat_kcache-2-1-is-out.html" />
  <id>https://rjuju.github.io/postgresql/2018/07/17/pg_stat_kcache-2-1-is-out</id>
  <published>2018-07-17T17:34:13+00:00</published>
  <updated>2018-07-17T17:34:13+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;A new version of &lt;a href=&quot;https://github.com/powa-team/pg_stat_kcache/&quot;&gt;pg_stat_kcache&lt;/a&gt;
is out, with support for Windows and other platforms, and more counters
available.&lt;/p&gt;

&lt;h3 id=&quot;whats-new&quot;&gt;What’s new&lt;/h3&gt;

&lt;p&gt;Version 2.1 of &lt;a href=&quot;https://github.com/powa-team/pg_stat_kcache/&quot;&gt;pg_stat_kcache&lt;/a&gt;
has just been released.&lt;/p&gt;

&lt;p&gt;The two main new features are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;compatibility with platforms without &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getrusage()&lt;/code&gt; support (such as Windows)&lt;/li&gt;
  &lt;li&gt;more fields of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getrusage()&lt;/code&gt; are exposed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I explained in &lt;a href=&quot;/postgresql/2015/03/04/pg_stat_kcache-2-0.html&quot;&gt;a previous article&lt;/a&gt;, this extension is a wrapper on top of
&lt;a href=&quot;http://man7.org/linux/man-pages/man2/getrusage.2.html&quot;&gt;getrusage&lt;/a&gt;, that
accumulates performance counters per normalized query.  It was already giving
some valuable information that allows a DBA to identify CPU-intensive queries,
or compute a real hit-ratio for instance.&lt;/p&gt;

&lt;p&gt;However, it was only available on platforms that have a native version of
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getrusage&lt;/code&gt;, so Windows and some other platforms were not supported.  But
fortunately, PostgreSQL does offer a &lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/port/getrusage.c&quot;&gt;basic support of
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getrusage()&lt;/code&gt;&lt;/a&gt;
for those platforms.  This infrastructure has been used in version 2.1.0 of
pg_stat_kcache, which means that you can now use this extension on Windows
and all the other platforms that weren’t supported previously.  As this
support is limited, only the user and system CPU metrics will be available; the
other fields will always be NULL.&lt;/p&gt;

&lt;p&gt;This new version also exposes all the remaining fields of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getrusage()&lt;/code&gt; that
have a meaning when accumulated per query:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;soft page faults&lt;/li&gt;
  &lt;li&gt;hard page faults&lt;/li&gt;
  &lt;li&gt;swaps&lt;/li&gt;
  &lt;li&gt;IPC messages sent and received&lt;/li&gt;
  &lt;li&gt;signals received&lt;/li&gt;
  &lt;li&gt;voluntary and involuntary context switches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another change is to automatically detect the operating system’s clock tick.
Otherwise, very short queries (faster than a clock tick) would be either
detected as not consuming CPU time, or consuming CPU time from earlier short
queries.  For queries faster than 3 clock ticks, where imprecision is high,
pg_stat_kcache will instead use the query duration as CPU user time, and
won’t use anything as CPU system time.&lt;/p&gt;
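
&lt;p&gt;As a rough illustration of that rule (this is not the extension’s actual C code), here is the same logic
applied to two made-up samples.  The 4 ms clock tick, the sample values and the column names are all
assumptions chosen for the example:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Hypothetical samples: query duration and getrusage() CPU times, in seconds.
-- The 4 ms clock tick is an assumption for illustration only.
WITH samples(duration, ru_user, ru_system) AS (
    VALUES (0.002, 0.000, 0.004),
           (0.050, 0.031, 0.008)
), params(tick) AS (
    VALUES (0.004)
)
SELECT duration,
       -- Below 3 clock ticks, the duration is used as CPU user time
       CASE WHEN duration &amp;lt; 3 * tick THEN duration ELSE ru_user END AS user_time,
       -- Below 3 clock ticks, no CPU system time is accounted
       CASE WHEN duration &amp;lt; 3 * tick THEN 0 ELSE ru_system END AS system_time
FROM samples, params;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;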

&lt;h3 id=&quot;small-example&quot;&gt;Small example&lt;/h3&gt;

&lt;p&gt;Depending on your platform, some of those new counters aren’t maintained.  On
GNU/Linux for instance, the swaps, IPC messages and signals counters are
unfortunately not maintained, but those that are maintained are still quite interesting.  For instance,
let’s compare the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;context switches&lt;/code&gt; if we run the same total number of
transactions but with either 2 or 80 concurrent connections on a 4-core laptop:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;psql &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;SELECT pg_stat_kcache_reset()&quot;&lt;/span&gt;
pgbench &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; 80 &lt;span class=&quot;nt&quot;&gt;-j&lt;/span&gt; 80 &lt;span class=&quot;nt&quot;&gt;-S&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-n&lt;/span&gt; pgbench &lt;span class=&quot;nt&quot;&gt;-t&lt;/span&gt; 100
&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;...]
number of transactions actually processed: 8000/8000
latency average &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 8.782 ms
tps &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 9109.846256 &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;including connections establishing&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
tps &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 9850.666577 &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;excluding connections establishing&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

psql &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;SELECT user_time, system_time, minflts, majflts, nvcsws, nivcsws FROM pg_stat_kcache WHERE datname = 'pgbench'&quot;&lt;/span&gt;
     user_time     |    system_time     | minflts | majflts | nvcsws | nivcsws
&lt;span class=&quot;nt&quot;&gt;-------------------&lt;/span&gt;+--------------------+---------+---------+--------+---------
 0.431648000000005 | 0.0638690000000001 |   24353 |       0 |     91 |     282
&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;1 row&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

psql &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;SELECT pg_stat_kcache_reset()&quot;&lt;/span&gt;
pgbench &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; 2 &lt;span class=&quot;nt&quot;&gt;-j&lt;/span&gt; 2 &lt;span class=&quot;nt&quot;&gt;-S&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-n&lt;/span&gt; pgbench &lt;span class=&quot;nt&quot;&gt;-t&lt;/span&gt; 8000
&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;...]
number of transactions actually processed: 8000/8000
latency average &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 0.198 ms
tps &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 10119.638426 &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;including connections establishing&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
tps &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 10188.313645 &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;excluding connections establishing&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

psql &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;SELECT user_time, system_time, minflts, majflts, nvcsws, nivcsws FROM pg_stat_kcache WHERE datname = 'pgbench'&quot;&lt;/span&gt;
     user_time     | system_time | minflts | majflts | nvcsws | nivcsws 
&lt;span class=&quot;nt&quot;&gt;-------------------&lt;/span&gt;+-------------+---------+---------+--------+---------
 0.224338999999999 |    0.023669 |    5983 |       0 |      0 |       8
&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;1 row&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;As expected, having 80 concurrent connections on a 4-core laptop is not the
most efficient way to process 8000 transactions.  The average transaction latency is
&lt;strong&gt;44 times&lt;/strong&gt; higher with 80 connections than with 2 connections.  At the O/S
level, we can see that with only 2 concurrent connections, we had only &lt;strong&gt;8
involuntary context switches&lt;/strong&gt; on all queries on the &lt;strong&gt;pgbench&lt;/strong&gt; database,
while there were &lt;strong&gt;282, so 35 times more&lt;/strong&gt; with 80 concurrent connections.&lt;/p&gt;

&lt;p&gt;Those new metrics give a lot more information about what’s happening at the O/S
level, at a per normalized query granularity, and will ease the diagnosis of
performance issues.  Combined with &lt;a href=&quot;https://powa.readthedocs.io/&quot;&gt;PoWA&lt;/a&gt;, you’ll
even be able to identify when any of those metrics have a different behavior!&lt;/p&gt;
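
&lt;p&gt;To look at those counters per normalized query rather than per database, a sketch along these lines
should work.  It assumes that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_stat_kcache_detail&lt;/code&gt; view exposes the query text along
with the same counter columns as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_stat_kcache&lt;/code&gt; view used above, so check the
column names for your version of the extension:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Normalized queries with the most involuntary context switches
-- (view and column names assumed, see above)
SELECT query, nvcsws, nivcsws
FROM pg_stat_kcache_detail
WHERE datname = 'pgbench'
ORDER BY nivcsws DESC
LIMIT 5;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;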

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2018/07/17/pg_stat_kcache-2-1-is-out.html&quot;&gt;pg_stat_kcache 2.1 is out&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on July 17, 2018.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Wait Events support for PoWA]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2018/07/09/wait-events-support-for-powa.html" />
  <id>https://rjuju.github.io/postgresql/2018/07/09/wait-events-support-for-powa</id>
  <published>2018-07-09T10:43:34+00:00</published>
  <updated>2018-07-09T10:43:34+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;You can now view the &lt;strong&gt;Wait Events&lt;/strong&gt; in &lt;a href=&quot;https://powa.readthedocs.io/&quot;&gt;PoWA&lt;/a&gt;
thanks to the
&lt;a href=&quot;https://github.com/postgrespro/pg_wait_sampling/&quot;&gt;pg_wait_sampling&lt;/a&gt;
extension.&lt;/p&gt;

&lt;h3 id=&quot;wait-events--pg_wait_sampling&quot;&gt;Wait Events &amp;amp; pg_wait_sampling&lt;/h3&gt;

&lt;p&gt;Wait events are a well-known and useful feature in a lot of RDBMS.  They were
added in &lt;a href=&quot;https://github.com/postgres/postgres/commit/53be0b1add7&quot;&gt;PostgreSQL
9.6&lt;/a&gt;, quite a few
versions ago.  Unlike most other PostgreSQL statistics, those are only an
instant view of what the processes are currently waiting on, and not some
cumulative counters.  You can get those events using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_stat_activity&lt;/code&gt; view,
for instance:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;datid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wait_event_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wait_event&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_stat_activity&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;datid&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;pid&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wait_event_type&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;wait_event&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;                                  &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;--------+-------+-----------------+---------------------+-------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;13782&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Activity&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AutoVacuumMain&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;16384&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16615&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;            &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relation&lt;/span&gt;            &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;16384&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16621&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Client&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ClientRead&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LOCK&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;847842&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16763&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LWLock&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WALWriteLock&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;847842&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16764&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;            &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactionid&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgbench_branches&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bbalance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bbalance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1229&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;847842&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16766&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LWLock&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WALWriteLock&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;847842&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16767&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;            &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactionid&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgbench_tellers&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tbalance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tbalance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3383&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;86&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;847842&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16769&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;            &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactionid&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;UPDATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pgbench_branches&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bbalance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bbalance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3786&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In this example, we can see that the wait event for pid 16615 is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Lock&lt;/code&gt; on
a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Relation&lt;/code&gt;.  In other words, the query is blocked waiting for a heavyweight
lock, while the pid 16621, which obviously holds the lock, is idle waiting for
client commands.  This is something we could already see in previous versions,
though in a different manner.  More interestingly, we can also see that the
wait event for pid 16766 is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LWLock&lt;/code&gt;, or a &lt;strong&gt;Lightweight Lock&lt;/strong&gt;.  Those are
internal transient locks that you previously couldn’t see at the SQL level.  In
this example, the query is waiting for the &lt;strong&gt;WALWriteLock&lt;/strong&gt;, a lightweight lock
mainly used to control the write to WAL buffers.  A complete list of the
available wait events is &lt;a href=&quot;https://www.postgresql.org/docs/current/static/monitoring-stats.html#WAIT-EVENT-TABLE&quot;&gt;available in the official
documentation&lt;/a&gt;.&lt;/p&gt;
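
&lt;p&gt;For the heavyweight lock case, rather than guessing which session holds the lock, a small sketch
using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_blocking_pids()&lt;/code&gt; (also available since PostgreSQL 9.6) can list the blocked sessions
together with the pids blocking them:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Sessions currently blocked on a heavyweight lock, with the pids blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) &amp;gt; 0;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;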

&lt;p&gt;This information was lacking and is helpful to diagnose bottlenecks.
However, only having an instant view of those wait events is certainly not
enough to have a good idea of what’s happening on a server.  Since most of the
wait events are by nature transient events, what you need is to sample them at
some high frequency.  Trying to sample them with some external tool, even at a
second interval, is usually not enough.  That’s where the &lt;a href=&quot;https://github.com/postgrespro/pg_wait_sampling/&quot;&gt;pg_wait_sampling
extension&lt;/a&gt; brings a really
cool solution.  It’s an extension written by &lt;a href=&quot;http://akorotkov.github.io/&quot;&gt;Alexander
Korotkov&lt;/a&gt; and Ildus Kurbangaliev.  Once activated
(it needs to be set up in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared_preload_libraries&lt;/code&gt;, so a PostgreSQL restart
is necessary), it’ll sample the wait events in shared memory every &lt;strong&gt;10 ms&lt;/strong&gt;
(by default), and also aggregate the counters per wait_event_type,
wait_event, pid and queryid (if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_stat_statements&lt;/code&gt; is also activated).  For
more details on how to configure and use it, you can refer to &lt;a href=&quot;https://github.com/postgrespro/pg_wait_sampling/blob/master/README.md&quot;&gt;the extension’s
README&lt;/a&gt;.
Since the work is done in memory as a C extension, it’s very efficient.  It’s
also implemented with very little locking, so its overhead should be almost
negligible.  I did some benchmarking on my laptop (I unfortunately don’t have
a better machine to test on) with a read-only
&lt;a href=&quot;https://www.postgresql.org/docs/current/static/pgbench.html&quot;&gt;pgbench&lt;/a&gt; where
all the data fit in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared_buffers&lt;/code&gt;, with both 8 and 90 clients, to try to get
a maximal overhead.  The average overhead over 3 runs was around 1%, while the
fluctuation between runs was around 0.8%.&lt;/p&gt;
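
&lt;p&gt;As a minimal sketch of what using it looks like, assuming the library is already listed in
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared_preload_libraries&lt;/code&gt; and that the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_wait_sampling_profile&lt;/code&gt; view has the
columns described in the extension’s README (please check it for your version), the aggregated
counters can be queried like this:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Once the library is loaded via shared_preload_libraries and the server restarted
CREATE EXTENSION IF NOT EXISTS pg_wait_sampling;

-- Most frequently sampled wait events since the last profile reset
SELECT event_type, event, sum(count) AS samples
FROM pg_wait_sampling_profile
GROUP BY event_type, event
ORDER BY samples DESC
LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;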

&lt;h3 id=&quot;and-powa&quot;&gt;And PoWA?&lt;/h3&gt;

&lt;p&gt;So, thanks to this extension, we now have a cumulative and extremely precise
view of the wait events.  That’s quite nice, but as with the other cumulative
statistics available in PostgreSQL, you need to sample the counters regularly
if you want to be able to know what happened at a given time in the past, as
stated in the README:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[…]
Waits profile.  It’s implemented as in-memory hash table where count
of samples are accumulated per each process and each wait event
(and each query with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pg_stat_statements&lt;/code&gt;).  This hash
table can be reset by user request.  Assuming there is a client who
periodically dumps profile and resets it, user can have statistics of
intensivity of wait events among time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s exactly the aim of &lt;a href=&quot;http://powa.readthedocs.io/&quot;&gt;PoWA&lt;/a&gt;: save the
statistic counters in an efficient way and display them on a GUI.&lt;/p&gt;

&lt;p&gt;PoWA 3.2 will automatically detect if the
&lt;a href=&quot;https://github.com/postgrespro/pg_wait_sampling/&quot;&gt;pg_wait_sampling&lt;/a&gt;
extension is already present or if you install it subsequently and will start
to snapshot its data, giving a really precise view of the wait events on your
databases over time!&lt;/p&gt;

&lt;p&gt;The data is gathered in &lt;a href=&quot;/postgresql/2016/09/16/minimizing-tuple-overhead.html&quot;&gt;standard PoWA tables&lt;/a&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_wait_sampling_history_current&lt;/code&gt;
for the last 100 (default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa.coalesce&lt;/code&gt;) snapshots, and the older values are
aggregated in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_wait_sampling_history&lt;/code&gt;, with up to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa.retention&lt;/code&gt;
history.  For instance, here’s a simple query displaying the first 20 changes
that occurred in the last 100 snapshots:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;OVER&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;PARTITION&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event_type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;events&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_wait_sampling_history_current&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_database&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dbid&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;datname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'bench'&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;events&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ASC&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DESC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;               &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;       &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event_type&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;events&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------------------+----------------------+------------+----------------+--------&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;08&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;037191&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6531859117817823569&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LWLock&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_qualstats&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;mi&quot;&gt;1233&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;035212&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;8851222058009799098&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;035212&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6860707137622661878&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;149&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;035212&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;8851222058009799098&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactionid&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;193&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;035212&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6860707137622661878&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactionid&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;mi&quot;&gt;1143&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;035212&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6531859117817823569&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LWLock&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_qualstats&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;035212&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;8851222058009799098&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LWLock&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lock_manager&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;035212&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6860707137622661878&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LWLock&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lock_manager&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;035212&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6860707137622661878&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LWLock&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;buffer_content&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;037205&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;8851222058009799098&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;037205&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6860707137622661878&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;335&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;037205&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6860707137622661878&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactionid&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;mi&quot;&gt;2604&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;037205&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;8851222058009799098&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactionid&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;384&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;037205&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6860707137622661878&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LWLock&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lock_manager&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;mi&quot;&gt;13&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;037205&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;8851222058009799098&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LWLock&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lock_manager&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;037205&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;8221555873158496753&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IO&lt;/span&gt;         &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataFileExtend&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;037205&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6860707137622661878&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LWLock&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;buffer_content&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;08&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;032938&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;8851222058009799098&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;08&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;032938&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6860707137622661878&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;312&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2018&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;09&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;08&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;032938&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6860707137622661878&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Lock&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactionid&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;mi&quot;&gt;2586&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
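
&lt;p&gt;The key part of the query above is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lag()&lt;/code&gt; window function, which
turns cumulative counters into per-snapshot deltas.  Here is a minimal,
self-contained sketch of the same technique.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;samples&lt;/code&gt; CTE and its
values are made up for illustration and are not PoWA data:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Hypothetical cumulative samples: (snapshot time, cumulative counter)
WITH samples(ts, cnt) AS (
    VALUES ('2018-07-09 10:44:08+02'::timestamptz, 100::bigint),
           ('2018-07-09 10:44:28+02'::timestamptz, 104::bigint),
           ('2018-07-09 10:44:48+02'::timestamptz, 118::bigint)
)
-- Subtract the previous snapshot's counter to get the activity per interval
SELECT ts, cnt - lag(cnt) OVER (ORDER BY ts) AS events
FROM samples
ORDER BY ts;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The first row has no previous snapshot, so its delta is NULL, and the next
two rows give 4 and 14.  In the PoWA query, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;events != 0&lt;/code&gt; predicate
also discards such NULL rows, since a comparison with NULL is never true.&lt;/p&gt;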

&lt;p class=&quot;notice&quot;&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; There’s also a per-database version of those metrics for easier
computation at the database level in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_wait_sampling_history_current_db&lt;/code&gt;
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;powa_wait_sampling_history_db&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;These metrics are also visible in the &lt;a href=&quot;https://pypi.org/project/powa-web/&quot;&gt;powa-web&lt;/a&gt;
interface.  Here are some examples of the wait event display with a simple
pgbench run:&lt;/p&gt;

&lt;h5 id=&quot;wait-events-for-the-whole-cluster&quot;&gt;Wait events for the whole cluster&lt;/h5&gt;

&lt;p&gt;&lt;a href=&quot;/images/powa_waits_overview.png&quot;&gt;&lt;img src=&quot;/images/powa_waits_overview.png&quot; alt=&quot;Wait events for the whole cluster&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5 id=&quot;wait-events-for-a-database&quot;&gt;Wait events for a database&lt;/h5&gt;

&lt;p&gt;&lt;a href=&quot;/images/powa_waits_db.png&quot;&gt;&lt;img src=&quot;/images/powa_waits_db.png&quot; alt=&quot;Wait events for a database&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5 id=&quot;wait-events-for-a-single-query&quot;&gt;Wait events for a single query&lt;/h5&gt;

&lt;p&gt;&lt;a href=&quot;/images/powa_waits_query.png&quot;&gt;&lt;img src=&quot;/images/powa_waits_query.png&quot; alt=&quot;Wait events for a single query&quot; /&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;This feature is still under development, but you can already test it using the
latest git commits.  I hope to add more views of this data in the near future,
including some other graphs, since all the data is available.  Also, if
you’re a Python / JavaScript developer, &lt;a href=&quot;https://github.com/powa-team/powa-web&quot;&gt;contributions are always
welcome&lt;/a&gt;!&lt;/p&gt;


    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2018/07/09/wait-events-support-for-powa.html&quot;&gt;Wait Events support for PoWA&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on July 09, 2018.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Diagnostic of an unexpected slowdown]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2018/07/03/diagnostic-of-unexpected-slowdown.html" />
  <id>https://rjuju.github.io/postgresql/2018/07/03/diagnostic-of-unexpected-slowdown</id>
  <published>2018-07-03T17:56:34+00:00</published>
  <updated>2018-07-03T17:56:34+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;This blog post is a summary of a production issue I had to investigate some
time ago with people from &lt;a href=&quot;https://oslandia.com/en/home-en/&quot;&gt;Oslandia&lt;/a&gt;.
Since it’s quite unusual, I wanted to share it along with the methodology I used,
in case it helps anyone running into the same kind of problem.  It’s also a good
opportunity to say that upgrading to a newer PostgreSQL version is almost
always a good idea.&lt;/p&gt;

&lt;h3 id=&quot;the-problem&quot;&gt;The problem&lt;/h3&gt;

&lt;p&gt;The initial performance report contained enough information to know that
something strange was happening.&lt;/p&gt;

&lt;p&gt;The database is running PostgreSQL 9.3.5 (yes, missing some minor version
updates, and obviously many major versions behind).  The configuration also had
some quite unusual settings.  The most relevant hardware and settings are:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Server
    CPU: 40 core, 80 with hyperthreading enabled
    RAM: 128 GB
PostgreSQL:
    shared_buffers: 16 GB
    max_connections: 1500
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The high &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared_buffers&lt;/code&gt;, especially given the quite old PostgreSQL version,
is a good candidate for more investigation.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max_connections&lt;/code&gt; is also
quite high, but unfortunately the software vendor claims that using a
connection pooler isn’t supported.  Therefore most of the connections are idle.
This isn’t great because it implies quite some overhead to acquire a snapshot,
but there are enough CPU cores to handle quite a lot of connections.&lt;/p&gt;

&lt;p&gt;The main problem was that sometimes, the same queries could be dramatically
slower.  The following trivial example was provided:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_stat_activity&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- When the issue happens&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;Aggregate  (actual time=670.719..670.720 rows=1 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;  -&amp;gt;  Nested Loop  (actual time=663.739..670.392 rows=1088 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        -&amp;gt;  Hash Join  (actual time=2.987..4.278 rows=1088 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;              Hash Cond: (s.usesysid = u.oid)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;              -&amp;gt;  Function Scan on pg_stat_get_activity s  (actual time=2.941..3.302 rows=1088 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;              -&amp;gt;  Hash  (actual time=0.022..0.022 rows=12 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;                    Buckets: 1024  Batches: 1  Memory Usage: 1kB&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;                    -&amp;gt;  Seq Scan on pg_authid u  (actual time=0.008..0.013 rows=12 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        -&amp;gt;  Index Only Scan using pg_database_oid_index on pg_database d  (actual time=0.610..0.611 rows=1 loops=1088)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;              Index Cond: (oid = s.datid)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;              Heap Fetches: 0&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;Total runtime: 670.880 ms&quot;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Normal timing&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;Aggregate  (actual time=6.370..6.370 rows=1 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;  -&amp;gt;  Nested Loop  (actual time=3.581..6.159 rows=1088 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        -&amp;gt;  Hash Join  (actual time=3.560..4.310 rows=1088 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;              Hash Cond: (s.usesysid = u.oid)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;              -&amp;gt;  Function Scan on pg_stat_get_activity s  (actual time=3.507..3.694 rows=1088 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;              -&amp;gt;  Hash  (actual time=0.023..0.023 rows=12 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;                    Buckets: 1024  Batches: 1  Memory Usage: 1kB&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;                    -&amp;gt;  Seq Scan on pg_authid u  (actual time=0.009..0.014 rows=12 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        -&amp;gt;  Index Only Scan using pg_database_oid_index on pg_database d  (actual time=0.001..0.001 rows=1 loops=1088)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;              Index Cond: (oid = s.datid)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;              Heap Fetches: 0&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;Total runtime: 6.503 ms&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;So while the “good” timing is a little bit slow (though there are 1500
connections), the “bad” timing is more than &lt;strong&gt;100x slower&lt;/strong&gt;, for a very simple
query.&lt;/p&gt;

&lt;p&gt;Another example of a trivial query on production data was provided, but with
some more information.  Here’s an anonymized version:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BUFFERS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_col&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;some_indexed_col&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'value'&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;upper&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;other_col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'other_value'&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;nv&quot;&gt;&quot;Limit  (actual time=7620.756..7620.756 rows=0 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;  Buffers: shared hit=43554&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;  -&amp;gt;  Index Scan using idx_some_table_some_col on some_table  (actual time=7620.754..7620.754 rows=0 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        Index Cond: ((some_indexed_col)::text = 'value'::text)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        Filter: (upper((other_col)::text) = 'other_value'::text)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        Rows Removed by Filter: 17534&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        Buffers: shared hit=43554&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;Total runtime: 7620.829 ms&quot;&lt;/span&gt;

&lt;span class=&quot;nv&quot;&gt;&quot;Limit  (actual time=899.607..899.607 rows=0 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;  Buffers: shared hit=43555&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;  -&amp;gt;  Index Scan using idx_some_table_some_col on some_table  (actual time=899.605..899.605 rows=0 loops=1)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        Index Cond: ((some_indexed_col)::text = 'value'::text)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        Filter: (upper((other_col)::text) = 'other_value'::text)&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        Rows Removed by Filter: 17534&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;        Buffers: shared hit=43555&quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;&quot;Total runtime: 899.652 ms&quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;There was also quite some instrumentation data on the O/S side, showing that
none of the disk, CPU or RAM was exhausted, and no interesting message in
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dmesg&lt;/code&gt; or any system log.&lt;/p&gt;

&lt;h3 id=&quot;what-do-we-know&quot;&gt;What do we know?&lt;/h3&gt;

&lt;p&gt;For the first query, we see that the inner index scan average time rises from
&lt;strong&gt;0.001ms&lt;/strong&gt; to &lt;strong&gt;0.6ms&lt;/strong&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-none&quot; data-lang=&quot;none&quot;&gt;-&amp;gt;  Index Only Scan using idx on pg_database (actual time=0.001..0.001 rows=1 loops=1088)

-&amp;gt;  Index Only Scan using idx on pg_database (actual time=0.610..0.611 rows=1 loops=1088)&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;With a high &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared_buffers&lt;/code&gt; setting and an old PostgreSQL version, a common
issue is a slowdown when the dataset is larger than the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared_buffers&lt;/code&gt;, due
to the &lt;strong&gt;clocksweep&lt;/strong&gt; algorithm used to evict buffers.&lt;/p&gt;

&lt;p&gt;However, the second query shows that the same thing is happening while all the
blocks are in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared_buffers&lt;/code&gt;.  This cannot be a buffer eviction problem due
to a too high &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared_buffers&lt;/code&gt; setting, or any disk latency issue.&lt;/p&gt;

&lt;p&gt;While some PostgreSQL configuration settings could be improved, none of them
seems to explain this exact behavior.  It’s likely that modifying them would
fix the situation, but we need more information to understand exactly what’s
happening here and avoid any further performance issues.&lt;/p&gt;

&lt;h3 id=&quot;any-wild-guess&quot;&gt;Any wild guess?&lt;/h3&gt;

&lt;p&gt;Since the simple explanations have been discarded, it’s necessary to think
about lower level explanations.&lt;/p&gt;

&lt;p&gt;If you have followed the enhancements in the latest PostgreSQL versions, you
should have noticed quite a few optimizations on scalability and locking.  If you want more
information, there are plenty of blog entries about these, for instance &lt;a href=&quot;http://amitkapila16.blogspot.tw/2015/01/read-scalability-in-postgresql-95.html&quot;&gt;this
great
article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;On the kernel side, and given the high number of connections, it could also be a
&lt;a href=&quot;https://en.wikipedia.org/wiki/Translation_lookaside_buffer&quot;&gt;TLB&lt;/a&gt; exhaustion,
which is probably the most likely explanation.&lt;/p&gt;

&lt;p&gt;In any case, in order to confirm any theory we need to use very specific tools.&lt;/p&gt;

&lt;h3 id=&quot;deeper-analysis-tlb-exhaustion&quot;&gt;Deeper analysis: TLB exhaustion&lt;/h3&gt;

&lt;p&gt;Without going into too much detail, you need to know that each process has an
area of kernel memory used to store the &lt;a href=&quot;https://en.wikipedia.org/wiki/Page_table#PTE&quot;&gt;page table
entries&lt;/a&gt;, called the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PTE&lt;/code&gt;, which
map the virtual addresses that the process is using to the
real physical addresses in RAM.  This area is usually not big, because a
process usually doesn’t access gigabytes of data in RAM.  But since PostgreSQL is
relying on multiple processes accessing a big chunk of shared memory, each
process will have an entry for each 4kB (the default page size) page of the
shared buffers it has accessed.  So you can end up with quite a lot of memory
used for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PTE&lt;/code&gt;, and even have overall mappings that address way more than
the total physical memory available on the server.&lt;/p&gt;

&lt;p&gt;You can find the size of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PTE&lt;/code&gt; at the O/S level by looking for the &lt;strong&gt;VmPTE&lt;/strong&gt;
entry in the process status.  You can also check the &lt;strong&gt;RssShmem&lt;/strong&gt; entry to
know how much shared memory is mapped.  For instance:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;egrep &lt;span class=&quot;s2&quot;&gt;&quot;(VmPTE|RssShmem)&quot;&lt;/span&gt; /proc/&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;PID&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;/status
RssShmem:	     340 kB
VmPTE:	     140 kB&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;This process didn’t access lots of buffers, so the PTE is small.  If we try
with a process which has accessed all the buffers of an 8 GB shared_buffers:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;egrep &lt;span class=&quot;s2&quot;&gt;&quot;(VmPTE|RssShmem)&quot;&lt;/span&gt; /proc/&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;PID&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;/status
RssShmem:	 8561116 kB
VmPTE:	   16880 kB&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;That’s &lt;strong&gt;16 MB&lt;/strong&gt; used for the PTE!  Multiply that by the number of
connections, and you end up with gigabytes of memory used for the PTE.
Obviously, this won’t fit in the TLB.  As a consequence, the processes will
have a lot of TLB misses every time they need to access a page in memory,
drastically increasing the latency.&lt;/p&gt;
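
&lt;p&gt;As a rough cross-check of these numbers, here is a small sketch of the
arithmetic.  It assumes an x86-64 system, where each page table entry takes 8
bytes, and it ignores the upper levels of the page table, which are tiny in
comparison.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pte_kb&lt;/code&gt; helper is only a name chosen for this example:&lt;/p&gt;

```shell
#!/bin/sh
# Rough estimate of the last-level page table size needed to map a given
# amount of memory.  Assumptions (not from the measurements above): x86-64,
# where each page table entry takes 8 bytes, and the upper page table
# levels are ignored.

# pte_kb MAPPED_KB PAGE_KB: kB of page table entries needed to map
# MAPPED_KB of memory using pages of PAGE_KB.
pte_kb() {
    echo $(( $1 / $2 * 8 / 1024 ))
}

# The backend shown above: 8561116 kB of shared memory, 4 kB pages.
pte_kb 8561116 4

# Upper bound for 16 GB of shared_buffers fully mapped by 1500 backends,
# converted from kB to GB.
echo $(( $(pte_kb $(( 16 * 1024 * 1024 )) 4) * 1500 / 1024 / 1024 ))
```

&lt;p&gt;The first result, 16720 kB, is close to the 16880 kB reported by VmPTE.
The second one, 46 GB, is only an upper bound for a server with 16 GB of
shared buffers and 1500 connections, reached if every backend has touched
every shared buffer.&lt;/p&gt;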

&lt;p&gt;On the system that had the performance issue, with &lt;strong&gt;16 GB&lt;/strong&gt; of shared buffers and
&lt;strong&gt;1500&lt;/strong&gt; long-lived connections, the total memory size of the combined PTE was
around &lt;strong&gt;45 GB&lt;/strong&gt;!  A rough approximation can be done with this small script:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;k&quot;&gt;for &lt;/span&gt;p &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;pgrep postgres&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;grep&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;VmPTE:&quot;&lt;/span&gt; /proc/&lt;span class=&quot;nv&quot;&gt;$p&lt;/span&gt;/status&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;done&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'{pte += $2} END {print pte / 1024 / 1024}'&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p class=&quot;notice&quot;&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; This will compute the memory used for the PTE of all postgres
processes.  If you have multiple clusters on the same machine and you want to
have per cluster information, you need to adapt this command to only match the
processes whose ppid is your cluster’s postmaster pid.&lt;/p&gt;
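
&lt;p&gt;One possible way to do this per cluster filtering is sketched below.  It
relies on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pgrep -P&lt;/code&gt; to list the direct children of a given pid.  The function
names are only chosen for this example, and the postmaster pid has to be
supplied by the caller:&lt;/p&gt;

```shell
#!/bin/sh
# sum_vmpte_kb [FILE...]: add up the VmPTE values (in kB) found in the given
# /proc/PID/status files, or on standard input when no file is given.
sum_vmpte_kb() {
    awk '$1 == "VmPTE:" { pte += $2 } END { print pte + 0 }' "$@"
}

# cluster_pte_kb POSTMASTER_PID: total VmPTE (in kB) of the postmaster and
# of its direct children, which are the processes of that cluster only.
cluster_pte_kb() {
    for pid in "$1" $(pgrep -P "$1"); do
        # A backend may exit between pgrep and cat, so ignore read errors.
        cat "/proc/$pid/status" 2>/dev/null
    done | sum_vmpte_kb
}

# Example (the pid is the first line of the postmaster.pid file found in
# the data directory of the cluster):
#   cluster_pte_kb 12345
```

&lt;p&gt;The result is in kB, so it needs the same division by 1024 twice as in the
script above to get gigabytes.&lt;/p&gt;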

&lt;p&gt;This is evidently the culprit here.  Just to be sure, let’s look at what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;perf&lt;/code&gt;
shows when the performance slowdown occurs, and when it doesn’t.&lt;/p&gt;

&lt;p&gt;Here are the top consuming functions (consuming more than 2% of CPU time)
reported by perf when everything is fine:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-none&quot; data-lang=&quot;none&quot;&gt;# Children      Self  Command          Symbol
# ........  ........  ...............  ..................
     4.26%     4.10%  init             [k] intel_idle
     4.22%     2.22%  postgres         [.] SearchCatCache&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Nothing really interesting, the system was really not saturated.  Now when
the problem occurs:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-none&quot; data-lang=&quot;none&quot;&gt;# Children      Self  Command          Symbol
# ........  ........  ...............  ....................
     8.96%     8.64%  postgres         [.] s_lock
     4.50%     4.44%  cat              [k] smaps_pte_entry
     2.51%     2.51%  init             [k] poll_idle
     2.34%     2.28%  postgres         [k] compaction_alloc
     2.03%     2.03%  postgres         [k] _spin_lock&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;We can see that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s_lock&lt;/code&gt;, the postgres function that waits on a
&lt;a href=&quot;https://en.wikipedia.org/wiki/Spinlock&quot;&gt;spinlock&lt;/a&gt;, consumes almost 9% of the
CPU time.  But this is PostgreSQL 9.3, and lightweight locks (transient
internal locks) were still implemented using spinlocks (&lt;a href=&quot;https://github.com/postgres/postgres/commit/ab5194e6f617a9a9e7&quot;&gt;they now use atomic
operations&lt;/a&gt;).
If we look a little closer at the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s_lock&lt;/code&gt; calls:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-none&quot; data-lang=&quot;none&quot;&gt;     8.96%     8.64%  postgres         [.] s_lock
                   |
                   ---s_lock
                      |
                      |--83.49%-- LWLockAcquire
[...]
                      |--15.59%-- LWLockRelease
[...]
                      |--0.69%-- 0x6382ee
                      |          0x6399ac
                      |          ReadBufferExtended
[...]&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;99% of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s_lock&lt;/code&gt; calls are indeed due to lightweight locks.  This indicates a
general slowdown and high contention.  But this is just a consequence of the
real problem, the next top consumer function.&lt;/p&gt;

&lt;p&gt;With almost 5% of the CPU time, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;smaps_pte_entry&lt;/code&gt;, a
kernel function doing the translation for a single entry, shows the problem.
This function is supposed to be extremely fast, and shouldn’t even appear in a
perf record!  It means that very often, when a process wants to access a page
in RAM, it has to wait to get the real address.  But waiting for an address
translation means a lot of &lt;a href=&quot;https://en.wikipedia.org/wiki/Pipeline_stall&quot;&gt;pipeline
stalls&lt;/a&gt;.  Processors have longer
and longer pipelines, and those stalls totally ruin the benefits of this kind of
architecture.  As a result, a good proportion of CPU time is simply wasted
waiting for addresses.  This certainly explains the extreme slowdown, and the
lack of high-level counters able to explain such slowdowns.&lt;/p&gt;

&lt;h3 id=&quot;the-solution&quot;&gt;The solution&lt;/h3&gt;

&lt;p&gt;Multiple solutions are possible to solve this problem.&lt;/p&gt;

&lt;p&gt;The usual answer is to &lt;a href=&quot;https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-HUGE-PAGES&quot;&gt;ask PostgreSQL to allocate the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared_buffers&lt;/code&gt; in huge
pages&lt;/a&gt;.
Indeed, with 2MB pages instead of 4kB, the memory needed for PTE will
automatically drop by a factor of 512.  This would be an easy and huge win.
Unfortunately, this is only possible since version 9.4, but upgrading wasn’t
even an option, since the software vendor claimed that only the 9.3 version is
supported.&lt;/p&gt;
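
&lt;p&gt;For reference, here is a rough sizing sketch for such a setup, assuming
2MB huge pages.  The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hugepages_needed&lt;/code&gt; helper is only a name chosen for this
example, and the real shared memory segment is somewhat larger than
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shared_buffers&lt;/code&gt; alone, so some margin has to be added on top of the result:&lt;/p&gt;

```shell
#!/bin/sh
# hugepages_needed SIZE_KB HUGEPAGE_KB: number of huge pages needed to hold
# SIZE_KB of memory, rounded up to a whole number of pages.
hugepages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# 16 GB of shared_buffers with 2 MB (2048 kB) huge pages.
hugepages_needed $(( 16 * 1024 * 1024 )) 2048
```

&lt;p&gt;This prints 8192.  The kernel has to reserve at least that many huge pages
through the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vm.nr_hugepages&lt;/code&gt; sysctl, and PostgreSQL (9.4 or later) has to run
with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;huge_pages&lt;/code&gt; set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;try&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;on&lt;/code&gt;.&lt;/p&gt;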

&lt;p&gt;Another way to reduce the PTE size is to reduce the number of connections,
which is quite high here, and would also probably improve performance.
Unfortunately again, the vendor claimed that connection poolers aren’t
supported, and the customer needed that many connections.&lt;/p&gt;

&lt;p&gt;The only remaining solution was therefore to reduce the shared_buffers.
After some tries, the highest value that could be used while avoiding the extreme
slowdown was &lt;strong&gt;4 GB&lt;/strong&gt;.  Fortunately, PostgreSQL was able to have quite good
performance with this size of dedicated cache.&lt;/p&gt;

&lt;p&gt;If software vendors read this post, please understand that if people ask for
newer PostgreSQL version compatibility, or pooler compatibility, they have very
good reasons for that.  There are usually very few behavior changes with newer
versions, and they’re all documented!&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2018/07/03/diagnostic-of-unexpected-slowdown.html&quot;&gt;Diagnostic of an unexpected slowdown&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on July 03, 2018.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Minimizing tuple overhead]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2016/09/16/minimizing-tuple-overhead.html" />
  <id>https://rjuju.github.io/postgresql/2016/09/16/minimizing-tuple-overhead</id>
  <published>2016-09-16T12:03:34+00:00</published>
  <updated>2016-09-16T12:03:34+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;I quite often hear people being disappointed by how much space PostgreSQL
wastes for each row it stores.  I’ll try to show here some tricks to minimize
this effect, to allow more efficient storage.&lt;/p&gt;

&lt;h3 id=&quot;what-overhead&quot;&gt;What overhead?&lt;/h3&gt;

&lt;p&gt;If you don’t have tables with more than a few hundred million rows, it’s
likely that you haven’t had an issue with this.&lt;/p&gt;

&lt;p&gt;For each row stored, postgres will store additional data for its own needs.
This is
&lt;a href=&quot;https://www.postgresql.org/docs/current/static/storage-page-layout.html#HEAPTUPLEHEADERDATA-TABLE&quot;&gt;documented here&lt;/a&gt;.
The documentation says:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Field&lt;/th&gt;
      &lt;th&gt;Type&lt;/th&gt;
      &lt;th&gt;Length&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;t_xmin&lt;/td&gt;
      &lt;td&gt;TransactionId&lt;/td&gt;
      &lt;td&gt;4 bytes&lt;/td&gt;
      &lt;td&gt;insert XID stamp&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;t_xmax&lt;/td&gt;
      &lt;td&gt;TransactionId&lt;/td&gt;
      &lt;td&gt;4 bytes&lt;/td&gt;
      &lt;td&gt;delete XID stamp&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;t_cid&lt;/td&gt;
      &lt;td&gt;CommandId&lt;/td&gt;
      &lt;td&gt;4 bytes&lt;/td&gt;
      &lt;td&gt;insert and/or delete CID stamp (overlays with t_xvac)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;t_xvac&lt;/td&gt;
      &lt;td&gt;TransactionId&lt;/td&gt;
      &lt;td&gt;4 bytes&lt;/td&gt;
      &lt;td&gt;XID for VACUUM operation moving a row version&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;t_ctid&lt;/td&gt;
      &lt;td&gt;ItemPointerData&lt;/td&gt;
      &lt;td&gt;6 bytes&lt;/td&gt;
      &lt;td&gt;current TID of this or newer row version&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;t_infomask2&lt;/td&gt;
      &lt;td&gt;uint16&lt;/td&gt;
      &lt;td&gt;2 bytes&lt;/td&gt;
      &lt;td&gt;number of attributes, plus various flag bits&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;t_infomask&lt;/td&gt;
      &lt;td&gt;uint16&lt;/td&gt;
      &lt;td&gt;2 bytes&lt;/td&gt;
      &lt;td&gt;various flag bits&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;t_hoff&lt;/td&gt;
      &lt;td&gt;uint8&lt;/td&gt;
      &lt;td&gt;1 byte&lt;/td&gt;
      &lt;td&gt;offset to user data&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Which is &lt;strong&gt;23 bytes&lt;/strong&gt; on most architectures (you have either &lt;strong&gt;t_cid&lt;/strong&gt; or
&lt;strong&gt;t_xvac&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;You can see some of these fields as hidden columns present on any table by
adding them to the SELECT list of a query, or by looking for negative attribute
numbers in the &lt;strong&gt;pg_attribute&lt;/strong&gt; catalog:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;
     &lt;span class=&quot;k&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;public.test&quot;&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;Column&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;Type&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Modifiers&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;--------+---------+-----------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xmin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;test&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIMIT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;xmin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xmax&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;------+------+----&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;1361&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attnum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;atttypid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regtype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attlen&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_attribute&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;attrelid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'test'&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attnum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;attname&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attnum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;atttypid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attlen&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;----------+--------+----------+--------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;tableoid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;cmax&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cid&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;xmax&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xid&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;cmin&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cid&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;xmin&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xid&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;ctid&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tid&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;If you compare to the previous table, you can see that not all of these columns
are stored on disk.  Obviously PostgreSQL doesn’t store the table’s oid in
each row.  It’s added afterwards, while constructing a tuple.&lt;/p&gt;

&lt;p&gt;If you want more technical details, you should take a look at
&lt;a href=&quot;http://doxygen.postgresql.org/htup__details_8h.html&quot;&gt;htup_details.h&lt;/a&gt;, starting
with the
&lt;a href=&quot;http://doxygen.postgresql.org/structHeapTupleHeaderData.html&quot;&gt;HeapTupleHeaderData struct&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;how-costly-is-it&quot;&gt;How costly is it?&lt;/h3&gt;

&lt;p&gt;As the overhead is fixed, it’ll become more and more negligible as the row
size grows.  If you only store a single int column (&lt;strong&gt;4 bytes&lt;/strong&gt;), each row will
need:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;So, it’s &lt;strong&gt;85% overhead&lt;/strong&gt;, pretty horrible.&lt;/p&gt;

&lt;p&gt;On the other hand, if you store 5 integer, 3 bigint and 2 text columns (let’s
say ~80B average), you’ll have:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-c&quot; data-lang=&quot;c&quot;&gt;&lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;80&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;227&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;That’s “only” &lt;strong&gt;10% overhead&lt;/strong&gt;.&lt;/p&gt;
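
&lt;p&gt;The same arithmetic can be checked directly in SQL.  This is only a sketch: it
assumes the fixed 23B header used above and ignores alignment padding and
per-block overhead.  It gives roughly 85% and 10%:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- share of the fixed 23B header in the total row size, in percent
SELECT round(100.0 * 23 / (23 + 4), 1)                AS single_int_pct,
       round(100.0 * 23 / (23 + 5*4 + 3*8 + 2*80), 1) AS wide_row_pct;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;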

&lt;h3 id=&quot;so-how-to-minimize-this-overhead&quot;&gt;So, how to minimize this overhead?&lt;/h3&gt;

&lt;p&gt;The idea is to store the same data with fewer records.  How to do that?  By
aggregating data in arrays.  The more records you put in a single array, the
less overhead you have.  And if you aggregate enough data, you can benefit
from transparent compression thanks to the &lt;a href=&quot;https://www.postgresql.org/docs/current/static/storage-toast.html&quot;&gt;TOAST
mechanism&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s try with a table containing a single integer column and 10M rows:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;raw_1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;raw_1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generate_series&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10000000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INDEX&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;raw_1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The user data should need 10M * 4B, i.e. around &lt;strong&gt;38MB&lt;/strong&gt;, while this table will
consume &lt;strong&gt;348MB&lt;/strong&gt;.  Inserting the data takes around &lt;strong&gt;23&lt;/strong&gt; seconds.&lt;/p&gt;

&lt;p class=&quot;notice&quot;&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; If you do the maths, you’ll find out that the overhead is slightly
more than &lt;strong&gt;32B&lt;/strong&gt; per row, not &lt;strong&gt;23B&lt;/strong&gt;.  This is because each block also has some
overhead, on top of NULL handling and alignment padding.  If you want more information
on this, I recommend
&lt;a href=&quot;https://github.com/dhyannataraj/tuple-internals-presentation&quot;&gt;this presentation&lt;/a&gt;.&lt;/p&gt;
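
&lt;p&gt;A minimal sketch for checking these figures yourself with the standard size
functions.  Which function was used for the numbers above is not stated, so
expect small differences depending on whether indexes are counted:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- heap only, then heap plus indexes and TOAST
SELECT pg_size_pretty(pg_relation_size(&#39;raw_1&#39;))       AS heap_size,
       pg_size_pretty(pg_total_relation_size(&#39;raw_1&#39;)) AS total_size;

-- average on-disk bytes per row, block headers and padding included
SELECT round(pg_relation_size(&#39;raw_1&#39;)::numeric / count(*), 1) AS bytes_per_row
FROM raw_1;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;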

&lt;p&gt;Let’s compare with aggregated versions of the same data:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;agg_1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]);&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;agg_1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;array_agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generate_series&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10000000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2000000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INDEX&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;agg_1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;This will insert 5 elements per row.  I’ve done the same test with 20, 100, 200
and 1000 elements per row.  Results are below:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/tuple_overhead_1.svg&quot;&gt;&lt;img src=&quot;/images/tuple_overhead_1.svg&quot; alt=&quot;Benchmark 1&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p class=&quot;notice&quot;&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; The size for 1000 elements per row is a little higher than for the lower values.
This is because it’s the only one which is big enough to be TOAST-ed, but not
big enough to be compressed.  We can see a little TOAST overhead here.&lt;/p&gt;
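
&lt;p&gt;One way to see whether TOAST is involved, as a sketch using only catalog
columns and built-in functions, is to look at the size of the TOAST relation
attached to the table and at the stored size of a few arrays:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- size of the TOAST relation attached to agg_1
-- (0 bytes means no value was moved out of line)
SELECT pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE oid = &#39;agg_1&#39;::regclass AND reltoastrelid &amp;lt;&amp;gt; 0;

-- bytes used to store a few of the arrays
SELECT pg_column_size(id) AS stored_bytes FROM agg_1 LIMIT 5;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;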

&lt;p&gt;So far so good, we can see quite good improvements, both in size and INSERT
time, even for very small arrays.  Let’s see the impact on retrieving rows.  I’ll
try to retrieve all the rows, then only one row with an index scan (for the
tests I’ve used EXPLAIN ANALYZE to minimize the time spent displaying the data in
psql):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;raw_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INDEX&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;raw_1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;raw_1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;To properly index this array, we need a GIN index.  To get all the values from
aggregated data, we need to unnest() the arrays, and to be a little more
creative to get a single record:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;unnest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;agg_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INDEX&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;agg_1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;USING&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gin&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;unnest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;agg_1&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
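
&lt;p&gt;The CTE is one way to write it.  As an alternative sketch, not benchmarked
here, the same lookup can use a LATERAL unnest() together with the containment
operator, which the default GIN operator class for arrays also supports:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;SELECT u.id
FROM agg_1 a
CROSS JOIN LATERAL unnest(a.id) AS u(id)
WHERE a.id @&amp;gt; ARRAY[500]  -- lets the GIN index find the array holding 500
  AND u.id = 500;         -- keeps only the wanted element of that array&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;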

&lt;p&gt;Here’s the chart comparing index creation time and index size:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/tuple_overhead_2.svg&quot;&gt;&lt;img src=&quot;/images/tuple_overhead_2.svg&quot; alt=&quot;Benchmark 2&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The GIN index is a little more than twice the size of the btree index.  If I add
the table size, the total size is almost the same as without aggregation.  That’s
not a big issue since this example is naive; we’ll see later how to avoid using a
GIN index to keep the total size low.  The GIN index is also way slower to build,
meaning that INSERT will also be slower.&lt;/p&gt;

&lt;p&gt;Here’s the chart comparing the time to retrieve all rows and a single row:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/tuple_overhead_3.svg&quot;&gt;&lt;img src=&quot;/images/tuple_overhead_3.svg&quot; alt=&quot;Benchmark 3&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Getting all the rows is probably not an interesting example, but it’s
interesting to note that as soon as the array contains enough elements it starts to
be faster than the same operation using the original table.  We also see that
getting only one element is much faster than with the btree index, thanks
to GIN efficiency.  It’s not tested here, but since only btree indexes are
sorted, if you need to get a lot of data sorted, using a GIN index will require
an extra sort which will be way slower than a simple btree index scan.&lt;/p&gt;

&lt;h3 id=&quot;a-more-realistic-example&quot;&gt;A more realistic example&lt;/h3&gt;

&lt;p&gt;Now that we’ve seen the basics, let’s see how to go further: aggregating more
than one column, and avoiding the extra disk space (and write-time slowdown) of
a GIN index.  For this, I’ll present how
&lt;a href=&quot;https://powa.readthedocs.io/&quot;&gt;PoWA&lt;/a&gt; stores its data.&lt;/p&gt;

&lt;p&gt;For each datasource collected, two tables are used: one for the &lt;em&gt;historic and
aggregated&lt;/em&gt; data, and one for the &lt;em&gt;current data&lt;/em&gt;.  These tables store data in a
custom type instead of plain columns. Let’s see the tables related to
&lt;strong&gt;pg_stat_statements&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;The custom type contains basically all the counters present in pg_stat_statements,
plus the timestamp associated with this record:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;powa&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_statements_history_record&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Composite&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;public.powa_statements_history_record&quot;&lt;/span&gt;
       &lt;span class=&quot;k&quot;&gt;Column&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;k&quot;&gt;Type&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Modifiers&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------------------+--------------------------+-----------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;                  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;timestamp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;zone&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;calls&lt;/span&gt;               &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;total_time&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;precision&lt;/span&gt;         &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;                &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;shared_blks_hit&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;shared_blks_read&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;shared_blks_dirtied&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;shared_blks_written&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;local_blks_hit&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;local_blks_read&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;local_blks_dirtied&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;local_blks_written&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;temp_blks_read&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;temp_blks_written&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;blk_read_time&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;precision&lt;/span&gt;         &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;blk_write_time&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;precision&lt;/span&gt;         &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The table for current data stores the pg_stat_statements unique identifier (queryid,
dbid, userid), and a record of counters:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;powa&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_statements_history_current&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;public.powa_statements_history_current&quot;&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;Column&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;              &lt;span class=&quot;k&quot;&gt;Type&lt;/span&gt;              &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Modifiers&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------+--------------------------------+-----------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                         &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;dbid&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;                            &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;userid&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;                            &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_statements_history_record&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The table for aggregated data contains the same unique identifier, an array of
records and some special fields:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;powa&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_statements_history&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;&quot;public.powa_statements_history&quot;&lt;/span&gt;
     &lt;span class=&quot;k&quot;&gt;Column&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;               &lt;span class=&quot;k&quot;&gt;Type&lt;/span&gt;               &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Modifiers&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;----------------+----------------------------------+-----------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bigint&lt;/span&gt;                           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;dbid&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;                              &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;userid&lt;/span&gt;         &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;                              &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;coalesce_range&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tstzrange&lt;/span&gt;                        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;records&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_statements_history_record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;mins_in_range&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_statements_history_record&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;maxs_in_range&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_statements_history_record&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Indexes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;nv&quot;&gt;&quot;powa_statements_history_query_ts&quot;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gist&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;coalesce_range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;We also store the timestamp range (&lt;em&gt;coalesce_range&lt;/em&gt;) containing all aggregated
counters in the row, and the minimum and maximum values of each counter in two
dedicated records.  These extra fields don’t consume too much space, and
allow very efficient indexing and computation, based on the data access
pattern of the related application.&lt;/p&gt;

&lt;p&gt;This table is used to know how many resources a query consumed over a given time
range.  The GiST index won’t be too big since it only indexes two small values
per X aggregated counters, and will efficiently find the rows matching a given
queryid and time range.&lt;/p&gt;

&lt;p&gt;Then, computing the resources consumed can be done efficiently, since the
pg_stat_statements counters are monotonic (they never decrease).  The algorithm would be:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;if the row time range is entirely contained in the requested time range, we only
need to compute the delta of the summary records:
&lt;strong&gt;maxs_in_range.counter - mins_in_range.counter&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;if not (which concerns at most two rows for each queryid, one at each end of
the requested range), we unnest the array, filter
out records that aren’t in the requested time range, keep the first and last values
and compute for each counter the maximum minus the minimum.&lt;/li&gt;
&lt;/ul&gt;
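
&lt;p&gt;The two cases above can be sketched in a single query.  This is an
illustration written against the table definition shown earlier, not the query
PoWA actually runs: it only handles the &lt;strong&gt;calls&lt;/strong&gt; counter, and the time range is
a made-up value.  Since the counters never decrease, the overall delta is the
highest value seen minus the lowest one:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;WITH params AS (
    -- hypothetical requested time range
    SELECT &#39;[2016-01-01 10:00:00+00,2016-01-01 11:00:00+00)&#39;::tstzrange AS asked
),
bounds AS (
    -- rows entirely inside the requested range: the summary records are enough
    SELECT h.queryid, h.dbid, h.userid,
           (h.mins_in_range).calls AS lo,
           (h.maxs_in_range).calls AS hi
    FROM powa_statements_history h
    CROSS JOIN params p
    WHERE h.queryid = 3589441560
      AND h.coalesce_range &amp;lt;@ p.asked
    UNION ALL
    -- rows only partially overlapping it: unnest and filter the records
    SELECT h.queryid, h.dbid, h.userid, min(r.calls), max(r.calls)
    FROM powa_statements_history h
    CROSS JOIN params p
    CROSS JOIN LATERAL unnest(h.records) AS r
    WHERE h.queryid = 3589441560
      AND h.coalesce_range &amp;amp;&amp;amp; p.asked
      AND NOT (h.coalesce_range &amp;lt;@ p.asked)
      AND r.ts &amp;lt;@ p.asked
    GROUP BY h.queryid, h.dbid, h.userid
)
SELECT queryid, dbid, userid, max(hi) - min(lo) AS calls
FROM bounds
GROUP BY queryid, dbid, userid;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;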

&lt;p class=&quot;notice&quot;&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; Actually, the PoWA interface always unnests all records contained in the
requested time interval, since the interface is designed to show the evolution of
these counters on a relatively small time range, but with great precision.
Fortunately, unnesting the arrays is not that expensive, especially compared to
the disk space saved.&lt;/p&gt;

&lt;p&gt;And here’s the size needed for the aggregated and non-aggregated values.  For
this I let PoWA generate &lt;strong&gt;12,331,366 records&lt;/strong&gt; (configuring a snapshot every 5
seconds for some hours, with the default aggregation of 100 records per row), and
used a btree index on (queryid, ((record).ts)) to simulate the index present on
the aggregated table:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/tuple_overhead_4.svg&quot;&gt;&lt;img src=&quot;/images/tuple_overhead_4.svg&quot; alt=&quot;Benchmark 4&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pretty efficient, right?&lt;/p&gt;

&lt;h3 id=&quot;limitations&quot;&gt;Limitations&lt;/h3&gt;

&lt;p&gt;There are some limitations with aggregating records.  If you do this, you can’t
enforce constraints such as foreign keys or unique constraints on the aggregated
values.  This technique is therefore best suited to non-relational data, such as
counters or metadata.&lt;/p&gt;

&lt;h3 id=&quot;bonus&quot;&gt;Bonus&lt;/h3&gt;

&lt;p&gt;Using custom types also allows some nice things, like defining &lt;strong&gt;custom
operators&lt;/strong&gt;.  For instance, the release 3.1.0 of PoWA provides two operators
for each custom type defined:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the &lt;strong&gt;-&lt;/strong&gt; operator, to get the difference between two records&lt;/li&gt;
  &lt;li&gt;the &lt;strong&gt;/&lt;/strong&gt; operator, to get the difference &lt;em&gt;per second&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
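
&lt;p&gt;To give an idea of how such an operator can be built, here is a reduced sketch
on a two-field type.  All the names (demo_record, demo_diff, demo_record_minus)
are hypothetical and this is not PoWA’s actual definition; the FUNCTION keyword
needs PostgreSQL 11 or later, older releases use PROCEDURE instead:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- hypothetical reduced record type, and the type holding a difference
CREATE TYPE demo_record AS (ts timestamptz, calls bigint);
CREATE TYPE demo_diff   AS (intvl interval, calls bigint);

-- field-by-field subtraction of two records
CREATE FUNCTION demo_record_minus(a demo_record, b demo_record)
RETURNS demo_diff
LANGUAGE sql IMMUTABLE AS $$
    SELECT ROW((a).ts - (b).ts, (a).calls - (b).calls)::demo_diff;
$$;

-- expose the function as the - operator for this type
CREATE OPERATOR - (
    LEFTARG  = demo_record,
    RIGHTARG = demo_record,
    FUNCTION = demo_record_minus
);

-- usage: expand the resulting difference into columns
SELECT (ROW(now(), 10)::demo_record
        - ROW(now() - interval &#39;5 seconds&#39;, 4)::demo_record).*;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;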

&lt;p&gt;You can therefore quite easily write this kind of query:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()).&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_statements_history_current&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3589441560&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dbid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16384&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;intvl&lt;/span&gt;      &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;calls&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;total_time&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-----------------+--------+------------------+--------+ ...&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;05&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;004611&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;mi&quot;&gt;5753&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5570000000005&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;mi&quot;&gt;5753&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;05&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;004569&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;mi&quot;&gt;1879&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;40500000000047&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;   &lt;span class=&quot;mi&quot;&gt;1879&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;05&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00477&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;14369&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;48&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9060000000006&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;14369&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;05&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00418&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;                &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;      &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lag&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;record&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()).&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;powa_statements_history_current&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;queryid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3589441560&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dbid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16384&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;sec&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;calls_per_sec&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtime_per_sec&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rows_per_sec&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;--------+---------------+------------------+--------------+ ...&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;        &lt;span class=&quot;mi&quot;&gt;1150&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;  &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1114000000001&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;       &lt;span class=&quot;mi&quot;&gt;1150&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;         &lt;span class=&quot;mi&quot;&gt;375&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28100000000009&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;        &lt;span class=&quot;mi&quot;&gt;375&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
      &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;        &lt;span class=&quot;mi&quot;&gt;2873&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;78120000000011&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;       &lt;span class=&quot;mi&quot;&gt;2873&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;If you’re interested in how to implement such operators, you can look at the
&lt;a href=&quot;https://github.com/powa-team/powa-archivist/commit/203ed02a5205ad41ce0854bf0580779d7fb6193b#diff-efeed95efc180d43a149361145c2f082R1079&quot;&gt;PoWA implementation&lt;/a&gt;.&lt;/p&gt;
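
&lt;p&gt;As a minimal sketch of the idea, and not PoWA’s actual code, a subtraction
operator on a composite type could look like the following.  The type, function
and column names (demo_record, demo_record_diff, demo_record_mi, ts, calls,
total_time) are hypothetical and only chosen to mirror the output above.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Hypothetical composite types: one for a raw sample, one for a difference
CREATE TYPE demo_record AS (
    ts timestamptz, calls bigint, total_time double precision);
CREATE TYPE demo_record_diff AS (
    intvl interval, calls bigint, total_time double precision);

-- Field by field subtraction.  STRICT makes the function return NULL when
-- either side is NULL, e.g. for the first row of a lag() window.
CREATE FUNCTION demo_record_mi(a demo_record, b demo_record)
RETURNS demo_record_diff
LANGUAGE sql IMMUTABLE STRICT
AS $$
    SELECT ROW(a.ts - b.ts,
               a.calls - b.calls,
               a.total_time - b.total_time)::demo_record_diff;
$$;

-- Expose the function as the - operator for that type
CREATE OPERATOR - (
    PROCEDURE = demo_record_mi,
    LEFTARG = demo_record,
    RIGHTARG = demo_record
);

-- Quick check with two hand-written samples
SELECT (ROW('2016-09-16 00:00:05+00', 15, 2.5)::demo_record
      - ROW('2016-09-16 00:00:00+00', 10, 1.0)::demo_record).*;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;A division operator returning per-second values can be written the same way,
dividing each counter by the number of seconds in the interval.&lt;/p&gt;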

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;You now know the basics of working around the per-tuple overhead.  Depending on
your needs and the specifics of your data, you should find a way to aggregate your
data, maybe adding some extra columns, to keep good performance and save some
disk space.&lt;/p&gt;

&lt;!--
Test 1, simple integer, 10M row

with s(id) AS (select unnest(id) from agg_1 where id &amp;&amp; array[500])
select * from s where id = 500;


raw_1 (id integer)
  insert: 23s
  size: 346 MB
  read data: 2.2s
  create index: 5.2s
  index size: 214 MB
  find 1 row: 1.4ms

agg_1 (id integer[])
  5 val per row
  INSERT INTO agg_1 SELECT array_agg(i) FROM generate_series(1,10000000) i GROUP BY i % 2000000 ;
  insert: 18s
  size: 146 MB (no toast)
  read raw data: 377 ms
  unnnest: 4s
  create (GIN) index: 73s
  index size: 478 MB
  find 1 val: 0.25ms

agg_1 (id integer[])
  20 val per row
  INSERT INTO agg_1 SELECT array_agg(i) FROM generate_series(1,10000000) i GROUP BY i % 500000 ;
  insert: 13s
  size: 64 MB (no toast)
  read raw data: 100ms
  read unnnest: 2.6 s
  create (GIN) index: 70s
  index size: 478MB
  find 1 val: 0.3ms

agg_1 (id integer[])
  100 val per row
  INSERT INTO agg_1 SELECT array_agg(i) FROM generate_series(1,10000000) i GROUP BY i % 100000;
  insert: 10s
  size: 43MB (notoast)
  read raw data: 31ms
  read unnnest: 2s
  create (GIN) index: 68s
  index size: 478 MB
  find 1 val: 0.45 ms

agg_1 (id integer[])
  200 val per row
  INSERT INTO agg_1 SELECT array_agg(i) FROM generate_series(1,10000000) i GROUP BY i % 50000;
  insert: 9.7s
  size: 43MB (notoast)
  read raw data: 21ms
  read unnnest: 2s
  create (GIN) index: 69s
  index size: 478MB
  find 1 val: 0.7ms

agg_1 (id integer[])
  1000 val per row
  INSERT INTO agg_1 SELECT array_agg(i) FROM generate_series(1,10000000) i GROUP BY i % 10000;
  insert: 10s
  size: 53MB (toast)
  read raw data: 7ms
  read unnnest: 2s
  create (GIN) index: 67s
  index size: 478MB
  find 1 val: 2,7ms
  --&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2016/09/16/minimizing-tuple-overhead.html&quot;&gt;Minimizing tuple overhead&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on September 16, 2016.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Estimating Needed Memory for a Sort]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2015/08/18/estimating-needed-memory-for-a-sort.html" />
  <id>https://rjuju.github.io/postgresql/2015/08/18/estimating-needed-memory-for-a-sort</id>
  <published>2015-08-18T14:03:34+00:00</published>
  <updated>2015-08-18T14:03:34+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;h3 id=&quot;work_mem&quot;&gt;work_mem?&lt;/h3&gt;

&lt;p&gt;The work memory, or &lt;strong&gt;work_mem&lt;/strong&gt;, is one of the most complicated settings to
configure. It can be used for various purposes. It’s mainly used when sorting
data or creating hash tables, but it can also be used by set returning
functions using a tuplestore, like the &lt;strong&gt;generate_series()&lt;/strong&gt;
function. Moreover, each node of a query can use this amount of memory. Set
this parameter too low and a lot of temporary files will be used; set it too
high and you may encounter errors, or even an Out Of Memory (OOM) condition,
depending on your OS configuration.&lt;/p&gt;

&lt;p&gt;I’ll focus here on the amount of memory needed when sorting data, to help you
understand how much memory is required when PostgreSQL runs a sort operation.&lt;/p&gt;
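
&lt;p&gt;As a rough way to see whether work_mem is currently too low, you can look at
how many temporary files a database has produced.  This is only a sketch: the
counters in pg_stat_database are cumulative since the last statistics reset and
cover all temporary files, not only those created by sorts.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Current value for this session
SHOW work_mem;

-- Cumulative temporary file usage for the current database
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname = current_database();&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;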

&lt;h3 id=&quot;truth-is-out&quot;&gt;Truth is out&lt;/h3&gt;

&lt;p&gt;I sometimes hear people say that there is a correlation between the size of the
temporary files generated and the amount of memory that would have been needed
to perform the same sort entirely in memory.  Unfortunately that’s wrong: you
can’t make any assumption about the value of work_mem based only on the size of a
sort temporary file.&lt;/p&gt;

&lt;p&gt;That’s because when the data to be sorted don’t fit in the allowed memory,
PostgreSQL uses different algorithms, either external sort or external
merge, which have a totally different space usage. In addition to using
work_mem, the external merge algorithm can reuse a smaller temporary file
multiple times, for less disk usage and better performance. If you want more details
on this, the relevant source code is present in
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/utils/sort/tuplesort.c&quot;&gt;tuplesort.c&lt;/a&gt;
and
&lt;a href=&quot;https://github.com/postgres/postgres/blob/master/src/backend/utils/sort/logtape.c&quot;&gt;logtape.c&lt;/a&gt;.
As a brief introduction, the header of &lt;strong&gt;tuplesort.c&lt;/strong&gt; says:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[…]
This module handles sorting of heap tuples, index tuples, or single
Datums (and could easily support other kinds of sortable objects,
if necessary).  It works efficiently for both small and large amounts
of data.  Small amounts are sorted in-memory using qsort().  Large
amounts are sorted using temporary files and a standard external sort
algorithm.&lt;/p&gt;

  &lt;p&gt;See Knuth, volume 3, for more than you want to know about the external
sorting algorithm.  We divide the input into sorted runs using replacement
selection, in the form of a priority tree implemented as a heap
(essentially his Algorithm 5.2.3H), then merge the runs using polyphase
merge, Knuth’s Algorithm 5.4.2D.  The logical “tapes” used by Algorithm D
are implemented by logtape.c, which avoids space wastage by recycling
disk space as soon as each block is read from its “tape”.
[…]&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; This is an extract from the 9.5 version of the header comment.  External
sorts now use &lt;a href=&quot;https://github.com/postgres/postgres/commit/0711803775a&quot;&gt;a quicksort algorithm rather than
replacement selection&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It can be easily verified. First, let’s create a table and add some data:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'line '&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generate_series&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;To sort all these rows in memory, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;7813kB&lt;/code&gt; is needed (more details later).  Let’s look at the
EXPLAIN ANALYZE output with work_mem set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;7813kB&lt;/code&gt; and then &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;7812kB&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;work_mem&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'7813kB'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                                                    &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------------------------------------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Sort&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9845&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;82&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10095&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;82&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;957&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;59&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;163&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Method&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;quicksort&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Memory&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7813&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kB&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1541&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;012&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;19&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;789&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;work_mem&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'7812kB'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                                                    &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------------------------------------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Sort&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9845&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;82&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10095&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;82&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;142&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;662&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;168&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;596&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Method&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;external&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Disk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2432&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kB&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1541&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;027&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;18&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;621&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;So &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;7813kB&lt;/code&gt; is needed to sort in memory, but if we are short by only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1kB&lt;/code&gt;, the sort goes to disk and the temporary file size
is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2432kB&lt;/code&gt;.&lt;/p&gt;
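
&lt;p&gt;If you want to check this on your own instance, a minimal sketch is to rerun
the same query with a few smaller work_mem values and compare the Sort Method
line each time.  The values below are arbitrary assumptions, and the figures you
get will depend on your PostgreSQL version.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Hypothetical values, picked only for illustration
SET work_mem TO '4MB';
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF) SELECT * FROM sort ORDER BY id;

SET work_mem TO '1MB';
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF) SELECT * FROM sort ORDER BY id;

-- Go back to the value used in the rest of this session
SET work_mem TO '7812kB';&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;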

&lt;p&gt;You can also enable the trace_sort parameter to get some more information:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trace_sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TO&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client_min_messages&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;LOG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;begin&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuple&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nkeys&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;workMem&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7812&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;randomAccess&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;LOG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;switching&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;external&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tapes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CPU&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;05&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sec&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elapsed&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;05&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sec&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;LOG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;performsort&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;starting&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CPU&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sec&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elapsed&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sec&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;LOG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;finished&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;writing&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;run&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tape&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CPU&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sec&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elapsed&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sec&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;LOG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;performsort&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;done&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CPU&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sec&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elapsed&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sec&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;LOG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;external&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ended&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;304&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;disk&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;blocks&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;used&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CPU&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;18&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sec&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elapsed&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;19&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sec&lt;/span&gt;
                                                    &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------------------------------------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Sort&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9845&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;82&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10095&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;82&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;154&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;751&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;181&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;724&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Sort&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Method&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;external&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Disk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2432&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kB&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1541&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;039&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;23&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;712&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;With these data, 28 tapes are used.&lt;/p&gt;

&lt;h3 id=&quot;so-how-do-i-know-how-much-work_mem-is-needed&quot;&gt;So, how do I know how much work_mem is needed?&lt;/h3&gt;

&lt;p&gt;First, you need to know that all the data will be allocated through PostgreSQL’s
allocator &lt;strong&gt;AllocSet&lt;/strong&gt;. If you want to know more about it, I recommend reading
the excellent articles Tomas Vondra wrote on this topic: &lt;a href=&quot;http://blog.pgaddict.com/posts/introduction-to-memory-contexts&quot;&gt;Introduction to
memory
contexts&lt;/a&gt;,
&lt;a href=&quot;http://blog.pgaddict.com/posts/allocation-set-internals&quot;&gt;Allocation set
internals&lt;/a&gt; and &lt;a href=&quot;http://blog.pgaddict.com/posts/palloc-overhead-examples&quot;&gt;palloc
overhead examples&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The key point here is that the allocator adds some overhead. Each
allocated block has a fixed overhead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;16B&lt;/code&gt;, and the memory size requested
(without the 16B overhead) will be rounded up to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2^N&lt;/code&gt; size. So if you ask
for 33B, 80B will be used: 16B of overhead plus 64B, the next power of two.
The work_mem will be used to store every row, plus some additional information.&lt;/p&gt;
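
&lt;p&gt;As a quick sanity check of that rounding rule, here is a minimal sketch that
only mimics the arithmetic described above. It does not inspect the allocator
itself, and the requested size of 33B and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;allocated_bytes&lt;/code&gt; alias are
purely illustrative:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- 16B of fixed overhead + the smallest power of two that can hold 33B
SELECT 16 + min(1 &amp;lt;&amp;lt; p) AS allocated_bytes
FROM generate_series(0, 13) AS p
WHERE (1 &amp;lt;&amp;lt; p) &amp;gt;= 33;
 allocated_bytes
-----------------
              80&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;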

&lt;p&gt;For each row to sort, a fixed amount of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;24B&lt;/code&gt; of memory will be used. This is the
size of a &lt;strong&gt;SortTuple&lt;/strong&gt;, which is the structure that actually gets sorted. This memory
will be allocated in a single block, so we have only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;24B&lt;/code&gt; overhead (fixed 8B
and the 16B to go to the closest 2^N multiple).&lt;/p&gt;

&lt;p&gt;The first part of the formula is therefore:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;(n being the number of tuples sorted)&lt;/p&gt;

&lt;p&gt;Then, you have to know that PostgreSQL will preallocate this space for 1024
rows. So you’ll never see a memory consumption of 2 or 3kB.&lt;/p&gt;

&lt;p&gt;Next, each SortTuple will contain a
&lt;strong&gt;MinimalTuple&lt;/strong&gt;, which is basically a tuple without the system metadata (xmin,
xmax…), or an &lt;strong&gt;IndexTuple&lt;/strong&gt; if the tuples come from an index scan. This
structure will be allocated separately for each tuple, so there can be a pretty
big overhead. The lengths of these structures are both &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;6B&lt;/code&gt;, but they need to be aligned.
This represents &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;16B&lt;/code&gt; per tuple.&lt;/p&gt;

&lt;p&gt;These structures will also contain the entire row. Its size depends on the
table definition and, for variable length columns, on the content.&lt;/p&gt;

&lt;p&gt;The second part of the formula is therefore:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;average&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rounded&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;We can now estimate how much memory is needed:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;average&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rounded&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;^&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;N&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;testing-the-formula&quot;&gt;Testing the formula&lt;/h3&gt;

&lt;p&gt;Let’s try it on our table. It contains two fields, &lt;strong&gt;id&lt;/strong&gt; and &lt;strong&gt;val&lt;/strong&gt;. &lt;strong&gt;id&lt;/strong&gt; is an
integer, so it uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;4B&lt;/code&gt;. The &lt;strong&gt;val&lt;/strong&gt; column is variable length. First, figure
out its estimated average width:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stawidth&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_statistic&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;starelid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'sort'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regclass&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;staattnum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;stawidth&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;----------&lt;/span&gt;
       &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Just to be sure, as I didn’t do any ANALYZE on the table:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;avg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;length&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;avg&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;--------------------&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8889500000000000&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;So, the average row size is approximately &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;14B&lt;/code&gt; (4B for id plus about 10B for val). PostgreSQL showed the same
estimate in the previous EXPLAIN plan, where the reported width was 14:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;Sort&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;9845&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;82&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10095&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;82&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p class=&quot;notice&quot;&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; It’s better to rely on pg_statistic, because it’s faster and
doesn’t consume resources.  Also, if you have large fields, they’ll be toasted,
and only a pointer will be stored in work_mem, not the entire field.&lt;/p&gt;

&lt;p&gt;We add the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;16B&lt;/code&gt; overhead for the &lt;strong&gt;MinimalTuple&lt;/strong&gt; structure and get &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;30B&lt;/code&gt;. This will lead to an allocated space of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;32B&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Finally, since the table contains 100,000 tuples, we can now compute the memory
needed:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;    &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100000&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8000024&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7812&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;52&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kB&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;We now find the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;7813kB&lt;/code&gt; I announced earlier!&lt;/p&gt;
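
&lt;p&gt;The same arithmetic can be handed to PostgreSQL. This is only a sketch: the
row count and the rounded tuple size are hardcoded from the example above, and
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rounded_size&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;estimated&lt;/code&gt; names are illustrative:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- per-tuple cost times the number of tuples, plus the 24B block overhead
SELECT pg_size_pretty(((24 + 16 + 8 + rounded_size) * n + 24)::bigint) AS estimated
FROM (VALUES (100000, 32)) AS t(n, rounded_size);
 estimated
-----------
 7813 kB&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;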

&lt;p&gt;This is a very simple example. If you only sort some of the rows, the estimated
size can be too high or too low if the rows you sort don’t match the average
size.&lt;/p&gt;

&lt;p&gt;Also, note that if the data length of a row exceeds &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;8kB&lt;/code&gt; (not counting the
toasted data), the allocated size won’t be rounded up to the next 2^N multiple.&lt;/p&gt;

&lt;h3 id=&quot;wait-what-about-nulls&quot;&gt;Wait, what about NULLs?&lt;/h3&gt;

&lt;p&gt;Yes, this formula was way too simple…&lt;/p&gt;

&lt;p&gt;The formula assumes you don’t have any NULL fields, so it computes the &lt;strong&gt;maximum
estimated&lt;/strong&gt; memory needed.&lt;/p&gt;

&lt;p&gt;A NULL field won’t consume space for data, obviously, but will add a bit in a
bitmap stored in the MinimalTuple.&lt;/p&gt;

&lt;p&gt;If at least one field of a tuple is NULL, the bitmap will be created. Its size
is:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;number&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;of&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;attributes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rounded&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;down&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;So, if a tuple has 3 integer fields and two of them are NULL, the data size will not be &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;12B&lt;/code&gt; but:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
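
&lt;p&gt;Integer division in PostgreSQL truncates, which gives the rounding described
above for free. A small sketch, where the attribute counts 3, 8 and 9 and the
column names are arbitrary illustrative choices, shows where the bitmap grows
from one byte to two:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- one bit per attribute, stored in whole bytes
SELECT natts, (natts + 7) / 8 AS bitmap_bytes
FROM (VALUES (3), (8), (9)) AS t(natts);
 natts | bitmap_bytes
-------+--------------
     3 |            1
     8 |            1
     9 |            2&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;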

&lt;p&gt;You can then try to estimate a better size using the NULL fraction of
each attribute, available in &lt;strong&gt;pg_statistic&lt;/strong&gt; (the stanullfrac column).&lt;/p&gt;
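
&lt;p&gt;One way to look at those fractions is the &lt;strong&gt;pg_stats&lt;/strong&gt; view, which exposes
the same statistics in a readable form. The query below is only a rough sketch
for the example table: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expected_bytes&lt;/code&gt; column is an illustrative name, and
the computation ignores both the NULL bitmap and alignment padding:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- avg_width only covers non-NULL entries, so weight it by the
-- fraction of rows that actually store a value
SELECT attname, null_frac, avg_width,
       (1 - null_frac) * avg_width AS expected_bytes
FROM pg_stats
WHERE schemaname = 'public'
AND tablename = 'sort';&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Summing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;expected_bytes&lt;/code&gt; over all the attributes gives an average row length that
accounts for NULLs, which can replace the plain sum of widths in the formula.&lt;/p&gt;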

&lt;h3 id=&quot;for-the-lazy-ones&quot;&gt;For the lazy ones&lt;/h3&gt;

&lt;p&gt;Here’s a simple query that will do the maths for you. It assumes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;only fields from one table are sorted&lt;/li&gt;
  &lt;li&gt;there are no NULLs&lt;/li&gt;
  &lt;li&gt;all the rows will be sorted&lt;/li&gt;
  &lt;li&gt;statistics are accurate&lt;/li&gt;
&lt;/ul&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;RECURSIVE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;overhead&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;UNION&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ALL&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;overhead&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4096&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;width&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;starelid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stawidth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_statistic&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;num_of_lines&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_live_tup&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_stat_user_tables&lt;/span&gt;

&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(((&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;max&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;min&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;overhead&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CROSS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_namespace&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relnamespace&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;starelid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num_of_lines&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nol&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relid&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;relname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'sort'&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nspname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'public'&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;----------------&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;7813&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kB&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Now, you know the basics to estimate the amount of memory you need to sort your
data.&lt;/p&gt;

&lt;p&gt;A minimal example was presented here for better understanding.  Things start to
get really complicated when you sort not just all the rows of a single table,
but the result of some joins and filters.&lt;/p&gt;

&lt;p&gt;I hope you’ll have fun tuning work_mem on your favorite cluster. But don’t
forget, work_mem is used for more than just sorting tuples!&lt;/p&gt;


    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2015/08/18/estimating-needed-memory-for-a-sort.html&quot;&gt;Estimating Needed Memory for a Sort&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on August 18, 2015.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Keep an eye on your PostgreSQL configuration]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2015/07/22/keep-an-eye-on-your-postgresql-configuration.html" />
  <id>https://rjuju.github.io/postgresql/2015/07/22/keep-an-eye-on-your-postgresql-configuration</id>
  <published>2015-07-22T10:48:16+00:00</published>
  <updated>2015-07-22T10:48:16+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;Have you ever wished to know which configuration settings changed over the last few weeks,
when everything was so much faster, or wanted to check what happened on your
beloved cluster while you were on vacation?&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/rjuju/pg_track_settings&quot;&gt;pg_track_settings&lt;/a&gt; is a simple,
SQL-only extension that helps you find out all of that and more very easily.  As
it’s designed as an extension, it requires PostgreSQL 9.1 or later.&lt;/p&gt;

&lt;h3 id=&quot;some-insights&quot;&gt;Some insights&lt;/h3&gt;

&lt;p&gt;As with almost any extension, you have to compile it from source, or use the &lt;a href=&quot;http://pgxnclient.projects.pgfoundry.org/&quot;&gt;pgxn
client&lt;/a&gt;, since there’s no package
yet.  Assuming you just extracted the tarball of release 1.0.0 and have a typical
server configuration:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;pg_track_settings-1.0.0
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;make &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Then the extension is available.  Create the extension on the database of your choice:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXTENSION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_track_settings&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXTENSION&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In order to historize the settings, you need to schedule a simple function call
on a regular basis.  This function is the &lt;strong&gt;pg_track_settings_snapshot&lt;/strong&gt;
function.  It’s really cheap to call, and won’t have any measurable impact on
your cluster.  This function will do all the smart work of storing all the
parameters &lt;strong&gt;that changed since the last call&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For instance, if you want to be able to know what changed on your server with
5-minute accuracy, a simple cron entry like this for the postgres user is
enough:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;k&quot;&gt;*&lt;/span&gt;/5 &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt;  &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt;     psql &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;SELECT pg_track_settings_snapshot()&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; /dev/null 2&amp;gt;&amp;amp;1&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;A background worker could be used on PostgreSQL 9.3 and later, but as we only
have to call one function every few minutes, it’d be overkill to add one just
for this.  If you really want one, you’d better consider setting up
&lt;a href=&quot;https://powa.readthedocs.io/&quot;&gt;PoWA&lt;/a&gt; for that, or another extension that
can run tasks, like &lt;a href=&quot;http://www.pgadmin.org/docs/dev/pgagent.html&quot;&gt;pgAgent&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;how-to-use-it&quot;&gt;How to use it&lt;/h3&gt;

&lt;p&gt;Let’s call the snapshot function to get the initial values:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;select&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_track_settings_snapshot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;pg_track_settings_snapshot&lt;/span&gt;
 &lt;span class=&quot;c1&quot;&gt;----------------------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;A first snapshot with the initial settings values is saved.  Now, I’ll just
change a setting in the &lt;strong&gt;postgresql.conf&lt;/strong&gt; file (&lt;strong&gt;ALTER SYSTEM&lt;/strong&gt; could also
be used on PostgreSQL 9.4 or later), reload the configuration and take
another snapshot:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;select&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_reload_conf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;pg_reload_conf&lt;/span&gt;
 &lt;span class=&quot;c1&quot;&gt;----------------&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;select&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_track_settings_snapshot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;pg_track_settings_snapshot&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;----------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Now, the fun part.  What information is available?&lt;/p&gt;

&lt;p&gt;First, we can see what changed between two timestamps. For instance, let’s check what
changed in the last 2 minutes:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_track_settings_diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;now&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interval&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'2 minutes'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;now&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;());&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;         &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_setting&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_exists&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_setting&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to_exists&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------------------+--------------+-------------+------------+----------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;max_wal_size&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;93&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;31&lt;/span&gt;         &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;What do we learn?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;as the max_wal_size parameter exists, I’m using the 9.5 alpha release.
Yes, what PostgreSQL really needs right now is people testing the upcoming
release!  It’s simple, and the more people test it, the faster it’ll be
available.  See the &lt;a href=&quot;https://wiki.postgresql.org/wiki/HowToBetaTest&quot;&gt;how to&lt;/a&gt;
page to see how you can help :)&lt;/li&gt;
  &lt;li&gt;the max_wal_size parameter existed 2 minutes ago (&lt;strong&gt;from_exists&lt;/strong&gt; is
true), and also exists right now (&lt;strong&gt;to_exists&lt;/strong&gt; is true).  Obviously, the
regular settings will not disappear, but think of extension related
settings like pg_stat_statements.* or auto_explain.*&lt;/li&gt;
  &lt;li&gt;the max_wal_size changed from &lt;strong&gt;93&lt;/strong&gt; (&lt;strong&gt;from_setting&lt;/strong&gt;) to &lt;strong&gt;31&lt;/strong&gt;
(&lt;strong&gt;to_setting&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also, we can get the history of a specific setting:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_track_settings_log&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'max_wal_size'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;               &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;setting_exists&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;setting&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------------------+--------------+----------------+---------&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2015&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;17&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;22&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;42&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;01&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;156948&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_wal_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;              &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;31&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2015&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;17&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;22&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;38&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;722206&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_wal_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;              &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;93&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;You can also retrieve the entire configuration at a specified timestamp.  For
instance:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_track_settings&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'2015-07-17 22:40:00'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;                 &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;setting&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------------------------+-----------------&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;max_wal_senders&lt;/span&gt;                     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;max_wal_size&lt;/span&gt;                        &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;93&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;max_worker_processes&lt;/span&gt;                &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[...]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The same functions are provided to find out which settings have been overridden for
a specific user and/or database (the &lt;strong&gt;ALTER ROLE … SET&lt;/strong&gt;, &lt;strong&gt;ALTER ROLE …
IN DATABASE … SET&lt;/strong&gt; and &lt;strong&gt;ALTER DATABASE … SET&lt;/strong&gt; commands), with the
functions:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;pg_track_db_role_settings_diff()&lt;/li&gt;
  &lt;li&gt;pg_track_db_role_settings_log()&lt;/li&gt;
  &lt;li&gt;pg_track_db_role_settings()&lt;/li&gt;
&lt;/ul&gt;
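
&lt;p&gt;As a rough sketch, and assuming these functions take the same arguments as
their counterparts shown above (check the extension’s documentation for the
exact signatures), they could be called like this:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Assumed to mirror pg_track_settings_diff(from, to):
-- per role and/or database overrides that changed in the last 2 minutes
SELECT * FROM pg_track_db_role_settings_diff(now() - interval '2 minutes', now());

-- Assumed to mirror pg_track_settings_log(name):
-- history of the overrides recorded for one setting
SELECT * FROM pg_track_db_role_settings_log('work_mem');

-- Assumed to mirror pg_track_settings(timestamp):
-- all overrides known at a given point in time
SELECT * FROM pg_track_db_role_settings('2015-07-17 22:40:00');&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;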

&lt;p&gt;And finally, just in case, you can also find out when PostgreSQL has been restarted:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_reboot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------------------&lt;/span&gt;
 &lt;span class=&quot;mi&quot;&gt;2015&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;07&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;17&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;08&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;39&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;37&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;315131&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;02&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;That’s all for this extension.  I hope you’ll never miss or forget a
configuration change again!&lt;/p&gt;

&lt;p&gt;If you want to install it, the source code is available in the GitHub
repository
&lt;a href=&quot;https://github.com/rjuju/pg_track_settings&quot;&gt;github.com/rjuju/pg_track_settings&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;limitations&quot;&gt;Limitations&lt;/h3&gt;

&lt;p&gt;As the only way to know the current value of a setting is to query
pg_settings (or call current_setting()), you must be aware that the user
calling &lt;strong&gt;pg_track_settings_snapshot()&lt;/strong&gt; may see an overridden value (like
ALTER ROLE … SET param = value) rather than the original value.  As the
&lt;strong&gt;pg_db_role_setting&lt;/strong&gt; table is also historized, it’s pretty easy to know
that you don’t see the original value, but there’s no way to know &lt;strong&gt;what&lt;/strong&gt; the
original value really is.&lt;/p&gt;
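
&lt;p&gt;To illustrate, here is a minimal sketch using only core PostgreSQL commands.
The role name &lt;strong&gt;postgres&lt;/strong&gt; and the value used are just assumptions for the
example:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Override work_mem for the role that takes the snapshots (hypothetical choice)
ALTER ROLE postgres SET work_mem = '64MB';

-- In a new session opened as that role, the override hides the value
-- coming from postgresql.conf
SELECT current_setting('work_mem');

-- The override itself is visible in the catalog that the extension historizes
SELECT setdatabase, setrole, setconfig FROM pg_db_role_setting;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;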

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2015/07/22/keep-an-eye-on-your-postgresql-configuration.html&quot;&gt;Keep an eye on your PostgreSQL configuration&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on July 22, 2015.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[How About Hypothetical Indexes ?]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2015/07/02/how-about-hypothetical-indexes.html" />
  <id>https://rjuju.github.io/postgresql/2015/07/02/how-about-hypothetical-indexes</id>
  <published>2015-07-02T10:08:03+00:00</published>
  <updated>2015-07-02T10:08:03+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;After so much time missing this feature,
&lt;a href=&quot;https://github.com/HypoPG/hypopg&quot;&gt;HypoPG&lt;/a&gt; implements hypothetical index
support for PostgreSQL, available as an extension.&lt;/p&gt;

&lt;h3 id=&quot;introduction&quot;&gt;Introduction&lt;/h3&gt;

&lt;p&gt;It has now been some time since the second version of
&lt;a href=&quot;https://powa.readthedocs.io/&quot;&gt;PoWA&lt;/a&gt; was announced. One of the new features
of this version is the
&lt;a href=&quot;https://github.com/powa-team/pg_qualstats&quot;&gt;pg_qualstats&lt;/a&gt; extension, written
by &lt;a href=&quot;https://rdunklau.github.io&quot;&gt;Ronan Dunklau&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks to this extension, we can now gather real-time statistics to detect
missing indexes, and much more (if you’re interested in this extension, you
should read &lt;a href=&quot;http://rdunklau.github.io/postgresql/powa/pg_qualstats/2015/02/02/pg_qualstats_part1/&quot;&gt;Ronan’s article about
pg_qualstats&lt;/a&gt;).
And used with PoWA, you have an interface that lets you find the most
resource-consuming queries, and suggests the missing indexes when they’re needed.&lt;/p&gt;

&lt;p&gt;That’s really nice, but now a lot of people come with this natural question:
&lt;strong&gt;OK, you say that I should create this index, but will PostgreSQL actually
use it?&lt;/strong&gt; That’s a good question, because depending on many
parameters (among many other things), PostgreSQL could choose to just ignore your
freshly created index.  That could be a really bad surprise, especially if you
had to wait many hours to have it built.&lt;/p&gt;

&lt;h3 id=&quot;hypothetical-indexes&quot;&gt;Hypothetical Indexes&lt;/h3&gt;

&lt;p&gt;So yes, the answer to this question is &lt;strong&gt;hypothetical index support&lt;/strong&gt;. That’s
really not a new idea: a lot of popular RDBMS support them.&lt;/p&gt;

&lt;p&gt;Some work was already done on this several years ago and presented
at &lt;a href=&quot;http://www.pgcon.org/2010/schedule/events/233.en.html&quot;&gt;pgCon 2010&lt;/a&gt;.  It
implemented much more than hypothetical indexes, but it was research
work, which means that we never saw those features land in PostgreSQL.
This great work is only available as a fork of a few specific PostgreSQL
versions, the most recent being 9.0.1.&lt;/p&gt;

&lt;h3 id=&quot;lightweight-implementation-hypopg&quot;&gt;Lightweight implementation: HypoPG&lt;/h3&gt;

&lt;p&gt;I took quite a different approach in HypoPG to implement hypothetical index
support.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;first of all, it must be completely pluggable. It’s available as an
extension and can be used (for now) on any 9.2 or higher PostgreSQL server.&lt;/li&gt;
  &lt;li&gt;it must be as non-intrusive as possible. It’s usable as soon as you
create the extension, without restart. Also, each backend has its own set of
hypothetical indexes, which means that adding a hypothetical index will not
disturb other connections. Finally, the hypothetical indexes are stored in memory,
so adding or removing a huge number of them will not bloat your system catalog.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only restriction in implementing such a feature as an extension is that you
can’t change the syntax without modifying the PostgreSQL source code. So,
everything has to be done through user-defined functions, and by changing the regular
behaviour of existing functionalities, like the EXPLAIN command. We’ll study
the details later.&lt;/p&gt;

&lt;h3 id=&quot;features&quot;&gt;Features&lt;/h3&gt;

&lt;p&gt;For now, the following functions are available:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;hypopg()&lt;/strong&gt;: return the list of hypothetical indexes (in a
similar way as pg_index).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;hypopg_add_index(schema, table, attribute, access_method)&lt;/strong&gt;: create a
1-column-only hypothetical index.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;hypopg_create_index(query)&lt;/strong&gt;: create a hypothetical index using a
standard CREATE INDEX statement.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;hypopg_drop_index(oid)&lt;/strong&gt;: remove the specified hypothetical index.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;hypopg_list_indexes()&lt;/strong&gt;: return a short, human-readable list
of the available hypothetical indexes.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;hypopg_relation_size(oid)&lt;/strong&gt;: return the estimated size of a
hypothetical index&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;hypopg_reset()&lt;/strong&gt;: remove all hypothetical indexes&lt;/li&gt;
&lt;/ul&gt;
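
&lt;p&gt;As a quick sketch of how these functions fit together, assuming the signatures
listed above (they may differ in other releases) and a hypothetical table
&lt;strong&gt;public.my_table&lt;/strong&gt; with an &lt;strong&gt;id&lt;/strong&gt; column (both names are only for
illustration):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Create a single-column hypothetical btree index
-- (arguments: schema, table, attribute, access method)
SELECT hypopg_add_index('public', 'my_table', 'id', 'btree');

-- Or describe the index with a regular CREATE INDEX statement
SELECT hypopg_create_index('CREATE INDEX ON my_table (id)');

-- Show the hypothetical indexes known to the current backend
SELECT * FROM hypopg_list_indexes();

-- Remove all of them once you are done
SELECT hypopg_reset();&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;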

&lt;p&gt;If some hypothetical indexes exist for some relations used in an EXPLAIN
(without ANALYZE) statement, they will automatically be added to the list of
real indexes. PostgreSQL will then choose to use them or not.&lt;/p&gt;

&lt;h3 id=&quot;usage&quot;&gt;Usage&lt;/h3&gt;

&lt;p&gt;Installing HypoPG is quite simple. Assuming you downloaded and extracted a
tarball in the hypopg-0.0.1 directory, are using a packaged version of
PostgreSQL and have the -dev packages installed:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;hypopg-0.0.1
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;make
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;make &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Then HypoPG should be available:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXTENSION&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hypopg&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXTENSION&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Let’s try some really simple tests. First, create a small table:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'line '&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generate_series&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000000&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Then, let’s see a query plan that should benefit from an index that doesn’t exist yet:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                          &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;17906&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;916&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;No surprise, a sequential scan is the only way to go. Now, let’s try to add
a hypothetical index, and EXPLAIN again:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hypopg_create_index&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'CREATE INDEX ON testable (id)'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;hypopg_create_index&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;753&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                                          &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-----------------------------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;Index&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;41079&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;btree_testable_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;28&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;33&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;916&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;Index&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Cond&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Yeah! Our hypothetical index is used. We also notice that creating the
hypothetical index takes about 1 ms, which is far less than creating the real
index would have taken.&lt;/p&gt;

&lt;p&gt;And of course, this hypothetical index is not used in an EXPLAIN ANALYZE:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ANALYZE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                                                 &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-------------------------------------------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Seq&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;17906&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;00&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;916&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;076&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;234&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;218&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;999&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;loops&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;Rows&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Removed&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;999001&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Planning&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;083&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;Execution&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;234&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;377&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Now let’s go further:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIKE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'line 100000%'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                                         &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------------------------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;Index&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;41079&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;btree_testable_id&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;62&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;Index&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Cond&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;~~&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'line 100000%'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Our hypothetical index is still used, but an index on &lt;strong&gt;id&lt;/strong&gt; and &lt;strong&gt;val&lt;/strong&gt; should
help this query. Also, as there’s a wildcard on the right side of the LIKE
pattern, the operator class text_pattern_ops is needed. Let’s check that:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hypopg_create_index&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'CREATE INDEX ON testable (id, val text_pattern_ops)'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;hypopg_create_index&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;---------------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;row&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;194&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXPLAIN&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIKE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'line 100000%'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                                              &lt;span class=&quot;n&quot;&gt;QUERY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PLAN&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;------------------------------------------------------------------------------------------------------&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;Index&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;Only&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Scan&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;41080&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;btree_testable_id_val&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cost&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;26&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;76&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;Index&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Cond&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;~&amp;gt;=~&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'line 100000'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;~&amp;lt;~&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'line 100001'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;~~&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'line 100000%'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;And yes, PostgreSQL decides to use our new index!&lt;/p&gt;
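
&lt;p&gt;To see why the operator class matters, look at the Index Cond of the plan
above: the prefix pattern was turned into a range condition using the ~&amp;gt;=~ and
~&amp;lt;~ operators, which compare text character by character. Here is a minimal
sketch, independent of HypoPG, that checks on a few made-up values that the
LIKE pattern and this range agree (the sample values and column aliases are
only illustrative):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Hypothetical sample values, not taken from the testable table
SELECT v,
       -- the original predicate
       v LIKE 'line 100000%' AS matches_like,
       -- the range condition shown in the Index Cond
       (v ~&amp;gt;=~ 'line 100000' AND v ~&amp;lt;~ 'line 100001') AS in_range
FROM (VALUES ('line 100000'), ('line 1000001'),
             ('line 100001'), ('line 999')) AS s(v);&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Both boolean columns should be identical on every row, which is why a btree
index declared with text_pattern_ops can serve this kind of prefix search.&lt;/p&gt;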

&lt;h3 id=&quot;index-size-estimation&quot;&gt;Index size estimation&lt;/h3&gt;

&lt;p&gt;For now, the index size estimation is a quick and rough computation, but it
can give us a clue about what the real index size would be.&lt;/p&gt;

&lt;p&gt;Let’s check the estimated size of our two hypothetical indexes:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;indexname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hypopg_relation_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;indexrelid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hypopg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
           &lt;span class=&quot;n&quot;&gt;indexname&lt;/span&gt;           &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt; 
&lt;span class=&quot;c1&quot;&gt;-------------------------------+----------------&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;41080&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;btree_testable_id&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;25&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MB&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;41079&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;btree_testable_id_val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;49&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MB&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;rows&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Now, create the real indexes, and compare the sizes:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INDEX&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INDEX&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1756&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;001&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INDEX&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testable&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;text_pattern_ops&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INDEX&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2179&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;185&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ms&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pg_relation_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rjuju&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=#&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_class&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relkind&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'i'&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relname&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIKE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'%testable%'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;relname&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pg_size_pretty&lt;/span&gt; 
&lt;span class=&quot;c1&quot;&gt;---------------------+----------------&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;testable_id_idx&lt;/span&gt;     &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;21&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MB&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;testable_id_val_idx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MB&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The estimated index size is a bit higher than the real size. This is on
purpose: if the estimated size of a hypothetical index were smaller than that
of an existing index, PostgreSQL would prefer the hypothetical index over the
real one, which is definitely not what we want. Also, to simulate a bloated
index (which is quite frequent on real indexes), a hardcoded 20% bloat factor
is added. Finally, the estimation itself could still be improved a lot.&lt;/p&gt;
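
&lt;p&gt;As a rough illustration of how far off the estimation is, the following
sketch computes the relative overestimation from the sizes displayed above
(the labels and column aliases are only illustrative, and the figures are the
rounded pg_size_pretty values, not exact byte counts):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;-- Sizes in MB, copied by hand from the two outputs above
SELECT idx, estimated_mb, real_mb,
       round(100.0 * (estimated_mb - real_mb) / real_mb, 1) AS overestimation_pct
FROM (VALUES
    ('testable (id)', 25, 21),
    ('testable (id, val)', 49, 30)
) AS s(idx, estimated_mb, real_mb);&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;With these rounded figures, the estimate is about 19% above the real size
for the single-column index, close to the 20% bloat factor, and about 63% above
it for the two-column index, which suggests that multi-column indexes are where
the estimation has the most room for improvement.&lt;/p&gt;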

&lt;h3 id=&quot;limitations&quot;&gt;Limitations&lt;/h3&gt;

&lt;p&gt;This 0.0.1 version of HypoPG is still a work in progress, and a lot of work
is still needed.&lt;/p&gt;

&lt;p&gt;Here are the main limitations (at least that I’m aware of):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;only btree hypothetical indexes are supported&lt;/li&gt;
  &lt;li&gt;no hypothetical indexes on expressions&lt;/li&gt;
  &lt;li&gt;no hypothetical indexes with a predicate (partial indexes)&lt;/li&gt;
  &lt;li&gt;tablespace specification is not possible&lt;/li&gt;
  &lt;li&gt;index size estimation could be improved, and it’s not possible to change
the bloat factor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, I believe it can already be helpful.&lt;/p&gt;

&lt;h3 id=&quot;whats-next-&quot;&gt;What’s next?&lt;/h3&gt;

&lt;p&gt;Now, the next step is to implement HypoPG support in
&lt;a href=&quot;https://powa.readthedocs.io/&quot;&gt;PoWA&lt;/a&gt;, to help DBAs decide whether they should
create the suggested index or not, and to remove the current limitations.&lt;/p&gt;

&lt;p&gt;If you want to try HypoPG, here is the github repository:
&lt;a href=&quot;https://github.com/HypoPG/hypopg&quot;&gt;github.com/HypoPG/hypopg&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned!&lt;/p&gt;

    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2015/07/02/how-about-hypothetical-indexes.html&quot;&gt;How About Hypothetical Indexes ?&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on July 02, 2015.&lt;/p&gt;
  </content>
</entry>


<entry>
  <title type="html"><![CDATA[Talking About OPM and PoWA at pgconf.ru]]></title>
  <link rel="alternate" type="text/html" href="https://rjuju.github.io/postgresql/2015/03/18/talking-about-opm-and-powa-at-pgconf-ru.html" />
  <id>https://rjuju.github.io/postgresql/2015/03/18/talking-about-opm-and-powa-at-pgconf-ru</id>
  <published>2015-03-18T08:09:10+00:00</published>
  <updated>2015-03-18T08:09:10+00:00</updated>
  <author>
    <name>Julien Rouhaud</name>
    <uri>https://rjuju.github.io</uri>
    
  </author>
  <content type="html">
    &lt;p&gt;Last month, I had the chance to talk about PostgreSQL monitoring, and present
some of the tools I’m working on at &lt;a href=&quot;http://en.pgconf.ru/&quot;&gt;pgconf.ru&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This talk was a good opportunity to work on an overview of existing projects
dealing with monitoring or performance, see what may be lacking and what can be
done to change this situation.&lt;/p&gt;

&lt;p&gt;Here are my slides:&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/45811083&quot; width=&quot;476&quot; height=&quot;400&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;If you’re interested in this topic, or if you developed a tool I missed while
writing these slides (my apologies if that’s the case),
&lt;a href=&quot;https://wiki.postgresql.org/wiki/Monitoring&quot;&gt;the official wiki page&lt;/a&gt;
is the place you should go first.&lt;/p&gt;

&lt;p&gt;I’d also like to thank all the &lt;a href=&quot;http://en.pgconf.ru/&quot;&gt;pgconf.ru&lt;/a&gt; staff for their
work. This conference was a big success, and &lt;a href=&quot;https://twitter.com/DenishPatel/status/563710192672849920&quot;&gt;the biggest PostgreSQL-centric event
ever organized&lt;/a&gt;.&lt;/p&gt;


    &lt;p&gt;&lt;a href=&quot;https://rjuju.github.io/postgresql/2015/03/18/talking-about-opm-and-powa-at-pgconf-ru.html&quot;&gt;Talking About OPM and PoWA at pgconf.ru&lt;/a&gt; was originally published by Julien Rouhaud at &lt;a href=&quot;https://rjuju.github.io&quot;&gt;rjuju's home&lt;/a&gt; on March 18, 2015.&lt;/p&gt;
  </content>
</entry>

</feed>
