### work_mem?

The work memory, or **work_mem** is one of the most complicated setting to
configure. It can be used for various purposes. It’s mainly used when sorting
data or creating hash tables, but it can also be used by set returning
functions using a tuplestore for instance, like the **generate_series()**
function. Moreover, each node of a query can use this amount of memory. Set
this parameter too low, and a lot of temporary files will be used, set it too
high and you may encounter errors, or even an Out Of Memory (OOM) depending on
your OS configuration.

I’ll focus here on the amount of memory needed when sorting data, to help you understand how much memory is required when PostgreSQL runs a sort operation.

### Truth is out

I sometimes hear people think that there is a correlation between the size of the temporary files generated and the amount of memory that would have been needed to perform the same sort entirely in memory. It’s unfortunately wrong, you can’t make any assumption on the value of work_mem based only on the size of a sort temporary file.

It’s because when the data to be sorted don’t fit in the allowed memory,
PostgreSQL will use different algorithms, either external sort or external
merge, which have a totally different space usage. In addition to work_mem
usage, a smaller temporary file can be used multiple times, with external merge
algorithm, for less disk usage and better performance. If you want more details
on this, the relevant source code is present in
tuplesort.c
and
logtapes.c.
As a brief introduction, the header of **tuplesort.c** says:

[…] This module handles sorting of heap tuples, index tuples, or single Datums (and could easily support other kinds of sortable objects, if necessary). It works efficiently for both small and large amounts of data. Small amounts are sorted in-memory using qsort(). Large amounts are sorted using temporary files and a standard external sort algorithm.

See Knuth, volume 3, for more than you want to know about the external sorting algorithm. We divide the input into sorted runs using replacement selection, in the form of a priority tree implemented as a heap (essentially his Algorithm 5.2.3H), then merge the runs using polyphase merge, Knuth’s Algorithm 5.4.2D. The logical “tapes” used by Algorithm D are implemented by logtape.c, which avoids space wastage by recycling disk space as soon as each block is read from its “tape”. […]

**NOTE:** It’s an extract from the 9.5 version of the //readme//. External
sorts are now using a //quicksort// quicksort algorithm rather than a
//replacement
selection//.

It can be easily verified. First, let’s create a table and add some data:

To sort all these rows, `7813kB`

is needed (more details later). Let’s see the
EXPLAIN ANALYZE with work_mem set to `7813kB`

and `7812kB`

:

So, `7813kB`

are needed, but if we lack only `1kB`

, the temporary file size
is `2432kB`

.

You can also activate the trace_sort parameter to have some more information:

With these data, 28 tapes are used.

### So, how do I know how much work_mem is needed?

First, you need to know that all the data will be allocated through PostgreSQL’s
allocator **AllocSet**. If you want to know more about it, I recommend to read
the excellent articles Tomas Vondras wrote on this topic: Introduction to
memory
contexts,
Allocation set
internals and palloc
overhead examples.

The needed information here is that the allocator adds some overhead. Each
allocated block has a fixed overhead of `16B`

, and the memory size requested
(without the 16B overhead) will be rounded up to a `2^N`

size. So if you ask
for 33B, 80B will be used: 16B of overhead and 64B, the closest 2^N multiple.
The work_mem will be used to store every row, and some more information.

For each row to sort, a fixed amount of `24B`

memory will be used. This is the
size of a **SortTuple** which is the structure sorted. This amount of memory
will be allocated in a single block, so we have only `24B`

overhead (fixed 8B
and the 16B to go to the closest 2^N multiple).

The first part of the formula is therefore:

(n being the number of tuple sorted)

Then, you have to know that PostgreSQL will preallocate this space for 1024 rows. So you’ll never see a memory consumption of 2 or 3kB.

Then, each SortTuple will then contain a
**MinimalTuple**, which is basically a tuple without the system metadata (xmin,
xmax…), or an **IndexTuple** if the tuples come from an index scan. This
structure will be allocated separately for each tuple, so there can be a pretty
big overhead. Theses structures lengths are both `6B`

, but need to be aligned.
This represents `16B`

per tuple.

These structures will also contain the entire row, the size depends on the table, and the content for variable length columns.

The second part of the formula is therefore:

We can now estimate how much memory is needed:

### Testing the formula

Let’s see on our table. It contains two fields, **id** and **val**. **id** is an
integer, so it uses `4B`

. The **val** column is variable length. First, figure
out the estimated average row size:

Just to be sure, as I didn’t do any ANALYZE on the table:

So, the average row size is approximatively `14B`

. PostgreSQL showed the same
estimation on the previous EXPLAIN plan, the reported width was 14:

**NOTE:** It’s better to rely on the pg_statistic, because it’s faster and
doesn’t consume resources. Also, if you have large fields, they’ll be toasted,
and only a pointer will be stored in work_mem, not the entire field

We add the `16B`

overhead for the **MinimalTuple** structure and get `30B`

. This will lead to an allocated space of `32B`

.

Finally, the table contains 100.000 tuples, we can now compute the memory needed :

We now find the `7813kB`

I announced earlier!

This is a very simple example. If you only sort some of the rows, the estimated size can be too high or too low if the rows you sort don’t match the average size.

Also, note that if the data length of a row exceed `8kB`

(not counting the
toasted data), the allocated size won’t be rounded up to the next 2^N multiple.

### Wait, what about NULLs?

Yes, this formula was way too simple…

The formula assume you don’t have any NULL field, so it compute the **maximum
estimated** memory needed.

A NULL field won’t consume space for data, obviously, but will add a bit in a bitmap stored in the MinimalTuple.

If at least one field of a tuple is NULL, the bitmap will be created. Its size is:

So, if a tuple has 3 integer fields, and two of them are NULL, the data size will not be `16B`

but:

You can then try to estimate a better size with the statistic NULL fractions of
each attribute, available in **pg_statistics**.

### For the lazy ones

Here’s a simple query that will do the maths for you. It assumes:

- only fields from one table is sorted
- there are no NULL
- all the rows will be sorted
- statistics are accurate

### Conclusion

Now, you know the basics to estimate the amount of memory you need to sort your data.

A minimal example was presented here for a better understanding, things start to get really complicated when you don’t only sort all the rows of a single table but the result of some joins and filters.

I hope you’ll have fun tuning work_mem on your favorite cluster. But don’t forget, work_mem is used for more than just sorting tuples!