After so much time missing this features, HypoPG implements hypothetical indexes support for PostgreSQl, available as an extension.
Thanks to this extension, we can now gather real-time statistics to detect missing indexes, and much more (if you’re interested in this extension, you should read Ronan’s article about pg_qualstats). And used with PoWA, you have an interface that allows you to find the most consuming queries, and will suggest you the missing indexes if they’re needed.
That’s really nice, but now a lot of people come with this natural question: Ok, you say that I should create this index, but will PostgreSQL eventually use it ?. That’s actually a good question, because depending on many parameters (in many other things), PostgreSQL could choose to just ignore your freshly created index. That could be a really bad surprise, especially if you had to wait many hours to have it built.
So yes, the answer to this question is hypothetical indexes support. That’s really not a new idea, a lot of popular RDBMS support them.
There has already been some previous work on this several years ago, presented at pgCon 2010, which was implementing much more than hypothetical indexes, but this was a research work, which means that we never saw those features coming up in PostgreSQL. This great work is only available as a fork of a few specific PostgreSQL versions, the most recent being 9.0.1.
lightweight implementation: HypoPG
I had quite a different approach in HypoPG to implement hypothetical indexes support.
- first of all, it must be completely pluggable. It’s available as an extension and can be used (for now) on any 9.2 or higher PostgreSQL server.
- it must be as non intrusive as it’s possible. It’s usable as soon as you create the extension, without restart. Also, each backend has its own set of hypothetical indexes, which mean that adding an hypothetical index will not disturb other connections. Also, the hypothetical indexes are stored in memory, adding/removing a huge amount of them will not bloat your system catalog.
The only restriction in implementing such a feature as an extension is that you can’t change the syntax without modifying the PostgreSQL source code. So, everything has to be done through user defined functions, and change regular behaviour of existing functionnalities, like the EXPLAIN command. We’ll study the details later.
For now, the following functions are available:
- hypopg(): return the list of hypothetical indexes (in a similar way as pg_index).
- hypopg_add_index(schema, table, attribute, access_method): create a 1-column-only hypothetical index.
- hypopg_create_index(query): create an hypothetical index using a standard CREATE INDEX statement.
- hypopg_drop_index(oid): remove the specified hypothetical index.
- hypopg_list_indexes(): return a short human readable version list of available hypothetical indexes.
- hypopg_relation_size(oid): return the estimated size of an hypothetical index
- hypopg_reset(): remove all hypothetical indexes
If some hypothetical indexes exists for some relations used in an EXPLAIN (without ANALYZE) statement, they will automatically be added to the list of real indexes. PostgreSQL will then choose to use them or not.
Installing HypoPG is quite simple. Assuming you downloaded and extracted a tarball in the hypopg-0.0.1 directory, are using a packaged version of PostgreSQL and have -dev packages:
Then HypoPG should be available:
Let’s try some really simple tests. First, create a small table:
Then, let’s see a query plan that should benefit an index that’s not here:
No surprise, a sequential scan is the only way to go. Now, let’s try to add an hypothetical index, and EXPLAIN again:
Yeah! Our hypothetical index is used. We also notice that the hypothetical index creation is more or less 1ms, which is way less than the real index creation would have last.
And of course, this hypothetical index is not used in an EXPLAIN ANALYZE:
Now let’s go further:
Our hypothetical index is still used, but an index on id and val should help this query. Also, as there’s a wildcard on the right-side of the LIKE pattern, the operator class text_pattern_ops is needed. Let’s check that:
And yes, PostgreSQL decides to use our new index!
Index size estimation
For now, the index size estimation is done quickly, which can give us a clue on what would be the real index size.
Let’s check the estimated size of our two hypothetical indexes:
Now, create the real indexes, and compare the sizes:
The estimated index size is a bit higher than the real size. It’s on purpose. If the estimated index size is less than an existing index, PostgreSQL would prefer to use the hypothetical index than the real index, which is definitively not interesting. Also, to simulate a bloated index (which is quite frequent on real indexes), a hardcoded 20% bloat factor is added. Finally, the estimation could also be improved a lot.
This 0.0.1 version of HypoPG is still a work in progress, and a lot of work is still needed.
Here are the main limitations (at least that I’m aware of):
- only btree hypothetical indexes are supported
- no hypothetical indexes on expression
- no hypothetical indexes on predicate
- tablespace specification is not possible
- index size estimation could be improved, and it’s not possible to change the bloat factor
However, I believe it can already be helpful.
What’s next ?
Now, the next step is to implement HypoPG support in PoWA, to help DBA decide wether they should create the suggested index or not, and remove the current limitations.
If you want to try HypoPG, here is the github repository: github.com/HypoPG/hypopg.