What is a bitmap scan in PostgreSQL?

gelovolro
7 min readAug 2, 2024

--

Bitmap scan is one of the methods for data retrieval in PostgreSQL, consisting of two main types of nodes, that can be seen when dumping the query plan using EXPLAIN ANALYZE VERBOSE:

  1. Bitmap Index Scan: this is the primary index scan, that creates a bitmap array. This operation does not get the actual row values, but creates a bitmap array with the heap-page locations of the rows. There can be multiple such operations.
  2. Bitmap Heap Scan: this is the final operation in the plan tree, that uses the previously created bitmap arrays from the earlier stage to fetch the actual row values.

It’s important to note the following details:

  • there can be multiple “Bitmap Index Scan” operations in plan tree nodes, depending on the amount of data in the table and used indexes during the SQL query execution.
  • when several “Bitmap Index Scan” are applied, BitmapAnd or BitmapOr logical operations are also used against the created bitmap arrays. Their purpose is to help creating the final bitmap array (result array), which is then passed to the “Bitmap Heap Scan” node.
illustration of the query plan output: when scanning using two indexes, two bitmaps are created, then “BitmapOr” is applied for creating the final bitmap array, which is finally used in the “Bitmap Heap Scan” node

Under what conditions the bitmap scanning method is applied?

There are other scanning methods, such as:

  • sequential scan.
  • index scan or index-only scan.
  • others scan methods like CTE or TID… But for the much more relevant comparison with the bitmap scan mehod, let’s consider the first two options.

Bitmap scanning can be considered as a “middle ground” between the sequential scan and index scan methods:

  • like a sequential scanning, it leverages the advantages of reading the large amount of data.
  • but also like an index scanning, it uses the indexes to determine what data is needed to be fetched.

Let’s prepare the SQL DDL code to fill a table with the 1 million rows and apply next SQL query, that is going to use the bitmap scan method:

-- let's create a test table
CREATE TABLE public.orders (
id SERIAL NOT NULL,
customer_id INT NOT NULL,
order_date DATE NOT NULL,
price NUMERIC NOT NULL,
CONSTRAINT pk_id PRIMARY KEY(id)
);

-- now, let's create the indexes
CREATE INDEX idx_orders_customer_id ON public.orders(customer_id);
CREATE INDEX idx_orders_order_date ON public.orders(order_date);
CREATE INDEX idx_orders_price ON public.orders(price);

-- let's fill the 1 million records into the this test table
INSERT INTO public.orders(
customer_id,
order_date,
price)
SELECT
floor(random() * gs),
current_date - INTERVAL '1 day' * floor(random() * 365),
round((random() * 1000)::NUMERIC, 2)
FROM
generate_series(1, 1000000) gs;

-- now, let's check that some records exist in table
SELECT * FROM public.orders LIMIT 10 OFFSET 0;
SELECT COUNT(1) FROM public.orders;

-- and let's execute the final SQL query, which is going to produce
-- the bitmap scan behavior and let's dump the query planner output
EXPLAIN ANALYZE VERBOSE
SELECT
*
FROM
public.orders
WHERE
(customer_id <= 10 OR customer_id >= 100)
AND
order_date = '2024-01-01'
AND
price >= 100 OR price <= 10;

After executing this SQL code, you will see the query plan output with the “Bitmap Index Scan” and “Bitmap Heap Scan” nodes, don’t forget to use the EXPLAIN ANALYZE VERBOSE... part for it.

Let’s do some trick. Edit the SQL-code a bit to see next:

  1. changes in the planner (optimizer) output, it will choose another scanning method.
  2. the efficiency of bitmap scanning relative to sequential scan method, which we will see in a new query planner output.

What do you need for this exactly?

Wrap the SQL query with the EXPLAIN ANALYZE part in an explicit transaction block and drop the index with a subsequent ROLLBACK action for discarding a DROP INDEX operation:

START TRANSACTION;
DROP INDEX idx_orders_order_date;

EXPLAIN ANALYZE VERBOSE
SELECT
*
FROM
public.orders
WHERE
(customer_id <= 10 OR customer_id >= 100)
AND
order_date = '2024-01-01'
AND
price >= 100;

ROLLBACK;
demonstration of behavior change

From the new query plan output, we’re able to see next:

  • the bitmap scan method was replaced by a sequential scan (in this exactly case, it doesn’t matter that it’s a parallel one).
  • if you repeatedly execute the original SQL query and a new version of it with the transaction block, where the “sequential scan” mehtod has appeared, you will see that the “execution time” will be twice faster (in average) with the bitmap scan method, while the sequential scan method will be twice slower. Note, that the additional instructions like DROP INDEX or ROLLBACK do not affect the planner statistics, due that EXPLAIN ANALYZE is applied strictly to the SELECT part of the whole SQL-query and new planner statistics is only about it.

Let’s go deeper in the logic of bitmap scan method.

The bitmap scan method works as follows:

  1. The bitmap is dynamically created for each new SQL query with bitmap scanning. It cannot be reused in other SQL queries and is released from memory (RAM), when the query is completed.
  2. The bitmap represents an array of bits, the size of which corresponds to the number of table heap-pages, which were scanned.
  3. “Bitmap Index Scan” sets the bit values in the bitmap array, based on the heap-page address indicated by the index. When an index entry matching the search condition is found, the heap-page address it points to is determined. The corresponding bit in the bitmap is then set to “1” using the bitwise offset. The key point is that values are not read at this stage, only some kind of “a pointer” to the heap-page is obtained, which is containing the desired result for the SQL query.
  4. If multiple bitmap index scan operations were used, each operation creates its own bitmap array. The action from point 3 is repeated, and then BitmapAnd (logical AND) and BitmapOr (logical OR) are applied with producing the final bitmap.
  5. The final bitmap is then passed to the “Bitmap Heap Scan” node, where the desired values are read in a more “sequential manner” using the resulting bitmap with the page addresses, containing the desired row values. Despite this step is some kind of sequential, it reads only the necessary pages, skipping the unnecessary ones. This is the beauty of bitmap scan method.
  6. The mechanism for reading the pages in the “Bitmap Heap Scan” node involves moving to the start of each page, marked with a bit set to “1” and reading values from this page. Each read operation is rechecked for compliance with the SQL query condition.
  7. Finally, the desired values from the “Bitmap Heap Scan” are returned to the PostgreSQL user/client application as the result of the executed SQL query.

We may try to visualize this process a little:

<bitmap index scan №1>
{1 0 0 1 0 0 0 1} bitmap array №1

<bitmap index scan №2>
{1 0 0 0 1 0 0 1} bitmap array №2

<logical AND, i.e. 'BitmapAnd'>
{1 0 0 0 0 0 0 1} result bitmap array

From the resulting bitmap array {1 0 0 0 0 0 0 1}, where bits are set to “1”, the addresses of heap-pages are taken. These are pages №1 and №8 in our conditional example. Pages from №2 to №7 are skipped, and values satisfying the client’s SQL query are read from pages №1 and №8.

Other important details:

  • the row address, in the index entry, points to the ctid (tuple identifier), which is a combination of the heap-page number and the offset within this page. The bitmap scan ignores these offsets, as the scanning will check the entire page and set bit values in the bitmap to "1", if at least one row on the page meets the desired condition.
  • when clustering the table data (using the CLUSTER operator), bitmap scan method may become inefficient for the query planner (optimizer) and standard index scan will occur.

Is there a relationship between “work_mem” and the bitmap scan method, and can it negatively affect the scanning process?

Yes, there is a relationship. Low “work_mem” value can produce the loss of accuracy and the appearance of “lossy heap blocks”, because the bitmap arrays created will not fit in the host memory (RAM), which is allocated due the “work_mem” PostgreSQL parameter value.

You can repeat the SQL query with dumping the query planner output, which I’ve showed earlier. Let’s set the minimum value for “work_mem”:

SET work_mem = '64kB';

Repeat the original query with this new “work_mem” value, and you will see the following statistics:

demonstration of query plan output with the low “work_mem” parameter value

What got worse?

  • lossy heap blocks of bitmaps appeared.
  • it was indicated, how many rows failed the recheck condition, look at “Rows Removed by Index Recheck” part.

“Bitmap Heap Scan” can use the heap blocks, that either directly point to the required rows (look at “exact heap blocks” in planner) or roughly indicate the general direction of rows (look at “lossy heap blocks” in planner). A more accurate mechanism is undoubtedly better, but requires more memory (“work_mem” value). Lack of memory, due to limited “work_mem” parameter value will not allow the use of “precise” bitmaps and PostgreSQL may start using the less efficient “lossy” bitmaps, which will require the additional checks for each desired row, during the bitmap scan.

PS: The following version of PostgreSQL was used, when I was writing this article: v16.3

--

--

No responses yet