Skip to the content.

Quick Start

Documentation home

This guide builds pg_fusion and runs one local query in a pgrx PostgreSQL 17 development cluster.

Prerequisites

For contributor setup, see Development.

Install pgrx

cargo-pgrx and the pgrx crates must use the same exact patch version. Check the selected pgrx version:

cargo tree -p pgrx --depth 0

Then install the matching cargo-pgrx binary:

PGRX_VERSION=$(cargo tree -p pgrx --depth 0 | sed -n 's/^pgrx v//p')
cargo install cargo-pgrx --version "$PGRX_VERSION" --locked --force
cargo pgrx --version
cargo pgrx init --pg17 $(which pg_config)

Use the full path to the PostgreSQL 17 pg_config if multiple PostgreSQL versions are installed.

Build

cargo build --release -p pg_fusion

Use a release build for local experiments. Debug builds add enough overhead to make pg_fusion and PostgreSQL comparisons misleading.

Configure PostgreSQL

pg_fusion must be preloaded because it registers hooks, shared memory, and a background worker.

For a 16 GiB development machine, add:

shared_preload_libraries = 'pg_fusion'

pg_fusion.worker_threads = 0
pg_fusion.log_path = '/tmp/pg_fusion.log'
pg_fusion.worker_log_filter = 'warn'
pg_fusion.worker_memory_limit_mb = 2048
pg_fusion.worker_spill_directory = '/tmp/pg_fusion_spill'

pg_fusion.control_slot_count = 128
pg_fusion.control_backend_to_worker_capacity = 65536
pg_fusion.control_worker_to_backend_capacity = 65536

pg_fusion.scan_slot_count = 256
pg_fusion.scan_backend_to_worker_capacity = 4096
pg_fusion.scan_worker_to_backend_capacity = 4096

pg_fusion.page_size = 262144
pg_fusion.page_count = 1024

pg_fusion.scan_fetch_batch_rows = 4096
pg_fusion.scan_batch_channel_capacity = 128
pg_fusion.scan_idle_poll_interval_us = 50
pg_fusion.estimator_initial_tail_bytes_per_row = 64

pg_fusion.join_reordering = on

pg_fusion.runtime_filter_enable = on
pg_fusion.runtime_filter_count = 128
pg_fusion.runtime_filter_bits = 4194304
pg_fusion.runtime_filter_hashes = 4

This profile reserves about 256 MiB for the shared page pool and about 64 MiB for runtime Bloom filters, plus control-ring overhead. It also gives the DataFusion worker a 2 GiB memory pool and enables worker-owned spill under /tmp/pg_fusion_spill.

For smaller or heavily loaded machines, reduce pg_fusion.page_count first.

Restart PostgreSQL after changing postmaster-level settings.

Start psql

cargo pgrx run pg17 -p pg_fusion --release

Then create the extension:

CREATE EXTENSION IF NOT EXISTS pg_fusion;

Run A Query

CREATE TABLE t AS
SELECT i AS id, i % 10 AS group_id, i::double precision AS value
FROM generate_series(1, 1000000) AS i;

ANALYZE t;

SET pg_fusion.enable = on;

SELECT count(*), avg(value)
FROM t
WHERE group_id >= 0;

Try A Larger Aggregate Query

This example returns one row after PostgreSQL scan rows have crossed into Arrow pages and DataFusion has computed the aggregate.

DROP TABLE IF EXISTS t;
CREATE TABLE t (a int PRIMARY KEY, b int);

INSERT INTO t
SELECT g, g % 1000
FROM generate_series(1, 1000000) g;

ANALYZE t;

SET pg_fusion.enable = on;

SELECT count(*)
FROM t
WHERE b >= 0;

Expected result:

  count
------------
 1000000

Treat timing as workload-specific; compare on your machine with pg_fusion.enable off and on.

COPY (SELECT ...) TO STDOUT can use the same pg_fusion path when the nested SELECT is eligible:

COPY (
  SELECT count(*)
  FROM t
  WHERE b >= 0
) TO STDOUT WITH (FORMAT csv);

Inspect The Plan

EXPLAIN
SELECT count(*), avg(value)
FROM t
WHERE group_id >= 0;

Look for Custom Scan (PgFusionScan) and PostgreSQL scan leaves. The scan leaves show the SQL that PostgreSQL executes before rows are encoded into Arrow pages.

Compare With PostgreSQL

SET pg_fusion.enable = off;
EXPLAIN ANALYZE
SELECT count(*), avg(value)
FROM t
WHERE group_id >= 0;

SET pg_fusion.enable = on;
EXPLAIN ANALYZE
SELECT count(*), avg(value)
FROM t
WHERE group_id >= 0;

If pg_fusion is slower, check whether the query sends many rows or columns to the worker. Metrics shows how to inspect scan encoding, transport, worker execution, and result transfer.

Next Steps