Quick Start
This guide builds pg_fusion and runs one local query in a pgrx PostgreSQL 17
development cluster.
Prerequisites
- Rust 1.89 or newer.
- PostgreSQL 17 development headers and
pg_config. cargo-pgrx.
For contributor setup, see Development.
Install pgrx
cargo-pgrx and the pgrx crates must use the same exact patch version.
Check the selected pgrx version:
cargo tree -p pgrx --depth 0
Then install the matching cargo-pgrx binary:
PGRX_VERSION=$(cargo tree -p pgrx --depth 0 | sed -n 's/^pgrx v//p')
cargo install cargo-pgrx --version "$PGRX_VERSION" --locked --force
cargo pgrx --version
cargo pgrx init --pg17 $(which pg_config)
Use the full path to the PostgreSQL 17 pg_config if multiple PostgreSQL
versions are installed.
Build
cargo build --release -p pg_fusion
Use a release build for local experiments. Debug builds add enough overhead to make pg_fusion and PostgreSQL comparisons misleading.
Configure PostgreSQL
pg_fusion must be preloaded because it registers hooks, shared memory, and a
background worker.
For a 16 GiB development machine, add:
shared_preload_libraries = 'pg_fusion'
pg_fusion.worker_threads = 0
pg_fusion.log_path = '/tmp/pg_fusion.log'
pg_fusion.worker_log_filter = 'warn'
pg_fusion.worker_memory_limit_mb = 2048
pg_fusion.worker_spill_directory = '/tmp/pg_fusion_spill'
pg_fusion.control_slot_count = 128
pg_fusion.control_backend_to_worker_capacity = 65536
pg_fusion.control_worker_to_backend_capacity = 65536
pg_fusion.scan_slot_count = 256
pg_fusion.scan_backend_to_worker_capacity = 4096
pg_fusion.scan_worker_to_backend_capacity = 4096
pg_fusion.page_size = 262144
pg_fusion.page_count = 1024
pg_fusion.scan_fetch_batch_rows = 4096
pg_fusion.scan_batch_channel_capacity = 128
pg_fusion.scan_idle_poll_interval_us = 50
pg_fusion.estimator_initial_tail_bytes_per_row = 64
pg_fusion.join_reordering = on
pg_fusion.runtime_filter_enable = on
pg_fusion.runtime_filter_count = 128
pg_fusion.runtime_filter_bits = 4194304
pg_fusion.runtime_filter_hashes = 4
This profile reserves about 256 MiB for the shared page pool and about 64 MiB
for runtime Bloom filters, plus control-ring overhead. It also gives the
DataFusion worker a 2 GiB memory pool and enables worker-owned spill under
/tmp/pg_fusion_spill.
For smaller or heavily loaded machines, reduce pg_fusion.page_count first.
Restart PostgreSQL after changing postmaster-level settings.
Start psql
cargo pgrx run pg17 -p pg_fusion --release
Then create the extension:
CREATE EXTENSION IF NOT EXISTS pg_fusion;
Run A Query
CREATE TABLE t AS
SELECT i AS id, i % 10 AS group_id, i::double precision AS value
FROM generate_series(1, 1000000) AS i;
ANALYZE t;
SET pg_fusion.enable = on;
SELECT count(*), avg(value)
FROM t
WHERE group_id >= 0;
Try A Larger Aggregate Query
This example returns one row after PostgreSQL scan rows have crossed into Arrow pages and DataFusion has computed the aggregate.
DROP TABLE IF EXISTS t;
CREATE TABLE t (a int PRIMARY KEY, b int);
INSERT INTO t
SELECT g, g % 1000
FROM generate_series(1, 1000000) g;
ANALYZE t;
SET pg_fusion.enable = on;
SELECT count(*)
FROM t
WHERE b >= 0;
Expected result:
count
------------
1000000
Treat timing as workload-specific; compare on your machine with
pg_fusion.enable off and on.
COPY (SELECT ...) TO STDOUT can use the same pg_fusion path when the nested
SELECT is eligible:
COPY (
SELECT count(*)
FROM t
WHERE b >= 0
) TO STDOUT WITH (FORMAT csv);
Inspect The Plan
EXPLAIN
SELECT count(*), avg(value)
FROM t
WHERE group_id >= 0;
Look for Custom Scan (PgFusionScan) and PostgreSQL scan leaves. The scan
leaves show the SQL that PostgreSQL executes before rows are encoded into Arrow
pages.
Compare With PostgreSQL
SET pg_fusion.enable = off;
EXPLAIN ANALYZE
SELECT count(*), avg(value)
FROM t
WHERE group_id >= 0;
SET pg_fusion.enable = on;
EXPLAIN ANALYZE
SELECT count(*), avg(value)
FROM t
WHERE group_id >= 0;
If pg_fusion is slower, check whether the query sends many rows or columns to the worker. Metrics shows how to inspect scan encoding, transport, worker execution, and result transfer.
Next Steps
- Read Architecture for the runtime and resource model.
- Read Query support before trying application queries.
- Read Configuration before changing shared-memory or worker limits.