Configuration

pg_fusion uses PostgreSQL GUCs to configure the runtime path, the DataFusion worker, shared-memory transport, scan streaming, runtime filters, spill, and diagnostics.

The architecture explains why the runtime is shaped as one background worker plus preallocated shared memory. Glossary defines the DataFusion, Arrow, page-pool, filter, DPHyp, and CTID terms used below. This page lists the knobs that configure that shape.

Required Preload

shared_preload_libraries = 'pg_fusion'

pg_fusion must be loaded at postmaster start because it registers PostgreSQL hooks, shared memory, and a background worker.

Enable The Runtime Path

Setting	Default	Level	Description
`pg_fusion.enable`	`off`	User/session	Enables the strict pg_fusion path for supported SELECT queries in the session.
`pg_fusion.backend_log_level`	`0`	User/session	Backend diagnostics: `0=off`, `1=basic`, `2=trace`.

Use pg_fusion.enable when comparing a query against vanilla PostgreSQL:

SET pg_fusion.enable = off;
-- run PostgreSQL plan

SET pg_fusion.enable = on;
-- run pg_fusion plan, or fail with a pg_fusion planning error if unsupported

Size The Worker

The worker is the single process that holds the DataFusion runtime threads, memory pool, and spill files. These settings are postmaster-level.

Setting	Default	Description
`pg_fusion.worker_threads`	`0`	DataFusion Tokio runtime thread count. `0` chooses automatically from available CPU parallelism.
`pg_fusion.worker_memory_limit_mb`	`0`	DataFusion worker memory limit. `0` uses the default unbounded runtime and disables worker spill.
`pg_fusion.worker_spill_directory`	`''`	Base directory for worker-owned spill files. Empty uses OS temporary storage.
`pg_fusion.worker_log_filter`	`warn`	Worker tracing filter.
`pg_fusion.log_path`	`/tmp/pg_fusion.log`	Worker diagnostic log path.

Set pg_fusion.worker_memory_limit_mb above 0 only when you want a finite DataFusion memory pool and worker-owned spill. Spill files are not PostgreSQL temporary files.

pg_fusion.worker_threads controls the Tokio runtime threads inside the pg_fusion worker process. It does not control PostgreSQL dynamic scan workers, and it does not by itself change DataFusion physical plan partitioning. Set it to 1 to force the earlier single-threaded runtime shape.

Size Shared Memory

Shared-memory settings are postmaster-level because they define the fixed transport layout at PostgreSQL startup.

Primary execution channels carry session lifecycle messages such as start, cancel, completion, and errors.

Setting	Default	Description
`pg_fusion.control_slot_count`	`64`	Number of primary backend/worker control slots.
`pg_fusion.control_backend_to_worker_capacity`	`8192`	Per-slot primary control ring capacity from backend to worker.
`pg_fusion.control_worker_to_backend_capacity`	`8192`	Per-slot primary control ring capacity from worker to backend.

Scan channels are separate because scan requests and responses can be frequent.

Setting	Default	Description
`pg_fusion.scan_slot_count`	`64`	Number of dedicated scan control slots.
`pg_fusion.scan_backend_to_worker_capacity`	`256`	Dedicated scan ring capacity from backend to worker.
`pg_fusion.scan_worker_to_backend_capacity`	`256`	Dedicated scan ring capacity from worker to backend.

The page pool carries Arrow scan pages to the worker and result pages back to the backend.

Setting	Default	Description
`pg_fusion.page_size`	`65536`	Shared page size in bytes.
`pg_fusion.page_count`	`256`	Number of shared pages. Also sizes the issued-page permit pool.

More pages can reduce backpressure but increase the fixed shared-memory footprint. The page pool is shared by scan and result traffic, and pages return to the pool after the last owner releases them. See Memory And Pages for the block format, zero-copy imports, materialization boundaries, and the progress-not-fairness model.
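To make the footprint trade-off concrete, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: it counts raw page payload bytes from the two settings above and assumes nothing about page headers, permit bookkeeping, or alignment, which the real layout may add. The helper name `page_pool_payload_bytes` is hypothetical.

```python
# Default values taken from the page pool settings table.
PAGE_SIZE_BYTES = 65536   # pg_fusion.page_size
PAGE_COUNT = 256          # pg_fusion.page_count


def page_pool_payload_bytes(page_size: int, page_count: int) -> int:
    """Return raw page payload bytes; headers and bookkeeping are NOT included."""
    return page_size * page_count


default_bytes = page_pool_payload_bytes(PAGE_SIZE_BYTES, PAGE_COUNT)
print(f"default pool: {default_bytes} bytes = {default_bytes // (1024 * 1024)} MiB")

# Doubling page_count doubles the fixed payload footprint.
doubled_bytes = page_pool_payload_bytes(PAGE_SIZE_BYTES, PAGE_COUNT * 2)
print(f"doubled pool: {doubled_bytes // (1024 * 1024)} MiB")
```

With the defaults this gives 16 MiB of page payload, reserved at startup whether or not any query is running.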

Tune Scan Streaming

Scan tuning controls how PostgreSQL scan producers feed Arrow pages to the worker.

Setting	Default	Level	Description
`pg_fusion.scan_fetch_batch_rows`	`1024`	Postmaster	Rows requested per PostgreSQL portal drain in backend scan streaming.
`pg_fusion.scan_batch_channel_capacity`	`32`	User/session	Bounded worker scan batch channel capacity per PostgreSQL scan stream.
`pg_fusion.scan_idle_poll_interval_us`	`50`	User/session	Worker scan idle poll interval in microseconds.
`pg_fusion.estimator_initial_tail_bytes_per_row`	`64`	Postmaster	Initial variable-width Arrow page tail estimate.

Scan producers can be leader-only, or they can be dynamic PostgreSQL background workers scanning disjoint CTID block ranges for eligible heap scans. Each producer writes its own Arrow pages into the shared page pool. The worker fans those producer streams into one logical scan, as described in Execution Model.
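The idea of disjoint block ranges can be illustrated with a small Python sketch. The function `split_block_ranges` is hypothetical and only shows one even way to divide a heap's blocks among producers; how pg_fusion actually sizes and assigns ranges is not specified on this page.

```python
from typing import List, Tuple


def split_block_ranges(total_blocks: int, producers: int) -> List[Tuple[int, int]]:
    """Split [0, total_blocks) into up to `producers` contiguous half-open ranges.

    Hypothetical helper for illustration; pg_fusion may size ranges differently.
    """
    if total_blocks < 0 or producers < 1:
        raise ValueError("total_blocks must be >= 0 and producers >= 1")
    base, extra = divmod(total_blocks, producers)
    ranges = []
    start = 0
    for i in range(producers):
        # The first `extra` producers each take one additional block.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # fewer blocks than producers: skip empty ranges
        ranges.append((start, start + length))
        start += length
    return ranges


ranges = split_block_ranges(total_blocks=10, producers=3)
print(ranges)  # [(0, 4), (4, 7), (7, 10)]
```

Because the ranges are half-open and contiguous, every block belongs to exactly one producer, so no tuple is scanned twice. In plain PostgreSQL a range like (4, 7) could be written as a predicate such as `ctid >= '(4,0)' AND ctid < '(7,0)'`; whether pg_fusion issues exactly this form is an assumption here.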

If scan metrics show high backend page fill time, the bottleneck may be PostgreSQL scanning, tuple decoding, detoasting, or slot-to-Arrow encoding rather than worker execution.

Configure Planning Optimizations

Setting	Default	Level	Description
`pg_fusion.join_reordering`	`on`	User/session	Enables statistics-based join reordering for eligible joins.

PostgreSQL scan planning still matters because scan leaves execute trusted PostgreSQL scan SQL through PostgreSQL executor portals.

PostgreSQL planner settings worth experimenting with include:

max_parallel_workers_per_gather = 2
min_parallel_table_scan_size = '8MB'
parallel_setup_cost = 1000
parallel_tuple_cost = 0.1

These settings affect PostgreSQL-side scan planning. They do not configure DataFusion worker memory.

max_parallel_workers_per_gather is especially important for pg_fusion CTID range scans. A value of 0 keeps scan production leader-only. A positive value gives pg_fusion a query-wide budget for dynamic PostgreSQL scan producers, still capped by pg_fusion limits and by available PostgreSQL worker capacity. It does not control DataFusion’s Tokio tasks or the DataFusion worker thread count.

Configure Runtime Filters

Runtime filters can reduce rows before slot-to-Arrow encoding on eligible hash join probe scans.

Setting	Default	Level	Description
`pg_fusion.runtime_filter_enable`	`on`	User/session	Enables runtime Bloom filters for eligible hash joins.
`pg_fusion.runtime_filter_count`	`64`	Postmaster	Number of shared-memory runtime filter slots.
`pg_fusion.runtime_filter_bits`	`1048576`	Postmaster	Bloom filter bit count per slot.
`pg_fusion.runtime_filter_hashes`	`4`	Postmaster	Number of Bloom hash probes per slot.

If the runtime filter slot pool is exhausted, execution continues without the missing filter and records a diagnostic counter.
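As a rough guide to sizing, the sketch below estimates the shared-memory cost of the filter slots and the false-positive rate of one filter. It assumes a classic Bloom filter and uses the textbook approximation (1 - e^(-kn/m))^k; pg_fusion's actual filter layout and hashing may differ, so treat the numbers as indicative only. Both function names are hypothetical.

```python
import math

# Default values taken from the runtime filter settings table.
FILTER_COUNT = 64        # pg_fusion.runtime_filter_count
FILTER_BITS = 1048576    # pg_fusion.runtime_filter_bits
FILTER_HASHES = 4        # pg_fusion.runtime_filter_hashes


def filter_pool_bytes(count: int, bits: int) -> int:
    """Raw bit-array bytes for all slots; slot headers are not included."""
    return count * bits // 8


def classic_bloom_fpr(keys: int, bits: int, hashes: int) -> float:
    """Textbook estimate (1 - e^(-k*n/m))^k for a classic Bloom filter."""
    return (1.0 - math.exp(-hashes * keys / bits)) ** hashes


pool_bytes = filter_pool_bytes(FILTER_COUNT, FILTER_BITS)
print(f"filter pool: {pool_bytes // (1024 * 1024)} MiB")

for keys in (10_000, 100_000, 1_000_000):
    fpr = classic_bloom_fpr(keys, FILTER_BITS, FILTER_HASHES)
    print(f"{keys:>9} build keys -> estimated false-positive rate {fpr:.4f}")
```

Under this model the default slots occupy 8 MiB of bit arrays, and a filter built from about 100,000 distinct keys lets through roughly 1% of non-matching probe rows. At a million keys the estimate exceeds 90%, so the filter would remove very little.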

Worker Spill

pg_fusion.worker_memory_limit_mb = 0 keeps DataFusion on the default unbounded runtime and disables worker spill.

Setting it above 0 enables a finite DataFusion memory pool and worker-owned OS temporary spill files. pg_fusion.worker_spill_directory may point at an absolute spill root; an empty value uses OS temporary storage under pg_fusion/spill.
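The interaction between the two spill settings can be summarized in a short Python sketch. It is a reading of the rules above rather than the worker's actual code: the function `resolve_spill_root` is hypothetical, rejecting relative paths is an assumption drawn from the phrase "absolute spill root", and `tempfile.gettempdir()` stands in for whatever the worker treats as OS temporary storage.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_spill_root(memory_limit_mb: int, spill_directory: str) -> Optional[Path]:
    """Return the spill root implied by the two settings, or None if spill is off.

    Hypothetical helper; mirrors the documented rules, not the real implementation.
    """
    if memory_limit_mb == 0:
        return None  # unbounded runtime: worker spill is disabled
    if spill_directory == "":
        # Empty setting: OS temporary storage under pg_fusion/spill.
        return Path(tempfile.gettempdir()) / "pg_fusion" / "spill"
    root = Path(spill_directory)
    if not root.is_absolute():
        # Assumption: only absolute spill roots are accepted.
        raise ValueError("spill directory must be an absolute path")
    return root


print(resolve_spill_root(0, ""))        # None: spill disabled
print(resolve_spill_root(4096, ""))     # OS temp dir followed by pg_fusion/spill
```

The point of the sketch is that the spill directory has no effect until the memory limit is set above 0.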

This v1 spill path is owned by the pg_fusion worker. It does not use PostgreSQL temp_tablespaces, temp_file_limit, or ResourceOwner cleanup.

See Metrics for spill diagnostics.