Benchmarks

pg_fusion includes local diagnostic benchmarks. They are for engineering evaluation, not audited TPC-H publication.

Benchmark results should be read together with query plans and metrics. A pg_fusion query can be slower if scan, tuple decoding, Arrow encoding, or page transport dominates the DataFusion-side work.

Native TPC-H Harness

The supported TPC-H harness is the Rust binary crate:

benches/tpch/runner

It embeds the Apache-2.0 tpchgen crate, streams generated rows into PostgreSQL with COPY FROM STDIN, and uses native PostgreSQL numeric(15,2) and date columns.

The runner includes SQL templates from benches/tpch/queries/. Use those same files for manual psql checks; for example, q2 includes the canonical minimum-cost supplier subquery and should not be replaced with a simplified join shape when comparing against runner timings.

Prerequisites

Use release builds for benchmark runs; debug builds are too slow for meaningful timings.

For a local pgrx-managed PostgreSQL 17 cluster:

cargo pgrx init --pg17 /path/to/pg_config
cargo pgrx install --release -p pg_fusion --pg-config /path/to/pg_config

For an external PostgreSQL 17 installation:

cargo pgrx install --release -p pg_fusion --pg-config /path/to/pg_config

Set shared_preload_libraries = 'pg_fusion' before PostgreSQL starts. For stable TPC-H runs, also size the page pool, scan slots, control rings, DataFusion worker memory, and runtime filter pool. One useful 16 GiB development profile is in Quick Start; the TPC-H README has the SF10 profile and hash-join memory notes. All GUCs and sizing tradeoffs are listed in Configuration. After changing preload or postmaster-level pg_fusion settings, restart PostgreSQL, then run cargo pgrx start pg17 -p pg_fusion and cargo pgrx run --release pg17 -p pg_fusion for a pgrx-managed cluster. Run CREATE EXTENSION IF NOT EXISTS pg_fusion; in the benchmark database. The TPC-H runner toggles pg_fusion.enable itself and sets max_parallel_workers_per_gather from --parallel-workers.

PostgreSQL Connection

Use --dbname / -d, --host, --port, and --user / -U for explicit connections. The runner also reads PGDATABASE, PGHOST, PGPORT, PGUSER, and PGPASSWORD. Passwords should use PGPASSWORD; there is no --password flag.

PGPASSWORD=secret cargo run --release -p pg_fusion_tpch -- \
  --host localhost \
  --port 5432 \
  --user postgres \
  --dbname pg_fusion \
  --scale-factor 0.01

Quick Run

From the repository root:

cargo run --release -p pg_fusion_tpch -- \
  --dbname pg_fusion \
  --scale-factor 0.01 \
  --runs 3 \
  --warmup 1

By default, the runner:

recreates the benchmark schema;
streams generated TPC-H data into PostgreSQL;
analyzes the tables;
runs each query with pg_fusion.enable = off;
runs each query with pg_fusion.enable = on;
alternates measured modes to reduce hot-cache ordering bias;
writes CSV and JSON summaries.

Reuse An Existing Schema

cargo run --release -p pg_fusion_tpch -- \
  --dbname pg_fusion \
  --no-prepare \
  --queries q01,q03,q06

For a focused q2 diagnostic, use --queries q02 --runs 1 --warmup 0. The runner still performs vanilla and Fusion EXPLAIN preflights, executes both modes through COPY ... TO STDOUT, and compares the resulting CSV bytes.

Result Statuses

ok: PostgreSQL and pg_fusion both succeeded and returned matching rows.
mismatch: both succeeded but byte-identical output differed.
fusion_fail: vanilla preflight succeeded, but pg_fusion failed or did not plan through PgFusionScan.
pg_fail: PostgreSQL failed, so the comparison is invalid.

Report Output

The console report shows median latency for measured runs. speedup is pg_median_ms / fusion_median_ms, so values above 1 mean pg_fusion was faster. The JSON artifact includes raw timings, row counts, result hashes, PostgreSQL metadata, and pg_fusion extension version when available. For runtime diagnostics, use Metrics.

Useful Query Groups

For scan and encoding experiments, start with:

q01;
q06;
q14;
q19.

For joins and grouped aggregation, inspect:

q03;
q05;
q10;
q12.

The remaining canonical-derived queries are useful for planner coverage and failure reporting.

Row Encoder Microbenchmark

To isolate PostgreSQL-free Rust page encoding work, run the Criterion benchmark:

cargo bench -p row_encoder --bench q05_encode

This is not a vanilla-vs-Fusion TPC-H run. The benchmark name reflects its q05-shaped input fixture. If PG_FUSION_TPCH_DIR points to compatible CSV files, the benchmark uses them; otherwise it falls back to a deterministic synthetic fixture.