pg_fusion Documentation

pg_fusion runs selected analytical PostgreSQL SELECT queries through a shared DataFusion background worker. PostgreSQL still owns heap access, snapshots, MVCC visibility, TOAST, tuple decoding, and final result slots.

DataFusion is an analytical query engine written in Rust that operates on Apache Arrow columnar batches. pg_fusion uses it to execute selected analytical queries on top of PostgreSQL scan streams; it does not replace PostgreSQL storage or MVCC.

Start with the pages that answer operational questions.

Start Here

Topic	Use It For
Quick start	Build the extension, configure a local pgrx cluster, and run a first query
Glossary	Learn the terms: DataFusion, Arrow, slots, page pool, filters, DPHyp, CTID scans
Architecture	Understand the backend/worker/shared-memory model and why rows cross into Arrow
Memory and pages	Understand shared blocks, zero-copy imports, materialization, and page reuse
Execution model	Follow one eligible query from planning to result slots
Query support	Check which query shapes and types are currently eligible
Compatibility matrix	Inspect PostgreSQL-to-DataFusion mappings for types, expressions, functions, aggregates, and window functions
Workloads	Evaluate good and poor workload candidates
Limitations	Understand overhead cases, semantic boundaries, and unsupported features

Operate

Topic	Use It For
Configuration	Size the worker, shared memory, scan streaming, runtime filters, and spill
Metrics	Diagnose scan encoding, worker backpressure, result transfer, filters, and spill
Benchmarks	Run local comparison benchmarks and interpret the results

Build And Contribute

Topic	Use It For
Development	Set up Rust, pgrx, and the contributor workflow
Testing	Run standalone Rust tests and PostgreSQL-backed pgrx tests
Roadmap	Follow typed planning, PG18 support, compatibility, and testing direction

Status

pg_fusion is experimental. Treat unsupported query shapes as not implemented, not as implicitly equivalent to PostgreSQL execution.