
Workloads


This page describes workload shapes that are useful to evaluate with pg_fusion, and what information to collect when a query is faster, slower, or unsupported.

The main question is whether the DataFusion worker can do enough useful work after scan ingress to justify the cost of converting PostgreSQL heap tuples to Arrow.

Good Candidates

The best early candidates are join-heavy analytical queries that create large intermediate results inside the DataFusion worker.

Those intermediate batches are already columnar Arrow data. They do not pay the PostgreSQL heap-row decoding, Arrow encoding, or backend-to-worker transport cost again while DataFusion joins, filters, aggregates, sorts, or repartitions them inside the worker.

Other useful candidates include:

Why These Workloads Fit

pg_fusion pays boundary costs at scan ingress and result egress:

Once data is inside the worker, intermediate DataFusion batches stay in Arrow form. A query is a better candidate when it does substantial worker-local relational work and returns less data than it scanned.
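One rough way to reason about this is to compare the rows that cross a boundary with the rows that are processed only inside the worker. The sketch below is illustrative and is not part of pg_fusion: the `QueryShape` type, the `worker_local_ratio` helper, and all row counts are assumptions made up for the example, and row counts are only a crude stand-in for real conversion and transport cost.

```python
from dataclasses import dataclass


@dataclass
class QueryShape:
    """Hypothetical summary of a query; not a pg_fusion structure."""
    rows_scanned: int        # rows converted to Arrow at scan ingress
    rows_intermediate: int   # rows produced by joins/aggregates inside the worker
    rows_returned: int       # rows sent back at result egress


def boundary_rows(shape: QueryShape) -> int:
    """Rows that pay a boundary cost (scan ingress plus result egress)."""
    return shape.rows_scanned + shape.rows_returned


def worker_local_ratio(shape: QueryShape) -> float:
    """Intermediate rows handled in the worker per row that crosses a boundary."""
    crossed = boundary_rows(shape)
    if crossed == 0:
        return 0.0
    return shape.rows_intermediate / crossed


# Made-up shapes: a join that fans out inside the worker and returns a small
# aggregate, versus a query that returns every row it scans.
join_heavy = QueryShape(rows_scanned=1_000_000,
                        rows_intermediate=50_000_000,
                        rows_returned=100)
pass_through = QueryShape(rows_scanned=1_000,
                          rows_intermediate=0,
                          rows_returned=1_000)

for name, shape in [("join_heavy", join_heavy), ("pass_through", pass_through)]:
    print(f"{name}: ratio={worker_local_ratio(shape):.2f}")
```

A high ratio suggests that most of the work happens on data already in Arrow form. A ratio near zero suggests the query mostly pays boundary costs. Measured timings should take precedence over this kind of estimate.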

Poor Candidates

pg_fusion is usually a poor fit when:

These cases are still useful to report when they expose a specific bottleneck.

What To Collect

Please include:

Use GitHub Issues or Discussions for now. Avoid sharing sensitive schema or customer data.