pbfhogg CLI Reference
Version 0.2.0. Generated from pbfhogg --help output.
Global flags
All commands support -h, --help and -V, --version.
Common flags
These flags appear on most commands that produce PBF output:
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file path |
--compression <COMPRESSION> | Blob compression: none, zlib (default), zstd, or with level (zlib:9, zstd:19) |
--direct-io | Use O_DIRECT to bypass page cache (requires linux-direct-io feature) |
--force | Proceed even if input lacks indexdata (slower fallback path) |
--generator <GENERATOR> | Override the writing program name in the output header |
--output-header <KEY=VALUE> | Set output header fields (repeatable). Keys: osmosis_replication_timestamp, osmosis_replication_sequence_number, osmosis_replication_base_url |
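The accepted `--compression` forms can be illustrated with a small parser. This is a minimal sketch of the value syntax described above, not pbfhogg's own parsing code; the function name `parse_compression` is hypothetical, and no level range is checked because this reference does not state one.

```python
VALID_CODECS = ("none", "zlib", "zstd")


def parse_compression(spec="zlib"):
    """Split a --compression value into (codec, level).

    level is None when the value carries no ":<level>" suffix.
    """
    codec, sep, level = spec.partition(":")
    if codec not in VALID_CODECS:
        raise ValueError(f"unknown codec: {codec!r}")
    if not sep:
        return codec, None
    # int() raises ValueError for an empty or non-numeric level.
    return codec, int(level)


print(parse_compression("zstd:19"))
print(parse_compression("zlib"))
```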
Commands
inspect
Inspect PBF file: metadata, block breakdown, ordering analysis.
On indexed PBFs, uses an index-only fast path that reads blob headers without decompression (~36ms on 473 MB vs ~4s for full decode).
pbfhogg inspect [OPTIONS] <FILE>
pbfhogg inspect tags [OPTIONS] <FILE> [EXPRESSIONS]...
| Flag | Description |
|---|---|
--indexed | Check if PBF has blob-level indexdata (exit code 0 = yes, 1 = no) |
--nodes | Analyze node coordinate statistics for FOR compression sizing |
--blocks [N] | Show per-block distribution stats and optional block listing (N limits to first/last N blocks) |
--id-ranges | Show min/max element IDs per type and monotonicity |
--locations | Show locations-on-ways diagnostics |
--anomalies | Show only anomalous blocks (<50% or >150% of median, plus mixed blocks) |
-e, --extended | Extended scan: timestamp range, data bbox, metadata coverage, ordering |
-g, --get <KEY> | Get a single value by key path (e.g. header.bbox, data.timestamp.first) |
--json | Machine-readable JSON output |
--show <TYPE_ID> | Display a single element by ID (e.g. n123, w456, r789). Uses indexdata to skip non-matching blobs, early exit on sorted PBFs |
--direct-io | Use O_DIRECT to bypass page cache |
--force | Proceed even if input lacks indexdata (for --nodes) |
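The `--anomalies` rule (below 50% or above 150% of the median, plus mixed blocks) can be sketched as follows. This is an illustration only: the reference does not say which per-block measure is compared against the median, so element count is assumed here, and `anomalous_blocks` is a hypothetical name.

```python
from statistics import median


def anomalous_blocks(element_counts, mixed=frozenset()):
    """Return indexes of blocks outside 50%..150% of the median, plus mixed blocks.

    element_counts: one number per block (assumed measure: element count).
    mixed: indexes of blocks that contain more than one element type.
    """
    m = median(element_counts)
    flagged = []
    for index, count in enumerate(element_counts):
        if count < 0.5 * m or count > 1.5 * m or index in mixed:
            flagged.append(index)
    return flagged


print(anomalous_blocks([8000, 8000, 3000, 8000, 13000], mixed={1}))
```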
inspect tags
Count tag key=value frequencies (subcommand of inspect).
| Flag | Description |
|---|---|
--min-count <N> | Only show tags with at least this many occurrences [default: 1] |
-M, --max-count <N> | Only show tags with at most this many occurrences |
-s, --sort <ORDER> | Sort order: count-desc (default), count-asc, name-asc, name-desc |
-e, --expressions <FILE> | Read tag expressions from file (one per line, # comments) |
-t, --type <TYPE> | Filter by element type: node, way, or relation |
--direct-io | Use O_DIRECT to bypass page cache |
--force | Proceed even if input lacks indexdata (slower fallback path) |
check
Validate PBF file integrity (IDs + referential integrity).
With no flags, runs both ID and referential integrity checks. Use --ids or --refs to run only one.
pbfhogg check [OPTIONS] <FILE>
| Flag | Description |
|---|---|
--ids | Check ID uniqueness and ordering only |
--refs | Check referential integrity only |
--full | Full duplicate detection via bitmap (slower, more memory; applies to ID check) |
-t, --type <TYPE> | Filter by element type (comma-separated: node, way, relation; applies to ID check) |
--max-errors <N> | Stop after N violations (0 = unlimited) [default: 100] |
--check-relations | Also check relation member references (applies to ref check) |
--show-ids | Show IDs of missing objects, format: n123 in w456 (applies to ref check) |
--json | Machine-readable JSON output |
--quiet | Exit-code only, no output |
--direct-io | Use O_DIRECT to bypass page cache |
For missing relation-to-relation members, the report gives the number of unique missing IDs and, when it differs, the total number of references to them: Missing relation members: 706 (777 references).
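The difference between unique missing IDs and total references can be shown with a counter. A minimal sketch, assuming the summary line omits the parenthesised part when both numbers are equal (the reference only shows the case where they differ); `summarize_missing` is a hypothetical helper, not part of pbfhogg.

```python
from collections import Counter


def summarize_missing(missing_member_ids):
    """Build the summary line from every reference to a missing relation."""
    counts = Counter(missing_member_ids)
    unique = len(counts)
    references = sum(counts.values())
    line = f"Missing relation members: {unique}"
    if references != unique:
        # Some missing relations are referenced more than once.
        line += f" ({references} references)"
    return line


print(summarize_missing([5, 5, 9]))
```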
cat
Concatenate PBF files with optional type filtering. Embeds blob-level indexdata and tagdata automatically.
With --dedupe, merges multiple sorted PBF files with blob-level passthrough and exact-duplicate deduplication.
pbfhogg cat [OPTIONS] --output <OUTPUT> <FILES>...
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
-t, --type <TYPE> | Filter by element type (comma-separated: node, way, relation) |
-c, --clean <ATTR> | Strip metadata attribute (repeatable: version, timestamp, changeset, uid, user) |
--dedupe | K-way sorted merge with dedup (all inputs must be sorted) |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--io-uring | Use io_uring for output I/O (only with --dedupe) |
--force | Proceed even if input lacks indexdata |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
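Conceptually, `--dedupe` is a k-way merge of sorted streams that drops exact duplicates. The sketch below shows that idea at element level with `heapq.merge`. It does not model pbfhogg's blob-level passthrough, and the tuple layout `(type, id, payload)` and the name `merge_dedupe` are assumptions made for illustration.

```python
import heapq

TYPE_RANK = {"node": 0, "way": 1, "relation": 2}


def sort_key(element):
    """Standard order: nodes, ways, relations, each by ascending ID."""
    kind, element_id, _payload = element
    return TYPE_RANK[kind], element_id


def merge_dedupe(*inputs):
    """Merge sorted inputs, keeping one copy of byte-for-byte equal elements."""
    merged = []
    current_key, seen = None, set()
    for element in heapq.merge(*inputs, key=sort_key):
        key = sort_key(element)
        if key != current_key:
            current_key, seen = key, set()
        if element not in seen:
            seen.add(element)
            merged.append(element)
    return merged


a = [("node", 1, "x"), ("node", 3, "y"), ("way", 2, "z")]
b = [("node", 1, "x"), ("node", 2, "q"), ("way", 2, "z")]
print(merge_dedupe(a, b))
```

Elements that share a type and ID but differ in payload are both kept in this sketch, since only exact duplicates are removed.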
sort
Sort PBF into standard order (nodes, ways, relations, each by ascending ID). For already-sorted inputs with indexdata, blobs pass through as raw bytes.
pbfhogg sort [OPTIONS] --output <OUTPUT> <FILE>
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--io-uring | Use io_uring for output I/O |
--force | Proceed even if input lacks indexdata |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
repack
Unreleased - lands in the next pbfhogg release after 0.3.0.
Re-encode a PBF with a configurable per-blob element cap. Element semantics, tags, refs, members, metadata, and DenseNodes encoding all round-trip; output is type-sorted and propagates Sort.Type_then_ID from the input header.
Primary use case: producing same-corpus-different-encoding pairs for blob-density measurement (Geofabrik's ~8 k/blob convention vs planet.openstreetmap.org's ~228 k/blob), so commands with implicit blob-count scaling can be measured at controlled densities.
pbfhogg repack [OPTIONS] --output <OUTPUT> <FILE>
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
--elements-per-blob <N> | Per-blob element cap [default: 8000]. 8000 matches the osmium / Geofabrik convention; pass a larger value to approximate planet.openstreetmap.org-style packing. Must be > 0. |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--io-uring | Use io_uring for output I/O |
--force | Proceed even if input lacks indexdata |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
v1 limitation: the cap fires per worker invocation, so output blobs cannot grow beyond the input blob size. Shrinking (e.g. planet 228 k -> 8 k) produces multiple output blobs per input blob and the cap fires correctly. Growing (e.g. europe 8 k -> 64 k) emits at-most-input-sized output blobs; cross-input-blob coalescing is deferred to v2.
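The effect of the v1 limitation on output blob sizes can be worked through numerically. This is a sketch under one assumption: each input blob is split greedily into cap-sized chunks with a smaller final remainder. The reference does not describe the exact split, and `repack_blob_sizes` is a hypothetical name.

```python
def repack_blob_sizes(input_blob_sizes, cap):
    """Model v1 repack: the cap applies within each input blob, never across blobs."""
    if cap <= 0:
        raise ValueError("cap must be > 0")
    output = []
    for remaining in input_blob_sizes:
        while remaining > 0:
            take = min(remaining, cap)
            output.append(take)
            remaining -= take
    return output


# Shrinking: one 228 k blob becomes many blobs of at most 8 k elements.
shrunk = repack_blob_sizes([228_000], 8_000)
print(len(shrunk), shrunk[-1])

# Growing: 8 k blobs stay 8 k because neighbouring blobs are not coalesced.
print(repack_blob_sizes([8_000, 8_000], 64_000))
```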
degrade
Unreleased - lands in the next pbfhogg release after 0.3.0.
Produce a valid-but-adversarial PBF by stripping properties or perturbing structure. Each flag composes; at least one is required. Used to produce inputs for benchmarking non-optimal code paths (sort overlap-rewrite, add-locations-to-ways, --force fallbacks).
pbfhogg degrade [OPTIONS] --output <OUTPUT> <FILE>
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
--unsort | Clear Sort.Type_then_ID; perturb the element stream so at least one adjacent same-kind blob pair has overlapping IDs. Triggers sort's overlap-rewrite path. |
--strip-locations | Drop the LocationsOnWays header feature. Inline way-node coordinates are not preserved. |
--strip-indexdata | Clear BlobHeader.indexdata on every OsmData blob. Forces commands into their --force / non-indexed fallback paths (sort, getid, tags-filter). Blob payloads are not decompressed. |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--io-uring | Use io_uring for output I/O |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
--strip-indexdata alone runs as a blob-level passthrough (raw frames reframed with cleared indexdata, payload bit-identical). Any combination involving --unsort or --strip-locations decodes elements and re-encodes via BlockBuilder.
v1 scope: --unsort, --strip-locations, --strip-indexdata. Other transformations from the design doc (--strip-tagdata, --strip-bbox, --recompress, --drop-ids) are deferred.
renumber
Renumber all element IDs sequentially, remapping cross-references (way node refs, relation member refs).
pbfhogg renumber [OPTIONS] --output <OUTPUT> <FILE>
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
-s, --start-id <ID> | Starting ID(s): single value or comma-separated node,way,relation [default: 1] |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
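Renumbering amounts to building an old-to-new ID map per element type and rewriting every cross-reference through it. The sketch below covers nodes and way node refs only and assumes new IDs are assigned in input order; both function names are hypothetical and this is not pbfhogg's implementation.

```python
def parse_start_ids(spec="1"):
    """Parse --start-id: one value for all types, or 'node,way,relation'."""
    parts = [int(part) for part in spec.split(",")]
    if len(parts) == 1:
        parts = parts * 3
    if len(parts) != 3:
        raise ValueError("expected one value or node,way,relation")
    return dict(zip(("node", "way", "relation"), parts))


def renumber(node_ids, ways, start_id="1"):
    """node_ids: list of old node IDs; ways: {old_way_id: [old_node_refs]}."""
    start = parse_start_ids(start_id)
    node_map = {old: start["node"] + i for i, old in enumerate(node_ids)}
    new_ways = {
        start["way"] + i: [node_map[ref] for ref in refs]
        for i, refs in enumerate(ways.values())
    }
    return node_map, new_ways


node_map, new_ways = renumber([10, 20, 35], {100: [10, 35], 101: [20, 10]}, "1,500,1")
print(node_map)
print(new_ways)
```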
extract
Extract elements within a geographic region (bounding box or polygon).
Three strategies: --simple (single pass, fast, may have dangling refs), complete-ways (default, two passes, all way nodes included), --smart (three passes, completes multipolygon/boundary relations).
Supports multi-extract via --config with a JSON config file specifying multiple regions.
pbfhogg extract [OPTIONS] <FILE>
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file (required for single extract, omit with --config) |
-b, --bbox <BBOX> | Bounding box: minlon,minlat,maxlon,maxlat |
-p, --polygon <FILE> | Polygon GeoJSON file |
-c, --config <FILE> | Multi-extract JSON config file |
-d, --directory <DIR> | Output directory override (only with --config) |
-s, --simple | Simple strategy (single pass) |
--smart | Smart strategy (three passes, complete relations) |
--set-bounds | Write the extract region bounding box to the output header |
--clean <ATTR> | Strip metadata attribute (repeatable: version, timestamp, changeset, uid, user) |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--force | Proceed even if input lacks indexdata |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
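The difference between the simple and complete-ways strategies can be seen on a way that crosses the region boundary. A minimal sketch with several assumptions: the bounding box is treated as inclusive, a way is considered selected when at least one of its nodes lies inside, and the names `parse_bbox` and `nodes_to_keep` are hypothetical.

```python
def parse_bbox(spec):
    """Parse 'minlon,minlat,maxlon,maxlat' into four floats."""
    minlon, minlat, maxlon, maxlat = (float(part) for part in spec.split(","))
    return minlon, minlat, maxlon, maxlat


def nodes_to_keep(nodes, ways, bbox, complete_ways=True):
    """nodes: {id: (lon, lat)}; ways: {id: [node refs]}."""
    minlon, minlat, maxlon, maxlat = bbox
    inside = {
        node_id
        for node_id, (lon, lat) in nodes.items()
        if minlon <= lon <= maxlon and minlat <= lat <= maxlat
    }
    keep = set(inside)
    if complete_ways:
        # Second pass: pull in every node of a way that touches the region.
        for refs in ways.values():
            if inside.intersection(refs):
                keep.update(refs)
    return keep


bbox = parse_bbox("12.0,55.0,13.0,56.0")
nodes = {1: (12.5, 55.7), 2: (13.5, 55.7)}
ways = {10: [1, 2]}
print(nodes_to_keep(nodes, ways, bbox, complete_ways=False))
print(nodes_to_keep(nodes, ways, bbox, complete_ways=True))
```

With `complete_ways=False`, way 10 would still reference node 2, which is the dangling-reference case the simple strategy allows.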
tags-filter
Filter elements by tag expressions. Default mode resolves relation members transitively (matched relations pull in member ways, nodes, and nested relations). With -R, only directly matched elements are emitted.
With --input-kind osc (or autodetected from .osc/.osc.gz extension), filters an OSC change file instead, always preserving deletes. PBF-only flags (-R, -i, -t) are not valid in OSC mode.
Expressions use osmium syntax: highway=primary, amenity, w/building=yes, etc.
pbfhogg tags-filter [OPTIONS] --output <OUTPUT> <FILE> [EXPRESSIONS]...
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
--input-kind <KIND> | Input kind override: pbf or osc (autodetect from extension by default) |
-R, --omit-referenced | Omit referenced objects (faster, single pass, direct matches only; PBF only) |
-i, --invert-match | Invert match: exclude matching objects, keep non-matching (PBF only) |
-t, --remove-tags | Remove tags from referenced objects not directly matched (use without -R; PBF only) |
-e, --expressions <FILE> | Read filter expressions from file (one per line, # comments) |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--force | Proceed even if input lacks indexdata |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
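The three expression forms shown above (`highway=primary`, `amenity`, `w/building=yes`) can be matched with a few lines of code. This sketch handles only those forms: an optional type prefix made of the letters n, w, r, a key, and an optional exact value. Any other part of the osmium expression syntax is out of scope, and the function names are hypothetical.

```python
ALL_TYPES = frozenset("nwr")


def parse_expression(expression):
    """Return (types, key, value); value is None for a key-only expression."""
    types = ALL_TYPES
    head, slash, rest = expression.partition("/")
    if slash and head and set(head) <= ALL_TYPES:
        types, expression = frozenset(head), rest
    key, equals, value = expression.partition("=")
    return types, key, (value if equals else None)


def matches(expression, element_type, tags):
    """element_type is 'n', 'w' or 'r'; tags is a dict of key to value."""
    types, key, value = parse_expression(expression)
    if element_type not in types or key not in tags:
        return False
    return value is None or tags[key] == value


print(matches("w/building=yes", "w", {"building": "yes"}))
print(matches("w/building=yes", "n", {"building": "yes"}))
print(matches("amenity", "n", {"amenity": "cafe"}))
```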
diff
Compare two PBF files and show differences. Uses content equality (coordinates, tags, refs, members) rather than version/timestamp ordering - deterministic regardless of metadata completeness (see DEVIATIONS).
With --format osc, generates an OSC diff file instead of text output. Text-only flags (-c, -v, -s, -q, -t) are not valid with --format osc. OSC-only flags (--increment-version, --update-timestamp) are not valid with --format text.
pbfhogg diff [OPTIONS] <OLD> <NEW>
| Flag | Description |
|---|---|
--format <FORMAT> | Output format: text (default) or osc |
-c, --suppress-common | Hide unchanged elements (text only) |
-v, --verbose | Show detailed changes for modified elements (text only) |
-s, --summary | Show summary on stderr (text only) |
-q, --quiet | Exit-code only, suppress output (text only) |
-o, --output <FILE> | Write output to file (required for --format osc) |
-t, --type <TYPE> | Filter by element type (text only) |
--increment-version | Bump version of deleted elements by 1 (osc only) |
--update-timestamp | Set delete timestamp to current time (osc only) |
--ignore-changeset | Compatibility flag (already ignored by content-equality mode) |
--ignore-uid | Compatibility flag (already ignored by content-equality mode) |
--ignore-user | Compatibility flag (already ignored by content-equality mode) |
--direct-io | Use O_DIRECT to bypass page cache |
With --format osc, produces a lossless roundtrip - applying the derived OSC to the old PBF reproduces the new PBF exactly (see DEVIATIONS).
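Content equality means two versions of an element count as unchanged when their coordinates, tags, refs, and members agree, whatever their metadata says. The sketch below illustrates that comparison on plain dictionaries; the field names and the `diff` helper are assumptions for illustration, not pbfhogg's data model.

```python
CONTENT_FIELDS = ("lon", "lat", "tags", "refs", "members")


def content_equal(a, b):
    """Compare only content fields; version, timestamp, uid etc. are ignored."""
    return all(a.get(field) == b.get(field) for field in CONTENT_FIELDS)


def diff(old, new):
    """old/new: {(type, id): element dict}. Returns (created, modified, deleted)."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(
        key for key in old.keys() & new.keys()
        if not content_equal(old[key], new[key])
    )
    return created, modified, deleted


old = {
    ("n", 1): {"lon": 12.5, "lat": 55.7, "tags": {}, "version": 1},
    ("n", 2): {"lon": 12.6, "lat": 55.7, "tags": {"amenity": "cafe"}, "version": 1},
    ("n", 3): {"lon": 12.7, "lat": 55.7, "tags": {}, "version": 1},
}
new = {
    ("n", 1): {"lon": 12.5, "lat": 55.7, "tags": {}, "version": 2},  # metadata only
    ("n", 2): {"lon": 12.6, "lat": 55.7, "tags": {"amenity": "bar"}, "version": 2},
    ("n", 4): {"lon": 12.8, "lat": 55.7, "tags": {}, "version": 1},
}
print(diff(old, new))
```

Node 1 differs only in `version`, so it is reported as unchanged.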
getid
Extract or remove elements by ID. By default, keeps only the listed IDs. With --invert, removes the listed IDs and keeps everything else.
IDs use type prefixes: n123 (node), w456 (way), r789 (relation).
pbfhogg getid [OPTIONS] --output <OUTPUT> <FILE> [IDS]...
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
--invert | Invert selection: remove listed IDs instead of keeping them |
-r, --add-referenced | Include referenced nodes of matching ways (two-pass; not with --invert) |
-t, --remove-tags | Remove tags from referenced objects (use with -r; not with --invert) |
--verbose-ids | Print requested IDs and report which were not found (not with --invert) |
-i, --id-file <FILE> | Read IDs from text file (one per line) |
-I, --id-osm-file <FILE> | Read IDs from an OSM/PBF file (all element IDs are collected) |
--default-type <TYPE> | Default type for bare numeric IDs: node, way, relation |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--force | Proceed even if input lacks indexdata |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
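The typed-ID notation and the role of `--default-type` can be sketched as a small parser. Assumptions are labeled in the code: in particular, this sketch rejects a bare numeric ID when no default type is given, which the reference does not specify. `parse_id` is a hypothetical name.

```python
PREFIXES = {"n": "node", "w": "way", "r": "relation"}


def parse_id(token, default_type=None):
    """Parse 'n123', 'w456', 'r789', or a bare number with a default type."""
    if token[:1] in PREFIXES and token[1:].isdigit():
        return PREFIXES[token[0]], int(token[1:])
    if token.isdigit():
        if default_type is None:
            # Assumption: treated as an error here; pbfhogg may behave differently.
            raise ValueError(f"bare ID {token!r} needs a default type")
        if default_type not in PREFIXES.values():
            raise ValueError(f"unknown type: {default_type!r}")
        return default_type, int(token)
    raise ValueError(f"not a valid ID: {token!r}")


print(parse_id("w456"))
print(parse_id("123", default_type="node"))
```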
getparents
Find ways/relations referencing given IDs (reverse lookup).
pbfhogg getparents [OPTIONS] --output <OUTPUT> <FILE> [IDS]...
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
-s, --add-self | Also include the queried objects themselves in the output |
-i, --id-file <FILE> | Read IDs from text file (one per line) |
-I, --id-osm-file <FILE> | Read IDs from an OSM/PBF file (all element IDs are collected) |
--default-type <TYPE> | Default type for bare numeric IDs: node, way, relation |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
add-locations-to-ways
Embed node coordinates in ways. Three index strategies:
- dense (default) - Direct-mapped mmap array. Fastest when the working set fits in RAM. At planet scale (~16 GB touched), requires ~30+ GB free memory to avoid page cache thrashing.
- sparse - Planetiler-inspired chunk-indexed sparse array. Bounded memory (~540 MB). Slower than dense at all scales. No temp disk needed. Works on any PBF.
- external - Double radix permutation via a 4-stage pipeline. Bounded memory (~17 GB at planet scale). 3.9x faster than dense at planet scale. Requires a sorted PBF (Sort.Type_then_ID) with indexdata. Uses ~112 GB of temp disk at Europe scale, ~300 GB at planet scale.
By default, untagged nodes not referenced by a relation are dropped from output.
pbfhogg add-locations-to-ways [OPTIONS] --output <OUTPUT> <FILE>
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
--index-type <TYPE> | Node index type: dense (default), sparse, external, or auto (external if sorted+indexed, dense otherwise) |
--keep-untagged-nodes | Keep all untagged nodes in output |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--force | Proceed even if input lacks indexdata |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
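The `auto` rule for `--index-type` reduces to a single condition on the input. A minimal sketch of that selection logic; `resolve_index_type` is a hypothetical name, and rejecting an explicit `external` on unsuitable input is an assumption drawn from the stated requirements, not documented behavior.

```python
INDEX_TYPES = ("dense", "sparse", "external", "auto")


def resolve_index_type(requested, is_sorted, has_indexdata):
    """Map a --index-type value to the strategy that would run."""
    if requested not in INDEX_TYPES:
        raise ValueError(f"unknown index type: {requested!r}")
    external_ok = is_sorted and has_indexdata
    if requested == "auto":
        return "external" if external_ok else "dense"
    if requested == "external" and not external_ok:
        # Assumption: external needs Sort.Type_then_ID and indexdata.
        raise ValueError("external requires a sorted PBF with indexdata")
    return requested


print(resolve_index_type("auto", is_sorted=True, has_indexdata=True))
print(resolve_index_type("auto", is_sorted=False, has_indexdata=True))
```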
time-filter
Filter a history PBF to a snapshot at a given timestamp.
pbfhogg time-filter [OPTIONS] --output <OUTPUT> <FILE> <TIMESTAMP>
The timestamp can be UNIX seconds or RFC3339 UTC (YYYY-MM-DDTHH:MM:SSZ).
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
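Both accepted timestamp forms resolve to the same instant, and a snapshot keeps, per object, the newest version at or before it. The sketch below shows the two steps. The snapshot rule (drop objects whose newest qualifying version is not visible) is the usual history-file convention and is an assumption here, as are the function names and the tuple layout.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Accept UNIX seconds or RFC3339 UTC (YYYY-MM-DDTHH:MM:SSZ)."""
    if value.isdigit():
        return int(value)
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def snapshot(versions, cutoff):
    """versions: (id, version, timestamp, visible) tuples sorted by (id, version).

    Returns {id: version} for objects that exist at the cutoff.
    """
    latest = {}
    for object_id, version, timestamp, visible in versions:
        if timestamp <= cutoff:
            latest[object_id] = (version, visible)
    return {oid: ver for oid, (ver, visible) in latest.items() if visible}


history = [(1, 1, 100, True), (1, 2, 200, True), (2, 1, 50, True), (2, 2, 150, False)]
print(parse_timestamp("2020-01-01T00:00:00Z"))
print(snapshot(history, cutoff=160))
```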
apply-changes
Apply an OSC diff to a sorted PBF file. Uses blob passthrough -- unmodified blobs are copied as raw bytes without decompression.
pbfhogg apply-changes [OPTIONS] --output <OUTPUT> <BASE> <CHANGES>
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
--locations-on-ways | Preserve and update way-node locations through the merge (requires base PBF with LocationsOnWays) |
--compression | Blob compression [default: zlib] |
--direct-io | Use O_DIRECT to bypass page cache |
--io-uring | Use io_uring for output I/O |
--force | Proceed even if input lacks indexdata |
--generator | Override writing program name |
--output-header <K=V> | Set output header fields (repeatable) |
merge-changes
Merge multiple OSC files into one OSC file.
pbfhogg merge-changes [OPTIONS] --output <OUTPUT> <CHANGES>...
| Flag | Description |
|---|---|
-o, --output <FILE> | Output file |
--simplify | Keep only the last change per object (type + id) |
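The `--simplify` rule keeps one change per (type, id) pair: the last one seen. A minimal sketch on plain dictionaries, assuming changes are processed in input order and emitted in order of their last occurrence; the output order pbfhogg uses is not stated here, and `simplify` is a hypothetical name.

```python
def simplify(changes):
    """Keep only the last change for each (type, id)."""
    last = {}
    for change in changes:
        key = (change["type"], change["id"])
        # Remove any earlier entry so the key moves to the end of the dict.
        last.pop(key, None)
        last[key] = change
    return list(last.values())


changes = [
    {"type": "node", "id": 1, "action": "create"},
    {"type": "way", "id": 1, "action": "modify"},
    {"type": "node", "id": 1, "action": "delete"},
]
print(simplify(changes))
```

Node 1 and way 1 share a numeric ID but are distinct objects, so both survive.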
build-geocode-index
Build a reverse geocoding index from a PBF file. Produces a set of binary files (S2 cell index, address points, street segments, admin boundaries, string pool) that can be memory-mapped for sub-millisecond reverse geocoding queries.
Requires an indexed PBF (generated by pbfhogg cat). The output directory must not already exist unless --force is set.
pbfhogg build-geocode-index [OPTIONS] --output-dir <DIR> <FILE>
| Flag | Description |
|---|---|
--output-dir <DIR> | Output directory for index files |
--street-level <N> | S2 cell level for streets/addresses [default: 17] |
--coarse-level <N> | Fallback cell level for rural areas [default: 14] |
--admin-level <N> | S2 cell level for admin boundaries [default: 10] |
--max-admin-vertices <N> | Douglas-Peucker vertex cap per admin polygon [default: 500] |
--search-radius <M> | Fine-level max search distance in meters [default: 75] |
--coarse-search-radius <M> | Coarse-level max search distance in meters [default: 1000] |
--force | Proceed without indexdata / overwrite existing index |
Outputs 19 binary files. Denmark (465 MB PBF): ~7s, 172 MB index. Europe (32.4 GB): 524s (8.7 min), 7.5 GB RSS. Planet (87 GB): 1,255s (20.9 min), 29.5 GB peak RSS (pass-1.5 transient).