
pbfhogg CLI Reference

Version 0.2.0. Generated from pbfhogg --help output.

Global flags

All commands support -h, --help and -V, --version.

Common flags

These flags appear on most commands that produce PBF output:

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file path |
| `--compression <COMPRESSION>` | Blob compression: `none`, `zlib` (default), or `zstd`, optionally with a level (e.g. `zlib:9`, `zstd:19`) |
| `--direct-io` | Use O_DIRECT to bypass the page cache (requires the `linux-direct-io` feature) |
| `--force` | Proceed even if the input lacks indexdata (slower fallback path) |
| `--generator <GENERATOR>` | Override the writing program name in the output header |
| `--output-header <KEY=VALUE>` | Set output header fields (repeatable). Keys: `osmosis_replication_timestamp`, `osmosis_replication_sequence_number`, `osmosis_replication_base_url` |
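The `--compression` and `--output-header` values follow small grammars. The sketch below is a hypothetical Python parser for them (`parse_compression` and `parse_output_header` are illustrative names, not pbfhogg APIs). It accepts the codec names and header keys listed above but does not validate compression levels, because the valid level ranges are not documented here.

```python
from typing import Optional, Tuple

# Hypothetical model of the documented flag grammars; not pbfhogg's own parser.
CODECS = {"none", "zlib", "zstd"}
HEADER_KEYS = {
    "osmosis_replication_timestamp",
    "osmosis_replication_sequence_number",
    "osmosis_replication_base_url",
}


def parse_compression(spec: str) -> Tuple[str, Optional[int]]:
    """Split 'codec' or 'codec:level' into (codec, level or None)."""
    codec, sep, level = spec.partition(":")
    if codec not in CODECS:
        raise ValueError(f"unknown codec: {codec!r}")
    # Level ranges are deliberately not checked: they are not documented here.
    return codec, (int(level) if sep else None)


def parse_output_header(pair: str) -> Tuple[str, str]:
    """Split one (repeatable) --output-header KEY=VALUE argument."""
    key, sep, value = pair.partition("=")
    if not sep or key not in HEADER_KEYS:
        raise ValueError(f"invalid --output-header: {pair!r}")
    return key, value


print(parse_compression("zstd:19"))  # ('zstd', 19)
print(parse_output_header("osmosis_replication_sequence_number=4217"))
```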

Commands

inspect

Inspect PBF file: metadata, block breakdown, ordering analysis.

On indexed PBFs, inspect uses an index-only fast path that reads blob headers without decompressing any blobs (~36 ms on a 473 MB file vs ~4 s for a full decode).

```
pbfhogg inspect [OPTIONS] <FILE>
pbfhogg inspect tags [OPTIONS] <FILE> [EXPRESSIONS]...
```

| Flag | Description |
|---|---|
| `--indexed` | Check whether the PBF has blob-level indexdata (exit code 0 = yes, 1 = no) |
| `--nodes` | Analyze node coordinate statistics for sizing frame-of-reference (FOR) compression |
| `--blocks [N]` | Show per-block distribution stats and an optional block listing (N limits the listing to the first and last N blocks) |
| `--id-ranges` | Show min/max element IDs per type and whether IDs are monotonic |
| `--locations` | Show locations-on-ways diagnostics |
| `--anomalies` | Show only anomalous blocks (below 50% or above 150% of the median, plus mixed blocks) |
| `-e, --extended` | Extended scan: timestamp range, data bbox, metadata coverage, ordering |
| `-g, --get <KEY>` | Get a single value by key path (e.g. `header.bbox`, `data.timestamp.first`) |
| `--json` | Machine-readable JSON output |
| `--show <TYPE_ID>` | Display a single element by ID (e.g. `n123`, `w456`, `r789`). Uses indexdata to skip non-matching blobs and exits early on sorted PBFs |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--force` | Proceed even if the input lacks indexdata (for `--nodes`) |
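The `--anomalies` rule can be sketched as follows. This assumes the size metric is the element count per block and that a "mixed" block holds more than one element type; `Block` and `anomalous_blocks` are hypothetical names used only for illustration.

```python
from statistics import median
from typing import Dict, List, NamedTuple


class Block(NamedTuple):
    index: int
    counts: Dict[str, int]  # element type -> element count (assumed size metric)


def anomalous_blocks(blocks: List[Block]) -> List[int]:
    """Indices of blocks below 50% / above 150% of the median size, or mixed."""
    totals = [sum(b.counts.values()) for b in blocks]
    if not totals:
        return []
    med = median(totals)
    flagged = []
    for block, total in zip(blocks, totals):
        kinds = sum(1 for n in block.counts.values() if n > 0)
        if total < 0.5 * med or total > 1.5 * med or kinds > 1:
            flagged.append(block.index)
    return flagged


blocks = [
    Block(0, {"node": 8000}),
    Block(1, {"node": 8000}),
    Block(2, {"node": 3000}),              # small: 3000 < 0.5 * 8000
    Block(3, {"way": 8000}),
    Block(4, {"node": 100, "way": 7900}),  # mixed node/way block
]
print(anomalous_blocks(blocks))  # [2, 4]
```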

inspect tags

Count tag key=value frequencies (subcommand of inspect).

| Flag | Description |
|---|---|
| `--min-count <N>` | Only show tags with at least this many occurrences [default: 1] |
| `-M, --max-count <N>` | Only show tags with at most this many occurrences |
| `-s, --sort <ORDER>` | Sort order: count-desc (default), count-asc, name-asc, name-desc |
| `-e, --expressions <FILE>` | Read tag expressions from a file (one per line, `#` comments) |
| `-t, --type <TYPE>` | Filter by element type: node, way, or relation |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--force` | Proceed even if the input lacks indexdata (slower fallback path) |
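Conceptually, `inspect tags` is a frequency count over `key=value` pairs, followed by the `--min-count`/`--max-count` filter and the chosen sort order. A minimal Python sketch; the function name and the name-based tie-break for equal counts are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Sort orders mirror the documented -s values; tie-breaking by name is assumed.
SORTS = {
    "count-desc": (lambda r: (-r[1], r[0]), False),
    "count-asc": (lambda r: (r[1], r[0]), False),
    "name-asc": (lambda r: r[0], False),
    "name-desc": (lambda r: r[0], True),
}


def tag_frequencies(
    elements: Iterable[Dict[str, str]],
    min_count: int = 1,
    max_count: Optional[int] = None,
    sort: str = "count-desc",
) -> List[Tuple[str, int]]:
    """Count key=value pairs, filter by count bounds, then sort."""
    counts = Counter(f"{k}={v}" for tags in elements for k, v in tags.items())
    rows = [
        (tag, n) for tag, n in counts.items()
        if n >= min_count and (max_count is None or n <= max_count)
    ]
    key, reverse = SORTS[sort]
    return sorted(rows, key=key, reverse=reverse)


elements = [{"highway": "primary"}, {"highway": "primary", "name": "A"}, {"amenity": "cafe"}]
print(tag_frequencies(elements))
# [('highway=primary', 2), ('amenity=cafe', 1), ('name=A', 1)]
```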

check

Validate PBF file integrity (IDs + referential integrity).

With no flags, runs both ID and referential integrity checks. Use --ids or --refs to run only one.

```
pbfhogg check [OPTIONS] <FILE>
```

| Flag | Description |
|---|---|
| `--ids` | Check ID uniqueness and ordering only |
| `--refs` | Check referential integrity only |
| `--full` | Full duplicate detection via bitmap (slower, more memory; applies to the ID check) |
| `-t, --type <TYPE>` | Filter by element type (comma-separated: node, way, relation; applies to the ID check) |
| `--max-errors <N>` | Stop after N violations (0 = unlimited) [default: 100] |
| `--check-relations` | Also check relation member references (applies to the ref check) |
| `--show-ids` | Show IDs of missing objects in the form `n123 in w456` (applies to the ref check) |
| `--json` | Machine-readable JSON output |
| `--quiet` | Exit code only, no output |
| `--direct-io` | Use O_DIRECT to bypass the page cache |

For missing relation-to-relation members, the report gives the number of unique missing IDs and, when it differs, the total number of references, e.g. Missing relation members: 706 (777 references).
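Both checks can be sketched in a few lines of Python. `id_violations` and `missing_member_summary` are hypothetical helpers; a Python set stands in for the `--full` bitmap, and the single-number output when the unique and reference counts are equal is an assumption, since only the differing case is shown above.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def id_violations(ids: Sequence[int], full: bool = False) -> Tuple[List[int], List[int]]:
    """Return (duplicates, out_of_order) for one element type's ID stream.

    Without `full`, only adjacent duplicates are caught (enough for a sorted
    stream); with `full`, a set of every ID seen stands in for a bitmap.
    """
    dups, unordered, seen = [], [], set()
    for i, cur in enumerate(ids):
        if full:
            if cur in seen:
                dups.append(cur)
            seen.add(cur)
        elif i > 0 and cur == ids[i - 1]:
            dups.append(cur)
        if i > 0 and cur < ids[i - 1]:
            unordered.append(cur)
    return dups, unordered


def missing_member_summary(relations: Iterable[Tuple[int, List[int]]], existing: Set[int]) -> str:
    """Summarize relation-to-relation members absent from the file."""
    missing = Counter(m for _rid, members in relations for m in members if m not in existing)
    unique, refs = len(missing), sum(missing.values())
    if unique == refs:
        return f"Missing relation members: {unique}"
    return f"Missing relation members: {unique} ({refs} references)"


print(id_violations([1, 2, 2, 5, 3, 5]))             # ([2], [3])
print(id_violations([1, 2, 2, 5, 3, 5], full=True))  # ([2, 5], [3])
print(missing_member_summary([(1, [10, 11]), (2, [10]), (3, [1])], {1, 2, 3}))
# Missing relation members: 2 (3 references)
```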


cat

Concatenate PBF files with optional type filtering. Embeds blob-level indexdata and tagdata automatically.

With --dedupe, merges multiple sorted PBF files with blob-level passthrough and exact-duplicate deduplication.

```
pbfhogg cat [OPTIONS] --output <OUTPUT> <FILES>...
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `-t, --type <TYPE>` | Filter by element type (comma-separated: node, way, relation) |
| `-c, --clean <ATTR>` | Strip a metadata attribute (repeatable: version, timestamp, changeset, uid, user) |
| `--dedupe` | K-way sorted merge with exact-duplicate removal (all inputs must be sorted) |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--io-uring` | Use io_uring for output I/O (only with `--dedupe`) |
| `--force` | Proceed even if the input lacks indexdata |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |
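The `--dedupe` mode is a k-way merge over inputs already in standard order. The sketch below uses Python's `heapq.merge` and drops an element only when an identical copy (same type, ID, and content) has already been emitted. How pbfhogg treats same-ID elements with different content is not documented here; this sketch simply keeps both. Elements are modeled as hypothetical `(type, id, content)` tuples.

```python
import heapq
from typing import Iterable, Iterator, Tuple

TYPE_ORDER = {"node": 0, "way": 1, "relation": 2}
Element = Tuple[str, int, tuple]  # (type, id, content): illustrative model only


def sort_key(e: Element) -> Tuple[int, int]:
    return TYPE_ORDER[e[0]], e[1]


def merge_dedupe(*inputs: Iterable[Element]) -> Iterator[Element]:
    """K-way merge of sorted inputs that removes exact duplicates only."""
    current, seen = None, set()
    for e in heapq.merge(*inputs, key=sort_key):
        if sort_key(e) != current:  # new (type, id) group: reset what we've seen
            current, seen = sort_key(e), set()
        if e not in seen:
            seen.add(e)
            yield e


a = [("node", 1, ("amenity=cafe",)), ("way", 5, ())]
b = [("node", 1, ("amenity=cafe",)), ("node", 2, ())]
print(list(merge_dedupe(a, b)))
# [('node', 1, ('amenity=cafe',)), ('node', 2, ()), ('way', 5, ())]
```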

sort

Sort PBF into standard order (nodes, ways, relations, each by ascending ID). For already-sorted inputs with indexdata, blobs pass through as raw bytes.

```
pbfhogg sort [OPTIONS] --output <OUTPUT> <FILE>
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--io-uring` | Use io_uring for output I/O |
| `--force` | Proceed even if the input lacks indexdata |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |
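The standard order is a two-part sort key, and in principle the passthrough case can be recognized from per-blob ID ranges. The sketch assumes each blob's element kind and min/max ID are available (for example from indexdata), which is a simplification; `standard_key` and `blobs_in_order` are hypothetical helpers.

```python
from typing import List, Tuple

TYPE_ORDER = {"node": 0, "way": 1, "relation": 2}


def standard_key(kind: str, elem_id: int) -> Tuple[int, int]:
    """Standard order: nodes, then ways, then relations, each by ascending ID."""
    return TYPE_ORDER[kind], elem_id


def blobs_in_order(blobs: List[Tuple[str, int, int]]) -> bool:
    """blobs: (kind, min_id, max_id) per blob, in file order.

    True when every blob ends strictly before the next one starts, i.e. the
    blob sequence is already sorted and could be copied through unchanged.
    """
    return all(
        standard_key(k1, hi1) < standard_key(k2, lo2)
        for (k1, _lo1, hi1), (k2, lo2, _hi2) in zip(blobs, blobs[1:])
    )


elements = [("way", 3), ("node", 10), ("relation", 1), ("node", 2)]
print(sorted(elements, key=lambda e: standard_key(*e)))
# [('node', 2), ('node', 10), ('way', 3), ('relation', 1)]
print(blobs_in_order([("node", 1, 100), ("node", 101, 200), ("way", 1, 50)]))  # True
print(blobs_in_order([("node", 1, 150), ("node", 101, 200)]))                  # False
```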

repack

Unreleased - lands in the next pbfhogg release after 0.3.0.

Re-encode a PBF with a configurable per-blob element cap. Element semantics, tags, refs, members, metadata, and DenseNodes encoding all round-trip; output is type-sorted and propagates Sort.Type_then_ID from the input header.

Primary use case: producing pairs of files with the same data but different encodings for blob-density measurement (Geofabrik's ~8k elements per blob convention vs planet.openstreetmap.org's ~228k per blob), so that commands whose cost scales implicitly with blob count can be measured at controlled densities.

```
pbfhogg repack [OPTIONS] --output <OUTPUT> <FILE>
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `--elements-per-blob <N>` | Per-blob element cap [default: 8000]. 8000 matches the osmium/Geofabrik convention; pass a larger value to approximate planet.openstreetmap.org-style packing. Must be > 0. |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--io-uring` | Use io_uring for output I/O |
| `--force` | Proceed even if the input lacks indexdata |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |

v1 limitation: the cap is applied per worker invocation, so output blobs cannot grow beyond the input blob size. Shrinking (e.g. planet, ~228k to 8k) produces multiple output blobs per input blob, and the cap is honored. Growing (e.g. Europe, ~8k to 64k) emits output blobs no larger than their input blobs; coalescing across input blobs is deferred to v2.
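The v1 behavior is easiest to see with element IDs standing in for elements. In this sketch (`repack_v1` is a hypothetical name), each input blob is chunked on its own. That reproduces both the working shrink case and the limited grow case described above.

```python
from typing import List, Sequence


def repack_v1(input_blobs: Sequence[Sequence[int]], elements_per_blob: int = 8000) -> List[List[int]]:
    """Split each input blob independently into chunks of at most the cap.

    Because the cap is applied within one input blob at a time, an output blob
    never holds more elements than the input blob it came from.
    """
    if elements_per_blob <= 0:
        raise ValueError("--elements-per-blob must be > 0")
    out = []
    for blob in input_blobs:
        for start in range(0, len(blob), elements_per_blob):
            out.append(list(blob[start:start + elements_per_blob]))
    return out


# Shrinking works: one 20-element blob with a cap of 8 gives sizes [8, 8, 4].
print([len(b) for b in repack_v1([range(20)], elements_per_blob=8)])
# Growing does not: two 8-element blobs with a cap of 64 stay at [8, 8].
print([len(b) for b in repack_v1([range(8), range(8, 16)], elements_per_blob=64)])
```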

degrade

Unreleased - lands in the next pbfhogg release after 0.3.0.

Produce a valid-but-adversarial PBF by stripping properties or perturbing structure. Each flag composes; at least one is required. Used to produce inputs for benchmarking non-optimal code paths (sort overlap-rewrite, add-locations-to-ways, --force fallbacks).

```
pbfhogg degrade [OPTIONS] --output <OUTPUT> <FILE>
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `--unsort` | Clear Sort.Type_then_ID and perturb the element stream so that at least one adjacent same-kind blob pair has overlapping IDs. Triggers sort's overlap-rewrite path. |
| `--strip-locations` | Drop the LocationsOnWays header feature. Inline way-node coordinates are not preserved. |
| `--strip-indexdata` | Clear BlobHeader.indexdata on every OsmData blob. Forces commands into their `--force` / non-indexed fallback paths (sort, getid, tags-filter). Blob payloads are not decompressed. |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--io-uring` | Use io_uring for output I/O |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |

--strip-indexdata alone runs as a blob-level passthrough: raw frames are re-framed with cleared indexdata, and payloads stay bit-identical. Any combination involving --unsort or --strip-locations decodes elements and re-encodes them via BlockBuilder.

v1 scope: --unsort, --strip-locations, --strip-indexdata. Other transformations from the design doc (--strip-tagdata, --strip-bbox, --recompress, --drop-ids) are deferred.
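One way to meet the `--unsort` guarantee is to swap the boundary elements of the first adjacent same-kind blob pair: given a sorted input, each blob stays internally ascending, but the two ID ranges now overlap. This is an illustrative technique only; the perturbation pbfhogg actually applies is not documented here, the header flag change is not modeled, and `unsort` / `ranges_overlap` are hypothetical names.

```python
from typing import List, Tuple

Blob = Tuple[str, List[int]]  # (kind, ascending element IDs): illustrative model


def unsort(blobs: List[Blob]) -> List[Blob]:
    """Return a copy in which one adjacent same-kind blob pair has overlapping IDs."""
    for i in range(len(blobs) - 1):
        (k1, a), (k2, b) = blobs[i], blobs[i + 1]
        if k1 == k2 and a and b:
            a, b = list(a), list(b)      # copy so the input is not mutated
            a[-1], b[0] = b[0], a[-1]    # swap the boundary elements
            out = list(blobs)
            out[i], out[i + 1] = (k1, a), (k2, b)
            return out
    raise ValueError("no adjacent same-kind blob pair to perturb")


def ranges_overlap(a: List[int], b: List[int]) -> bool:
    return max(a) >= min(b)


blobs = [("node", [1, 2, 3]), ("node", [4, 5, 6]), ("way", [1, 2])]
out = unsort(blobs)
print(out)  # [('node', [1, 2, 4]), ('node', [3, 5, 6]), ('way', [1, 2])]
print(ranges_overlap(out[0][1], out[1][1]))  # True
```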

renumber

Renumber all element IDs sequentially, remapping cross-references (way node refs, relation member refs).

```
pbfhogg renumber [OPTIONS] --output <OUTPUT> <FILE>
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `-s, --start-id <ID>` | Starting ID(s): a single value, or comma-separated per-type values in node,way,relation order [default: 1] |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |
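Renumbering is a per-type mapping from old to new IDs, applied first to the elements and then to every reference. The sketch below assumes that a single `--start-id` value applies to all three types and that references to objects absent from the input are left unchanged; neither detail is stated above, and the element shapes are hypothetical.

```python
from typing import Dict, List, Tuple

KINDS = ("node", "way", "relation")


def parse_start_ids(spec: str = "1") -> Dict[str, int]:
    """'100' -> same start for every type (assumed); '100,200,300' -> node,way,relation."""
    parts = [int(p) for p in spec.split(",")]
    if len(parts) == 1:
        parts *= 3
    if len(parts) != 3:
        raise ValueError("expected 1 or 3 comma-separated start IDs")
    return dict(zip(KINDS, parts))


def renumber(nodes: List[int],
             ways: List[Tuple[int, List[int]]],
             relations: List[Tuple[int, List[Tuple[str, int]]]],
             start: str = "1"):
    first = parse_start_ids(start)
    # Build every map before rewriting references, so forward refs resolve.
    maps = {
        "node": {old: first["node"] + i for i, old in enumerate(nodes)},
        "way": {old: first["way"] + i for i, (old, _) in enumerate(ways)},
        "relation": {old: first["relation"] + i for i, (old, _) in enumerate(relations)},
    }
    new_nodes = [maps["node"][n] for n in nodes]
    # .get(ref, ref): refs to missing objects stay as-is (assumption).
    new_ways = [(maps["way"][w], [maps["node"].get(r, r) for r in refs]) for w, refs in ways]
    new_rels = [(maps["relation"][r], [(t, maps[t].get(m, m)) for t, m in members])
                for r, members in relations]
    return new_nodes, new_ways, new_rels


print(renumber([10, 20, 30], [(5, [30, 10])],
               [(7, [("way", 5), ("relation", 9)]), (9, [("node", 20)])]))
# ([1, 2, 3], [(1, [3, 1])], [(1, [('way', 1), ('relation', 2)]), (2, [('node', 2)])])
```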

extract

Extract elements within a geographic region (bounding box or polygon).

Three strategies:

  • --simple - single pass, fast; may leave dangling references
  • complete-ways (default) - two passes; all nodes of included ways are present
  • --smart - three passes; also completes multipolygon and boundary relations

Supports multi-extract via --config with a JSON config file specifying multiple regions.

```
pbfhogg extract [OPTIONS] <FILE>
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file (required for a single extract; omit with `--config`) |
| `-b, --bbox <BBOX>` | Bounding box: minlon,minlat,maxlon,maxlat |
| `-p, --polygon <FILE>` | Polygon GeoJSON file |
| `-c, --config <FILE>` | Multi-extract JSON config file |
| `-d, --directory <DIR>` | Output directory override (only with `--config`) |
| `-s, --simple` | Simple strategy (single pass) |
| `--smart` | Smart strategy (three passes, complete relations) |
| `--set-bounds` | Write the extract region's bounding box to the output header |
| `--clean <ATTR>` | Strip a metadata attribute (repeatable: version, timestamp, changeset, uid, user) |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--force` | Proceed even if the input lacks indexdata |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |
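The difference between `--simple` and the default complete-ways strategy shows up in which nodes reach the output. The sketch below models both after osmium's strategies of the same names (an assumption about pbfhogg's exact rules), ignores relations and `--smart`, and uses hypothetical in-memory dictionaries instead of passes over a PBF.

```python
from typing import Dict, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # minlon, minlat, maxlon, maxlat


def parse_bbox(text: str) -> BBox:
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in text.split(","))
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("expected minlon,minlat,maxlon,maxlat")
    return min_lon, min_lat, max_lon, max_lat


def extract(nodes: Dict[int, Tuple[float, float]], ways: Dict[int, List[int]],
            bbox: BBox, simple: bool = False) -> Tuple[Set[int], Set[int]]:
    min_lon, min_lat, max_lon, max_lat = bbox
    # Pass 1: nodes inside the region, and ways touching at least one of them.
    inside = {n for n, (lon, lat) in nodes.items()
              if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat}
    kept_ways = {w for w, refs in ways.items() if any(r in inside for r in refs)}
    if simple:
        return inside, kept_ways  # kept ways may reference nodes not in the output
    # Pass 2 (complete-ways): add every node referenced by a kept way.
    all_nodes = inside | {r for w in kept_ways for r in ways[w]}
    return all_nodes, kept_ways


nodes = {1: (0.5, 0.5), 2: (2.0, 2.0), 3: (0.2, 0.8)}
ways = {10: [1, 2], 11: [2]}
bbox = parse_bbox("0,0,1,1")
print(extract(nodes, ways, bbox, simple=True))  # ({1, 3}, {10})
print(extract(nodes, ways, bbox))               # ({1, 2, 3}, {10})
```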

tags-filter

Filter elements by tag expressions. Default mode resolves relation members transitively (matched relations pull in member ways, nodes, and nested relations). With -R, only directly matched elements are emitted.

With --input-kind osc (or when autodetected from a .osc/.osc.gz extension), tags-filter filters an OSC change file instead; deletes are always preserved. The PBF-only flags (-R, -i, -t) are not valid in OSC mode.

Expressions use osmium syntax: highway=primary, amenity, w/building=yes, etc.

```
pbfhogg tags-filter [OPTIONS] --output <OUTPUT> <FILE> [EXPRESSIONS]...
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `--input-kind <KIND>` | Input kind override: pbf or osc (autodetected from the extension by default) |
| `-R, --omit-referenced` | Omit referenced objects (faster, single pass, direct matches only; PBF only) |
| `-i, --invert-match` | Invert match: exclude matching objects and keep non-matching ones (PBF only) |
| `-t, --remove-tags` | Remove tags from referenced objects that are not directly matched (use without `-R`; PBF only) |
| `-e, --expressions <FILE>` | Read filter expressions from a file (one per line, `#` comments) |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--force` | Proceed even if the input lacks indexdata |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |
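The three expression forms shown above (`key`, `key=value`, and a type-prefixed form such as `w/building=yes`) can be parsed and matched as below. This handles only those forms; osmium's full expression syntax supports further forms that this sketch omits, and `parse_expression` / `matches` are hypothetical names.

```python
import re
from typing import Dict, FrozenSet, Optional, Tuple

TYPE_CHARS = {"n": "node", "w": "way", "r": "relation"}
ALL_TYPES = frozenset(TYPE_CHARS.values())
Rule = Tuple[FrozenSet[str], str, Optional[str]]  # (types, key, value or None)


def parse_expression(expr: str) -> Rule:
    """Parse 'key', 'key=value', or '[nwr]+/key[=value]'."""
    types = ALL_TYPES
    m = re.fullmatch(r"([nwr]+)/(.+)", expr)
    if m:
        types = frozenset(TYPE_CHARS[c] for c in m.group(1))
        expr = m.group(2)
    key, sep, value = expr.partition("=")
    return types, key, (value if sep else None)


def matches(rule: Rule, kind: str, tags: Dict[str, str]) -> bool:
    types, key, value = rule
    if kind not in types or key not in tags:
        return False
    return value is None or tags[key] == value


rule = parse_expression("w/building=yes")
print(matches(rule, "way", {"building": "yes"}))   # True
print(matches(rule, "node", {"building": "yes"}))  # False
print(matches(parse_expression("amenity"), "node", {"amenity": "cafe"}))  # True
```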

diff

Compare two PBF files and show differences. Uses content equality (coordinates, tags, refs, members) rather than version/timestamp ordering - deterministic regardless of metadata completeness (see DEVIATIONS).

With --format osc, generates an OSC diff file instead of text output. Text-only flags (-c, -v, -s, -q, -t) are not valid with --format osc. OSC-only flags (--increment-version, --update-timestamp) are not valid with --format text.

```
pbfhogg diff [OPTIONS] <OLD> <NEW>
```

| Flag | Description |
|---|---|
| `--format <FORMAT>` | Output format: text (default) or osc |
| `-c, --suppress-common` | Hide unchanged elements (text only) |
| `-v, --verbose` | Show detailed changes for modified elements (text only) |
| `-s, --summary` | Show a summary on stderr (text only) |
| `-q, --quiet` | Exit code only, suppress output (text only) |
| `-o, --output <FILE>` | Write output to a file (required for `--format osc`) |
| `-t, --type <TYPE>` | Filter by element type (text only) |
| `--increment-version` | Bump the version of deleted elements by 1 (osc only) |
| `--update-timestamp` | Set the delete timestamp to the current time (osc only) |
| `--ignore-changeset` | Compatibility flag (changesets are already ignored in content-equality mode) |
| `--ignore-uid` | Compatibility flag (UIDs are already ignored in content-equality mode) |
| `--ignore-user` | Compatibility flag (user names are already ignored in content-equality mode) |
| `--direct-io` | Use O_DIRECT to bypass the page cache |

With --format osc, produces a lossless roundtrip - applying the derived OSC to the old PBF reproduces the new PBF exactly (see DEVIATIONS).
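Content equality means an element counts as modified only when its coordinates, tags, refs, or members change; differences in version, timestamp, changeset, or user alone are invisible. A minimal sketch over hypothetical in-memory dictionaries keyed by `(type, id)`; the field names are assumptions.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]
CONTENT_FIELDS = ("lon", "lat", "tags", "refs", "members")  # assumed field names


def content(element: dict) -> tuple:
    """Project an element onto its content fields; metadata is ignored."""
    return tuple(element.get(f) for f in CONTENT_FIELDS)


def diff(old: Dict[Key, dict], new: Dict[Key, dict]) -> Tuple[List[Key], List[Key], List[Key]]:
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(k for k in old.keys() & new.keys() if content(old[k]) != content(new[k]))
    return created, deleted, modified


old = {("node", 1): {"lon": 1.0, "lat": 1.0, "tags": {}, "version": 1},
       ("node", 2): {"lon": 2.0, "lat": 2.0, "tags": {}}}
new = {("node", 1): {"lon": 1.0, "lat": 1.0, "tags": {}, "version": 5, "user": "x"},
       ("node", 2): {"lon": 2.0, "lat": 2.5, "tags": {}},
       ("way", 3): {"refs": [1, 2], "tags": {}}}
print(diff(old, new))  # ([('way', 3)], [], [('node', 2)])
```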

getid

Extract or remove elements by ID. By default, keeps only the listed IDs. With --invert, removes the listed IDs and keeps everything else.

IDs use type prefixes: n123 (node), w456 (way), r789 (relation).

```
pbfhogg getid [OPTIONS] --output <OUTPUT> <FILE> [IDS]...
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `--invert` | Invert selection: remove the listed IDs instead of keeping them |
| `-r, --add-referenced` | Include referenced nodes of matching ways (two-pass; not with `--invert`) |
| `-t, --remove-tags` | Remove tags from referenced objects (use with `-r`; not with `--invert`) |
| `--verbose-ids` | Print the requested IDs and report which were not found (not with `--invert`) |
| `-i, --id-file <FILE>` | Read IDs from a text file (one per line) |
| `-I, --id-osm-file <FILE>` | Read IDs from an OSM/PBF file (all element IDs are collected) |
| `--default-type <TYPE>` | Default type for bare numeric IDs: node, way, relation |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--force` | Proceed even if the input lacks indexdata |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |
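ID arguments combine a one-letter type prefix with a numeric ID, and `--default-type` covers bare numbers. The hypothetical parser below also shows the keep/remove selection behind `--invert`. Rejecting a bare ID when no default type is given is an assumption, since that fallback is not documented here.

```python
from typing import Iterable, List, Optional, Set, Tuple

PREFIXES = {"n": "node", "w": "way", "r": "relation"}


def parse_id(token: str, default_type: Optional[str] = None) -> Tuple[str, int]:
    token = token.strip()
    if token[:1] in PREFIXES:
        return PREFIXES[token[0]], int(token[1:])
    if default_type is None:  # assumed behavior for bare IDs
        raise ValueError(f"bare ID {token!r} needs --default-type")
    return default_type, int(token)


def read_id_lines(lines: Iterable[str], default_type: Optional[str] = None) -> List[Tuple[str, int]]:
    """One ID per line, as with --id-file; blank lines are skipped."""
    return [parse_id(line, default_type) for line in lines if line.strip()]


def select(elements: Iterable[Tuple[str, int]], wanted: Set[Tuple[str, int]], invert: bool = False):
    """Keep the listed IDs, or with invert=True keep everything else."""
    return [e for e in elements if (e in wanted) != invert]


wanted = set(read_id_lines(["n123", "w456", "", "789"], default_type="relation"))
print(sorted(wanted))  # [('node', 123), ('relation', 789), ('way', 456)]
elements = [("node", 1), ("node", 123), ("way", 456)]
print(select(elements, wanted))               # [('node', 123), ('way', 456)]
print(select(elements, wanted, invert=True))  # [('node', 1)]
```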

getparents

Find ways/relations referencing given IDs (reverse lookup).

```
pbfhogg getparents [OPTIONS] --output <OUTPUT> <FILE> [IDS]...
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `-s, --add-self` | Also include the queried objects themselves in the output |
| `-i, --id-file <FILE>` | Read IDs from a text file (one per line) |
| `-I, --id-osm-file <FILE>` | Read IDs from an OSM/PBF file (all element IDs are collected) |
| `--default-type <TYPE>` | Default type for bare numeric IDs: node, way, relation |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |
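A reverse lookup scans ways for matching node refs and relations for matching members. This sketch returns direct parents only (one level, an assumption) and models `--add-self` as adding the queried IDs to the result; the data shapes and the name `get_parents` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Key = Tuple[str, int]


def get_parents(targets: Set[Key], ways: Dict[int, List[int]],
                relations: Dict[int, List[Key]], add_self: bool = False) -> Set[Key]:
    """Ways/relations that directly reference any target (one level only)."""
    parents = {("way", w) for w, refs in ways.items()
               if any(("node", r) in targets for r in refs)}
    parents |= {("relation", r) for r, members in relations.items()
                if any(m in targets for m in members)}
    return parents | targets if add_self else parents


ways = {10: [1, 2], 11: [3]}
relations = {20: [("node", 1)], 21: [("way", 11)]}
print(sorted(get_parents({("node", 1)}, ways, relations)))  # [('relation', 20), ('way', 10)]
```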

add-locations-to-ways

Embed node coordinates in ways. Three index strategies:

  • dense (default) - Direct-mapped mmap array. Fastest when the working set fits in RAM. At planet scale (~16 GB touched), it needs ~30+ GB of free memory to avoid page cache thrashing.
  • sparse - Planetiler-inspired chunk-indexed sparse array. Bounded memory (~540 MB). Slower than dense at all scales. No temporary disk space needed. Works on any PBF.
  • external - Double radix permutation via a 4-stage pipeline. Bounded memory (~17 GB at planet scale). 3.9x faster than dense at planet scale. Requires a sorted PBF (Sort.Type_then_ID) with indexdata. Uses ~112 GB of temporary disk space for Europe and ~300 GB for the planet.

By default, untagged nodes not referenced by a relation are dropped from output.

```
pbfhogg add-locations-to-ways [OPTIONS] --output <OUTPUT> <FILE>
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `--index-type <TYPE>` | Node index type: dense (default), sparse, external, or auto (external if the input is sorted and indexed, dense otherwise) |
| `--keep-untagged-nodes` | Keep all untagged nodes in the output |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--force` | Proceed even if the input lacks indexdata |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |
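The dense strategy and the default node-dropping rule can be sketched together. A Python list indexed directly by node ID stands in for the mmap array (in a direct-mapped layout the array's extent follows the highest node ID, not the node count), and untagged nodes survive only if a relation references them. `choose_index_type` mirrors the documented `auto` rule; all names and data shapes are hypothetical, and negative IDs are rejected rather than handled.

```python
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[int, float, float, Dict[str, str]]  # (id, lon, lat, tags)
Loc = Optional[Tuple[float, float]]


def choose_index_type(requested: str, is_sorted: bool, has_indexdata: bool) -> str:
    if requested == "auto":
        return "external" if is_sorted and has_indexdata else "dense"
    return requested


def add_locations_dense(nodes: List[Node], ways: List[Tuple[int, List[int]]],
                        relation_node_refs: Set[int], keep_untagged: bool = False):
    if any(n[0] < 0 for n in nodes):
        raise ValueError("this sketch assumes non-negative node IDs")
    max_id = max((n[0] for n in nodes), default=0)
    dense: List[Loc] = [None] * (max_id + 1)  # slot index == node ID
    for nid, lon, lat, _tags in nodes:
        dense[nid] = (lon, lat)

    def loc(ref: int) -> Loc:
        return dense[ref] if 0 <= ref <= max_id else None  # None: node not in input

    out_ways = [(wid, [(r, loc(r)) for r in refs]) for wid, refs in ways]
    out_nodes = [n for n in nodes
                 if keep_untagged or n[3] or n[0] in relation_node_refs]
    return out_nodes, out_ways


nodes = [(1, 10.0, 50.0, {}), (2, 11.0, 51.0, {"amenity": "cafe"}), (3, 12.0, 52.0, {})]
out_nodes, out_ways = add_locations_dense(nodes, [(100, [1, 2, 9])], relation_node_refs={3})
print([n[0] for n in out_nodes])  # [2, 3]
print(out_ways)  # [(100, [(1, (10.0, 50.0)), (2, (11.0, 51.0)), (9, None)])]
```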

time-filter

Filter a history PBF to a snapshot at a given timestamp.

```
pbfhogg time-filter [OPTIONS] --output <OUTPUT> <FILE> <TIMESTAMP>
```

The timestamp can be UNIX seconds or RFC3339 UTC (YYYY-MM-DDTHH:MM:SSZ).

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |
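A snapshot keeps, for each object, the newest version whose timestamp is at or before the cutoff, and omits the object if that version is a deletion. The sketch below follows osmium's `time-filter` semantics as an assumption (inclusive cutoff, deleted objects dropped) and parses both documented timestamp formats; the history records are hypothetical dictionaries.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def parse_timestamp(text: str) -> int:
    """UNIX seconds or RFC3339 UTC ('YYYY-MM-DDTHH:MM:SSZ') -> UNIX seconds."""
    if text.isdigit():
        return int(text)
    dt = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def snapshot(history: List[dict], cutoff: int) -> List[dict]:
    latest: Dict[Tuple[str, int], dict] = {}
    for v in history:
        if v["timestamp"] <= cutoff:  # inclusive cutoff (assumption)
            key = (v["type"], v["id"])
            if key not in latest or v["version"] > latest[key]["version"]:
                latest[key] = v
    return [v for v in latest.values() if v["visible"]]  # drop deleted objects


history = [
    {"type": "node", "id": 1, "version": 1, "timestamp": 100, "visible": True},
    {"type": "node", "id": 1, "version": 2, "timestamp": 200, "visible": True},
    {"type": "node", "id": 1, "version": 3, "timestamp": 300, "visible": False},
    {"type": "node", "id": 2, "version": 1, "timestamp": 250, "visible": True},
]
print([(v["id"], v["version"]) for v in snapshot(history, 250)])  # [(1, 2), (2, 1)]
print([(v["id"], v["version"]) for v in snapshot(history, 350)])  # [(2, 1)]
print(parse_timestamp("2020-01-01T00:00:00Z"))                     # 1577836800
```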

apply-changes

Apply an OSC diff to a sorted PBF file. Uses blob passthrough: unmodified blobs are copied as raw bytes without decompression.

```
pbfhogg apply-changes [OPTIONS] --output <OUTPUT> <BASE> <CHANGES>
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `--locations-on-ways` | Preserve and update way-node locations through the merge (requires a base PBF with LocationsOnWays) |
| `--compression <COMPRESSION>` | Blob compression [default: zlib] |
| `--direct-io` | Use O_DIRECT to bypass the page cache |
| `--io-uring` | Use io_uring for output I/O |
| `--force` | Proceed even if the input lacks indexdata |
| `--generator <GENERATOR>` | Override the writing program name |
| `--output-header <K=V>` | Set output header fields (repeatable) |
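Passthrough depends on knowing which blobs a change can affect. A simplified sketch, assuming each blob's kind and min/max ID are available (for example from indexdata): a blob needs decoding only if some changed ID of the same kind falls inside its range. Placing newly created IDs that fall outside every blob's range is left out, and `blobs_to_rewrite` is a hypothetical name.

```python
from bisect import bisect_left
from typing import Dict, List, Set, Tuple


def blobs_to_rewrite(blobs: List[Tuple[str, int, int]], changed: Set[Tuple[str, int]]) -> Set[int]:
    """Indices of blobs whose (kind, min_id, max_id) range covers a changed ID."""
    by_kind: Dict[str, List[int]] = {}
    for kind, cid in changed:
        by_kind.setdefault(kind, []).append(cid)
    for ids in by_kind.values():
        ids.sort()
    touched = set()
    for i, (kind, lo, hi) in enumerate(blobs):
        ids = by_kind.get(kind, [])
        j = bisect_left(ids, lo)  # first changed ID >= lo
        if j < len(ids) and ids[j] <= hi:
            touched.add(i)
    return touched


blobs = [("node", 1, 100), ("node", 101, 200), ("way", 1, 50)]
print(sorted(blobs_to_rewrite(blobs, {("node", 150), ("way", 7)})))  # [1, 2]
# Blob 0 is untouched and could be copied as raw bytes.
```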

merge-changes

Merge multiple OSC files into one OSC file.

```
pbfhogg merge-changes [OPTIONS] --output <OUTPUT> <CHANGES>...
```

| Flag | Description |
|---|---|
| `-o, --output <FILE>` | Output file |
| `--simplify` | Keep only the last change per object (type + id) |
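Merging concatenates the changes and orders them by object and version; `--simplify` then keeps one change per `(type, id)`. This sketch assumes that "last" means the highest version and that the merged output is sorted by type, ID, and version; the change records are hypothetical dictionaries.

```python
from typing import Dict, Iterable, List, Tuple

TYPE_ORDER = {"node": 0, "way": 1, "relation": 2}


def merge_changes(change_files: Iterable[List[dict]], simplify: bool = False) -> List[dict]:
    merged = [c for changes in change_files for c in changes]
    merged.sort(key=lambda c: (TYPE_ORDER[c["type"]], c["id"], c["version"]))
    if not simplify:
        return merged
    last: Dict[Tuple[str, int], dict] = {}
    for c in merged:  # later (higher-version) entries overwrite earlier ones
        last[(c["type"], c["id"])] = c
    return list(last.values())  # dicts keep first-insertion order, i.e. sorted


a = [{"type": "node", "id": 1, "version": 2, "action": "modify"}]
b = [{"type": "node", "id": 1, "version": 3, "action": "delete"},
     {"type": "way", "id": 5, "version": 1, "action": "create"}]
print([(c["type"], c["id"], c["action"]) for c in merge_changes([a, b], simplify=True)])
# [('node', 1, 'delete'), ('way', 5, 'create')]
```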

build-geocode-index

Build a reverse geocoding index from a PBF file. Produces a set of binary files (S2 cell index, address points, street segments, admin boundaries, string pool) that can be memory-mapped for sub-millisecond reverse geocoding queries.

Requires an indexed PBF (such as one written by pbfhogg cat). The output directory must not already exist unless --force is set.

```
pbfhogg build-geocode-index [OPTIONS] --output-dir <DIR> <FILE>
```

| Flag | Description |
|---|---|
| `--output-dir <DIR>` | Output directory for index files |
| `--street-level <N>` | S2 cell level for streets/addresses [default: 17] |
| `--coarse-level <N>` | Fallback cell level for rural areas [default: 14] |
| `--admin-level <N>` | S2 cell level for admin boundaries [default: 10] |
| `--max-admin-vertices <N>` | Douglas-Peucker vertex cap per admin polygon [default: 500] |
| `--search-radius <M>` | Fine-level maximum search distance in meters [default: 75] |
| `--coarse-search-radius <M>` | Coarse-level maximum search distance in meters [default: 1000] |
| `--force` | Proceed without indexdata and overwrite an existing index |

Outputs 19 binary files. Reference timings:

  • Denmark (465 MB PBF): ~7 s, 172 MB index
  • Europe (32.4 GB): 524 s (8.7 min), 7.5 GB RSS
  • Planet (87 GB): 1,255 s (20.9 min), 29.5 GB peak RSS (a transient peak during pass 1.5)
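`--max-admin-vertices` caps each admin polygon using Douglas-Peucker simplification. Below is a planar Python sketch of that algorithm. How pbfhogg chooses the tolerance to meet the cap is not documented here, so the hypothetical `simplify_to_cap` simply doubles the tolerance until the result fits; coordinates are treated as flat x/y values.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _perp_dist(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line through a and b (or to a if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def douglas_peucker(points: Sequence[Point], eps: float) -> List[Point]:
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # the split point appears once


def simplify_to_cap(points: Sequence[Point], max_vertices: int, eps: float = 1e-6) -> List[Point]:
    if max_vertices < 2:
        raise ValueError("max_vertices must be >= 2")
    result = douglas_peucker(points, eps)
    while len(result) > max_vertices:  # grow tolerance until the cap is met (assumed strategy)
        eps *= 2
        result = douglas_peucker(points, eps)
    return result


line = [(0, 0), (1, 0.01), (2, 0)]
print(douglas_peucker(line, 0.1))    # [(0, 0), (2, 0)]
print(douglas_peucker(line, 0.001))  # [(0, 0), (1, 0.01), (2, 0)]
zigzag = [(i, float(i % 2)) for i in range(11)]
print(len(simplify_to_cap(zigzag, 5)) <= 5)  # True
```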

Released under the Apache License 2.0. | Copyright folk@folk.wtf