The trail processing pipeline I’ve been building started with a simple approach to elevation: send coordinates to the Open Topo Data API and get elevations back. It worked fine for a handful of trails. With 624 trails and hundreds of sampled points each, it became the bottleneck: slow, rate-limited, and dependent on an external service I don’t control.
The fix was to stop using an API entirely and query SRTM tiles directly.
What SRTM Data Is
SRTM (Shuttle Radar Topography Mission) is a NASA dataset from a 2000 Space Shuttle mission that measured Earth’s elevation using radar interferometry. The 1-arc-second (~30 meter) resolution dataset covers land between roughly 60°N and 56°S and is public domain. It’s distributed as .hgt binary files, one per 1°×1° tile, named by their southwest corner: N37W122.hgt is the tile whose southwest corner sits at 37°N, 122°W, so it spans 37–38°N and 121–122°W.
Each file is a 3601×3601 grid of signed 16-bit integers representing elevation in meters. Reading a point is a direct array lookup once you know the tile and the fractional position within it.
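Finding the right tile for a coordinate follows directly from the southwest-corner naming rule. The helper below is a minimal sketch of that rule; `tile_name` and `EXPECTED_BYTES` are illustrative names, not part of the pipeline shown in this post.

```python
import math

def tile_name(lat: float, lon: float) -> str:
    """Return the .hgt filename of the 1°×1° tile containing (lat, lon)."""
    # Tiles are named by their southwest corner, so floor both coordinates.
    # math.floor (not int) matters for negative values: -122.27 floors to -123.
    sw_lat = math.floor(lat)
    sw_lon = math.floor(lon)
    ns = "N" if sw_lat >= 0 else "S"
    ew = "E" if sw_lon >= 0 else "W"
    return f"{ns}{abs(sw_lat):02d}{ew}{abs(sw_lon):03d}.hgt"

# A 1-arc-second tile holds 3601 * 3601 samples of 2 bytes each.
EXPECTED_BYTES = 3601 * 3601 * 2  # 25,934,402 bytes, roughly 26 MB

print(tile_name(37.5, -121.5))    # N37W122.hgt
print(tile_name(37.8, -122.27))   # N37W123.hgt
```

Comparing a downloaded file's size against `EXPECTED_BYTES` is a cheap way to catch a truncated download or a 3-arc-second tile before reshaping it.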
Reading an HGT File
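    import math

    import numpy as np

    SAMPLES = 3601  # samples per side of a 1-arc-second tile

    def load_hgt(path: str) -> np.ndarray:
        """Read one .hgt tile into a (SAMPLES, SAMPLES) array of meters."""
        with open(path, "rb") as f:
            data = np.frombuffer(f.read(), dtype=">i2")  # big-endian int16
        return data.reshape((SAMPLES, SAMPLES))

    def elevation_at(grid: np.ndarray, lat: float, lon: float) -> float:
        """Look up the elevation at (lat, lon) in the tile that contains it."""
        # floor, not int(): int() truncates toward zero, which yields a
        # negative fractional offset for western or southern coordinates.
        tile_lat = math.floor(lat)
        tile_lon = math.floor(lon)
        # Row 0 is the tile's northern edge, so rows run opposite to latitude.
        row = SAMPLES - 1 - int((lat - tile_lat) * (SAMPLES - 1))
        col = int((lon - tile_lon) * (SAMPLES - 1))
        return float(grid[row, col])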
The tile name encodes the southwest corner, and the grid is stored north-to-south, so the row index is inverted relative to latitude.
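A couple of worked lookups make the inversion concrete. This is a standalone sketch of the index arithmetic only (no tile is loaded); `grid_index` is a hypothetical helper, and it assumes the fractional offsets are measured from the tile's southwest corner.

```python
import math

SAMPLES = 3601  # assumes 1-arc-second tiles

def grid_index(lat: float, lon: float) -> tuple[int, int]:
    """Map a coordinate to (row, col) within the tile that contains it."""
    frac_lat = lat - math.floor(lat)  # 0.0 at the tile's south edge
    frac_lon = lon - math.floor(lon)  # 0.0 at the tile's west edge
    # Row 0 is the north edge, so a larger latitude means a smaller row.
    row = (SAMPLES - 1) - int(frac_lat * (SAMPLES - 1))
    col = int(frac_lon * (SAMPLES - 1))
    return row, col

print(grid_index(37.0, -122.0))   # southwest corner -> (3600, 0)
print(grid_index(37.5, -121.5))   # tile center      -> (1800, 1800)
```

The southwest corner lands on the last row and first column, which is what "stored north-to-south" means in practice.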
Smoothing Out SRTM Noise
Raw SRTM data has jitter — a flat trail can show 100+ meters of cumulative elevation gain just from pixel-to-pixel noise. A 5-sample centered moving average removes the spikes without introducing significant lag:
    def smooth_elevations(elevations: list[float], window: int = 5) -> list[float]:
        """Centered moving average; the window shrinks at both ends of the list."""
        half = window // 2
        smoothed = []
        for i in range(len(elevations)):
            start = max(0, i - half)
            end = min(len(elevations), i + half + 1)
            smoothed.append(sum(elevations[start:end]) / (end - start))
        return smoothed
After smoothing, the same flat trail reports ~19 meters of gain — much closer to reality.
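To see why jitter inflates the number, it helps to look at how cumulative gain is typically computed: every upward step between consecutive samples is added, so noise that wobbles up and down counts on each upswing. The function below is a sketch of that common definition; the post does not show the pipeline's actual gain calculation, so `gain_loss` and the sample values are illustrative assumptions.

```python
def gain_loss(elevations: list[float]) -> tuple[float, float]:
    """Sum positive and negative steps between consecutive samples."""
    gain = 0.0
    loss = 0.0
    for prev, curr in zip(elevations, elevations[1:]):
        delta = curr - prev
        if delta > 0:
            gain += delta
        else:
            loss -= delta  # delta is <= 0 here, so this adds its magnitude
    return gain, loss

# A "flat" profile at about 100 m with a few meters of made-up jitter.
noisy_flat = [100.0, 103.0, 99.0, 102.0, 100.0, 104.0, 100.0]
print(gain_loss(noisy_flat))  # (10.0, 10.0)
```

Seven samples of a trail that starts and ends at the same height already report 10 meters of climbing, and that error grows with every additional sample.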
Caching Elevation Results
HGT files are ~26 MB each. Loading and querying them is fast, but computing elevations for 600+ trails still takes a few seconds on a full rebuild. The pipeline only needs to recompute elevations when a trail’s coordinates actually change.
A SHA-256 hash of the sampled coordinate list serves as a cache key:
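    import hashlib
    import json

    def coord_hash(coords: list[tuple[float, float]]) -> str:
        # Compact separators give a whitespace-free, repeatable serialization.
        payload = json.dumps(coords, separators=(",", ":"))
        # Keep the first 16 hex characters (64 bits) of the digest as the key.
        return hashlib.sha256(payload.encode()).hexdigest()[:16]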
The cache is a JSON file mapping hashes to {elevations, gain, loss}. On a rebuild where no trail geometries changed, every trail hits the cache and the elevation phase takes milliseconds instead of seconds.
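The lookup itself can be very small. The following is a rough sketch of how such a cache might be read, consulted, and written back; `load_cache`, `save_cache`, `cached_profile`, and the `compute` callback are hypothetical names, and the real pipeline may structure this differently.

```python
import json
from pathlib import Path
from typing import Callable

def load_cache(path: Path) -> dict:
    """Read the cache file, or start empty if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {}

def save_cache(path: Path, cache: dict) -> None:
    """Write the whole cache back as a single JSON document."""
    path.write_text(json.dumps(cache))

def cached_profile(cache: dict, key: str, compute: Callable[[], dict]) -> dict:
    """Return the entry for key, calling compute() only on a cache miss."""
    if key not in cache:
        # compute() is expected to return {"elevations": ..., "gain": ..., "loss": ...}
        cache[key] = compute()
    return cache[key]
```

A rebuild would call `load_cache` once, run `cached_profile` per trail with the coordinate hash as the key, and call `save_cache` once at the end. Because the key is derived from the coordinates, an edited trail simply misses the cache and is recomputed; its stale entry is never read again.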
The Payoff
Before: ~3–5 seconds per trail, rate-limit sleeps between batches, occasional failures requiring retries.
After: ~6 ms per trail on average, fully offline, deterministic. The entire 624-trail dataset processes in under 10 seconds including SRTM file loads.
The HGT files download on demand the first time a tile is needed and are cached permanently alongside the project data. The total SRTM tile storage for the East Bay coverage area is about 26 MB — negligible.
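The download-on-demand behavior can be sketched as a check-then-fetch wrapper. The post does not say where the tiles come from, so the download step is left as a caller-supplied `fetch` function here; `ensure_tile`, `tile_dir`, and `fetch` are all hypothetical names.

```python
from pathlib import Path
from typing import Callable

def ensure_tile(name: str, tile_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return the local path of a tile, fetching it only if it is missing."""
    path = tile_dir / name
    if not path.exists():
        tile_dir.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so an interrupted download
        # never leaves a truncated .hgt file behind.
        tmp = path.with_suffix(".part")
        tmp.write_bytes(fetch(name))
        tmp.replace(path)
    return path
```

After the first successful call for a given tile, later builds never touch the network for it.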
The one real cost is tile management: if the project expands to cover new regions, new HGT files need to be present. But that’s a download-once problem, not an ongoing operational one. Compared to managing API keys and rate limits across build environments, it’s a straightforward tradeoff.