Ditching the Elevation API: Local SRTM Tiles

The trail processing pipeline I’ve been building started with a simple approach to elevation: send coordinates to the Open-Topo-Data API and get elevations back. It worked fine for a handful of trails. With 624 trails and hundreds of sampled points each, it became the bottleneck — slow, rate-limited, and dependent on an external service I don’t control.

The fix was to stop using an API entirely and query SRTM tiles directly.

What SRTM Data Is

SRTM (Shuttle Radar Topography Mission) is a NASA dataset from a 2000 Space Shuttle mission that measured Earth’s elevation using radar interferometry. The 1-arc-second (~30 meter) resolution dataset covers land between roughly 60°N and 56°S and is public domain. It’s distributed as .hgt binary files, one per 1°×1° tile, named by their southwest corner: N37W122.hgt covers 37°N to 38°N and 122°W to 121°W.

Each file is a 3601×3601 grid of signed 16-bit integers representing elevation in meters. Reading a point is a direct array lookup once you know the tile and the fractional position within it.
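Knowing the tile means turning a coordinate into a file name. The sketch below is one way to do that under the southwest-corner convention described above; tile_name is a hypothetical helper, not something from the pipeline itself.

```python
import math

def tile_name(lat: float, lon: float) -> str:
    """Return the HGT tile name (without extension) covering a coordinate."""
    # Floor to the southwest corner. floor() rather than int(), because int()
    # truncates toward zero and would pick the wrong tile for negative values.
    lat0 = math.floor(lat)
    lon0 = math.floor(lon)
    ns = "N" if lat0 >= 0 else "S"
    ew = "E" if lon0 >= 0 else "W"
    return f"{ns}{abs(lat0):02d}{ew}{abs(lon0):03d}"

print(tile_name(37.5, -121.5))    # N37W122
print(tile_name(37.87, -122.27))  # N37W123
print(tile_name(-33.9, 151.2))    # S34E151
```

Note that a point at 122.27°W falls in the W123 tile, not W122, because the name refers to the tile's western edge.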

Reading an HGT File

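import math

import numpy as np

SAMPLES = 3601  # rows and columns per tile at 1-arc-second resolution

def load_hgt(path: str) -> np.ndarray:
    # HGT files have no header: just SAMPLES * SAMPLES big-endian int16 values.
    with open(path, "rb") as f:
        data = np.frombuffer(f.read(), dtype=">i2")
    return data.reshape((SAMPLES, SAMPLES))

def elevation_at(grid: np.ndarray, lat: float, lon: float) -> float:
    # The tile origin is its southwest corner. Use floor(), not int():
    # int() truncates toward zero, so int(-122.3) == -122 (one tile too far east).
    tile_lat = math.floor(lat)
    tile_lon = math.floor(lon)
    # Row 0 is the tile's northern edge, so the row index runs opposite to latitude.
    row = SAMPLES - 1 - int((lat - tile_lat) * (SAMPLES - 1))
    col = int((lon - tile_lon) * (SAMPLES - 1))
    return float(grid[row, col])

The tile name encodes the southwest corner, and the grid is stored north-to-south, so the row index is inverted relative to latitude. The tile origin has to be computed with floor rather than truncation: western longitudes are negative, and truncating -122.3 gives -122, which is the corner of the neighboring tile to the east.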
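To make the index math concrete, here is a small worked check that needs no tile data. It is a sketch, not the pipeline's code: grid_index is a hypothetical helper, and it assumes the tile origin is found by flooring the coordinate so that negative longitudes resolve to the southwest corner.

```python
import math

SAMPLES = 3601  # 1-arc-second tiles

def grid_index(lat: float, lon: float) -> tuple[int, int]:
    """Map a coordinate to (row, col) within its own tile."""
    frac_lat = lat - math.floor(lat)  # 0.0 at the southern edge
    frac_lon = lon - math.floor(lon)  # 0.0 at the western edge
    row = SAMPLES - 1 - int(frac_lat * (SAMPLES - 1))  # row 0 is the north edge
    col = int(frac_lon * (SAMPLES - 1))
    return row, col

print(grid_index(37.0, -123.0))  # (3600, 0): southwest corner is the last row, first column
print(grid_index(37.5, -122.5))  # (1800, 1800): tile center
```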

Smoothing Out SRTM Noise

Raw SRTM data has jitter: a flat trail can show 100+ meters of cumulative elevation gain just from pixel-to-pixel noise. A 5-sample centered moving average removes the spikes, and because the window is centered it does not shift features along the trail:

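def smooth_elevations(elevations: list[float], window: int = 5) -> list[float]:
    # Centered moving average. The window shrinks near both ends of the list,
    # so edge samples are averaged over fewer neighbors instead of being padded.
    half = window // 2
    smoothed = []
    for i in range(len(elevations)):
        start = max(0, i - half)
        end = min(len(elevations), i + half + 1)
        smoothed.append(sum(elevations[start:end]) / (end - start))
    return smoothed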

After smoothing, the same flat trail reports ~19 meters of gain — much closer to reality.
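The gain numbers here presumably come from summing the positive steps between consecutive samples. The post does not show that step, so the following is a minimal sketch under that assumption; gain_loss is a hypothetical name and the profile is made-up data.

```python
def gain_loss(elevations: list[float]) -> tuple[float, float]:
    """Sum the upward and downward steps between consecutive samples."""
    gain = 0.0
    loss = 0.0
    for prev, curr in zip(elevations, elevations[1:]):
        delta = curr - prev
        if delta > 0:
            gain += delta
        else:
            loss -= delta  # delta is <= 0 here, so this adds its magnitude
    return gain, loss

# A "flat" profile with +/-1 m of alternating noise (made-up data).
noisy = [100.0, 101.0, 100.0, 101.0, 100.0, 101.0, 100.0]
print(gain_loss(noisy))  # (3.0, 3.0)
```

Every upward wiggle counts toward gain, which is why noise inflates the total so quickly. Running the profile through smooth_elevations before summing is what brings the figure back down.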

Caching Elevation Results

HGT files are ~26 MB each. Loading and querying them is fast, but computing elevations for 600+ trails still takes a few seconds on a full rebuild. The pipeline only needs to recompute elevations when a trail’s coordinates actually change.

A SHA-256 hash of the sampled coordinate list, truncated to 16 hex characters (64 bits), serves as the cache key:

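import hashlib
import json

def coord_hash(coords: list[tuple[float, float]]) -> str:
    # Compact separators keep the serialization byte-stable for identical input.
    payload = json.dumps(coords, separators=(",", ":"))
    # 16 hex characters = the first 64 bits of the digest.
    return hashlib.sha256(payload.encode()).hexdigest()[:16]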

The cache is a JSON file mapping hashes to {elevations, gain, loss}. On a rebuild where no trail geometries changed, every trail hits the cache and the elevation phase takes milliseconds instead of seconds.
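The lookup itself can be sketched roughly as below. This is an illustration, not the pipeline's actual code: load_cache, save_cache, lookup, and the compute callback are all hypothetical names, and the entry shape is assumed from the {elevations, gain, loss} description above.

```python
import json
from pathlib import Path
from typing import Callable

def load_cache(path: Path) -> dict:
    """Read the JSON cache, or start empty if the file does not exist yet."""
    return json.loads(path.read_text()) if path.exists() else {}

def save_cache(path: Path, cache: dict) -> None:
    """Write the whole cache back once, at the end of a build."""
    path.write_text(json.dumps(cache))

def lookup(cache: dict, key: str, compute: Callable[[], dict]) -> dict:
    """Return the cached entry for key, calling compute() only on a miss."""
    if key not in cache:
        # Assumed entry shape: {"elevations": [...], "gain": ..., "loss": ...}
        cache[key] = compute()
    return cache[key]
```

Loading once, looking up per trail, and saving once at the end keeps the file I/O out of the per-trail loop.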

The Payoff

Before: ~3–5 seconds per trail, rate-limit sleeps between batches, occasional failures requiring retries.

After: ~6 ms per trail on average, fully offline, deterministic. The entire 624-trail dataset processes in under 10 seconds, including SRTM file loads.

The HGT files download on demand the first time a tile is needed and are cached permanently alongside the project data. The total SRTM tile storage for the East Bay coverage area is about 26 MB — negligible.
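The download-on-demand step might look something like this. It is a sketch under assumptions: ensure_tile is a hypothetical helper, and the actual download is left to a caller-supplied fetch function because the post does not say which mirror the tiles come from.

```python
from pathlib import Path
from typing import Callable

def ensure_tile(name: str, tile_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return the local path for a tile, fetching it only if it is missing."""
    path = tile_dir / f"{name}.hgt"
    if not path.exists():
        tile_dir.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so an interrupted download
        # never leaves a truncated .hgt file behind.
        tmp = tile_dir / f"{name}.hgt.part"
        tmp.write_bytes(fetch(name))
        tmp.replace(path)
    return path
```

After the first build, every call takes the path.exists() branch and the network is never touched.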

The one real cost is tile management: if the project expands to cover new regions, new HGT files need to be present. But that’s a download-once problem, not an ongoing operational one. Compared to managing API keys and rate limits across build environments, it’s a straightforward tradeoff.