Technical teardown · May 31, 2026 · 7 min read

Turning a PDF dump into a public atlas: how we built Atlas Criminalidad CR

Costa Rica’s crime data was open and unreadable at the same time. Here is the architecture that turned a quarterly stack of PDFs into a tool anyone can read in five minutes — and that keeps working every release.

The Organismo de Investigación Judicial (OIJ) publishes crime-incidence statistics for the whole country. The data is genuinely open. It is also, in practice, unreadable, and the coexistence of those two facts is the most common problem in civic data. This is how we resolved it for Atlas Criminalidad CR, and these are the design decisions that keep it honest every quarter.

Open data that wasn’t usable

The raw material arrives as PDFs and spreadsheets, split across reporting periods, with canton and district names spelled inconsistently between files. To answer a question as basic as “is theft rising in my canton?”, a citizen had to download several documents, reconcile them by hand, and trust their own arithmetic. Open data that takes an hour of manual reconciliation to read is, functionally, closed.

The temptation is to do a one-time clean: pull the latest files, fix them by hand, ship a dashboard. That produces a screenshot, not infrastructure. The OIJ publishes again next quarter, the hand-cleaning starts over, and the tool rots. The real specification was never “make a dashboard” — it was “make the next release flow in without a human touching it.”

One constraint above all: readable in five minutes

Before any code, we fixed a single standard that every later decision had to serve: a non-analyst should get from landing on the page to a defensible answer in under five minutes. That constraint is ruthless. It kills the 90-page report, the login wall, the “explore the data” maze with no starting point. It forces plain-language summaries over every chart and a default view that already answers the most common question.

Intelligence that cannot be consumed in fifteen minutes is intelligence that does not get used. For a public tool, the budget is five.

The architecture

The system is a one-way pipeline. Data moves from the source to a stable model to a rendered surface, and every stage exists to absorb the next release without breaking the five-minute promise.

Atlas Criminalidad CR — data flow
  1. Sources: OIJ open data (PDF / spreadsheet); official geography registry
  2. Ingestion: parse + clean; canonical geo codes; category mapping
  3. Model: geography × category × period; derived rates & trends
  4. Surface: interactive maps; temporal charts; plain-language summaries
  5. Delivery: static build · edge CDN; one indexed public URL

1 · Ingestion that survives the next release

The first stage does the work that used to be manual, once, in code. Files are parsed into rows, geography is reconciled to canonical canton and district codes, and crime categories are mapped to a stable taxonomy. The output is a single normalized table with the same shape every quarter — so a new release is an input, not a project.

The normalized shape — stable across every release
type Incidence = {
  provinceId: number;   // canonical 1–7
  cantonId: number;     // canonical, 82 total
  category: CrimeCode;  // mapped to a stable taxonomy
  period: `${number}-${number}`; // YYYY-MM
  count: number;
};
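
As a rough illustration of how a parsed row might be forced into that shape, here is a minimal sketch. The `CrimeCode` values, the source labels in `CATEGORY_MAP`, the `RawRow` type, and the `toIncidence` helper are all assumptions made for this example; the article does not describe the real taxonomy or parser.

```typescript
// Hypothetical taxonomy: the article does not list the real CrimeCode values.
type CrimeCode = "THEFT" | "ROBBERY" | "ASSAULT";
type Period = `${number}-${number}`; // YYYY-MM

type Incidence = {
  provinceId: number; // canonical 1 to 7
  cantonId: number;
  category: CrimeCode;
  period: Period;
  count: number;
};

// Illustrative source labels mapped to the stable taxonomy (assumed mapping).
const CATEGORY_MAP: Record<string, CrimeCode> = {
  HURTO: "THEFT",
  ROBO: "ROBBERY",
  ASALTO: "ASSAULT",
};

// Shape of a parsed but not yet validated row (assumed for this sketch).
type RawRow = {
  provinceId: number;
  cantonId: number;
  category: string;
  period: string;
  count: number;
};

// Accepts only four-digit years and months 01 to 12.
function isPeriod(value: string): value is Period {
  return /^\d{4}-(0[1-9]|1[0-2])$/.test(value);
}

function isNonNegativeInteger(n: number): boolean {
  return isFinite(n) && Math.floor(n) === n && n >= 0;
}

// Returns a normalized row, or null if any field fails validation.
function toIncidence(raw: RawRow): Incidence | null {
  const category = CATEGORY_MAP[raw.category.trim().toUpperCase()];
  if (category === undefined) return null;
  if (!isPeriod(raw.period)) return null;
  if (!isNonNegativeInteger(raw.provinceId) || raw.provinceId < 1 || raw.provinceId > 7) return null;
  if (!isNonNegativeInteger(raw.cantonId) || !isNonNegativeInteger(raw.count)) return null;
  return {
    provinceId: raw.provinceId,
    cantonId: raw.cantonId,
    category,
    period: raw.period,
    count: raw.count,
  };
}
```

Returning `null` rather than guessing keeps bad rows visible: a release that introduces an unmapped category fails loudly at ingestion instead of silently shifting a chart.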

The unglamorous part — name reconciliation — is where most civic-data tools quietly fail. “San José”, “San Jose”, and a trailing-space variant are three different keys to a computer and the same place to a person. Pinning every record to a numeric canonical code up front means nothing downstream has to guess.
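
One way the reconciliation step could look is sketched below. It assumes a small in-memory registry; the `REGISTRY` entries and their IDs are placeholders rather than the official codes, and `nameKey` and `resolveCantonId` are hypothetical helper names.

```typescript
// Collapse spelling variants to one lookup key:
// strip accents, trim, collapse inner whitespace, lowercase.
function nameKey(name: string): string {
  return name
    .normalize("NFD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .trim()
    .replace(/\s+/g, " ")
    .toLowerCase();
}

// Illustrative registry entries. The IDs are placeholders, not official codes.
const REGISTRY: ReadonlyArray<{ cantonId: number; name: string }> = [
  { cantonId: 1, name: "San José" },
  { cantonId: 2, name: "Escazú" },
];

// Build the lookup once, keyed by the normalized name.
const CANTON_BY_KEY = new Map<string, number>();
for (const entry of REGISTRY) {
  CANTON_BY_KEY.set(nameKey(entry.name), entry.cantonId);
}

// Resolve a raw name to its canonical ID, failing loudly on anything unknown.
function resolveCantonId(rawName: string): number {
  const id = CANTON_BY_KEY.get(nameKey(rawName));
  if (id === undefined) {
    throw new Error(`Unknown canton name: "${rawName}"`);
  }
  return id;
}
```

With this in place, `resolveCantonId("San José")`, `resolveCantonId("San Jose")`, and `resolveCantonId("San Jose ")` all resolve to the same ID. Names that repeat across parent areas, which is likely for districts, would need a composite key that includes the parent code.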

2 · One model, two questions

A national newsroom and a single resident want opposite things from the same data: one needs to compare all 82 cantons, the other needs only theirs, over time. Both are the same query against one model — geography by category by period — so the tool answers a national-trend question and a single-canton question without a rebuild. Rates and trends are derived once, at model time, not recomputed in the browser.
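
The "same query, two scopes" idea can be sketched as a single function with an optional canton filter. This is an assumption-laden illustration, not the atlas's actual model code: the `Row` type and the `yearlyCount` and `yoyChange` helpers are hypothetical. It covers trends only, since per-capita rates would need a population table the article does not describe.

```typescript
// Simplified row for this sketch; category is a plain string here.
type Row = { cantonId: number; category: string; period: string; count: number };

// Sum counts for one category in one calendar year.
// Pass cantonId to scope to a single canton; omit it for the national total.
function yearlyCount(
  rows: readonly Row[],
  category: string,
  year: number,
  cantonId?: number
): number {
  let total = 0;
  for (const r of rows) {
    if (r.category !== category) continue;
    if (cantonId !== undefined && r.cantonId !== cantonId) continue;
    if (Number(r.period.slice(0, 4)) !== year) continue;
    total += r.count;
  }
  return total;
}

// Year-over-year percentage change; null when the previous year has no incidents.
function yoyChange(
  rows: readonly Row[],
  category: string,
  year: number,
  cantonId?: number
): number | null {
  const previous = yearlyCount(rows, category, year - 1, cantonId);
  if (previous === 0) return null;
  const current = yearlyCount(rows, category, year, cantonId);
  return ((current - previous) / previous) * 100;
}
```

Calling `yoyChange(rows, "THEFT", 2024)` answers the national question and `yoyChange(rows, "THEFT", 2024, 1)` answers the single-canton one. In a static build, results like these would be computed at build time and shipped as data rather than recalculated in the browser.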

3 · Rendering for the reader, not the analyst

A chart is not an answer. Every view in the atlas is wrapped in a sentence a non-analyst can act on, generated from the same model so it never drifts from the numbers it describes. The plain-language layer follows a few rules:

  • Lead with the direction and the magnitude, then the number — “up sharply (+38%)”, not a bare figure.
  • Always name the comparison: against last year, against the national rate, against the neighbouring canton.
  • Never imply causation the data can’t support. The atlas reports incidence; it does not explain it.
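
The rules above can be sketched as a small sentence generator. The thresholds for "sharply", "moderately", and "slightly" are arbitrary assumptions for illustration, and the `Comparison` type and `describeChange` function are hypothetical names, not the atlas's actual code.

```typescript
type Comparison = {
  subject: string; // e.g. "Theft in canton 1"
  baselineLabel: string; // the named comparison, e.g. "last year"
  current: number;
  baseline: number;
};

// Builds one sentence: direction and magnitude first, then the numbers,
// always naming the comparison and never stating a cause.
function describeChange(c: Comparison): string {
  if (c.baseline === 0) {
    return `${c.subject}: ${c.current} reported incidents, with no ${c.baselineLabel} figure to compare against.`;
  }
  const pct = Math.round(((c.current - c.baseline) / c.baseline) * 100);
  if (pct === 0) {
    return `${c.subject} is unchanged against ${c.baselineLabel}: ${c.current} reported incidents versus ${c.baseline}.`;
  }
  const size = Math.abs(pct);
  // Assumed cut-offs; a real tool would tune these to the data.
  const magnitude = size >= 25 ? "sharply" : size >= 5 ? "moderately" : "slightly";
  const direction = pct > 0 ? "up" : "down";
  const sign = pct > 0 ? "+" : "-";
  return `${c.subject} is ${direction} ${magnitude} (${sign}${size}%) against ${c.baselineLabel}: ${c.current} reported incidents versus ${c.baseline}.`;
}
```

Because the sentence is built from the same numbers that drive the chart, the text and the figure cannot disagree after a new release.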

What it costs to keep honest

Because the whole thing renders to a static build served from the edge, there is no server to fall over and the page is fast on a mid-range phone on mobile data — the device most Costa Ricans will actually open it on. The ongoing cost is not infrastructure; it is the discipline of running every new OIJ release through the same pipeline and re-publishing, instead of patching the output. That discipline is the difference between an atlas and a screenshot.

The lesson

The value was never in collecting the data — the OIJ already did that, and well. The value was in making it legible, and in building the pipeline so legibility survives the next release. That is the whole thesis of the studio in one project: most software is optimized for the demo; this was optimized for its third year. You can read the result for yourself.

Live: atlas-criminalidad-cr.vercel.app
