Recently, we’ve been partnering with Hammerhead to design offline maps for their on-bike cycling computer. It’s been really interesting to work on a design project where size concerns (all the data must be downloaded to the device and MBs count) and hardware concerns (the styles must read in direct sunlight) are just as important as aesthetics.
As part of this project, we needed to optimize and process global OSM data, converting it to the format used on the device and stripping out all the additional layers not used by the map style. The process uses GDAL, Osmosis, and PBF extracts downloaded from Geofabrik. We had already bundled it into a bash script that takes the name of an OSM area and (as sketched in code after this list):
- Downloads all the required data files
- Creates land and sea polygons and crops to the desired area
- Converts the OSM data
- Uploads the converted data to S3 along with a small text file defining the extents
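The real script is more involved, but a minimal sketch of those steps might look something like this. The Geofabrik URL pattern is real; the extents, bucket name, land-polygon download, and the device-format conversion step are placeholders for illustration:

```bash
#!/usr/bin/env bash
# Sketch of the per-area pipeline; takes an OSM area name as its argument.
set -euo pipefail

AREA="$1"                                           # e.g. north-america/us/colorado
LEFT=-109.1; BOTTOM=36.9; RIGHT=-102.0; TOP=41.1    # hypothetical extents

# 1. Download the raw OSM extract for the area from Geofabrik.
wget "https://download.geofabrik.de/${AREA}-latest.osm.pbf" -O area.osm.pbf

# 2. Crop pre-built land polygons to the same extents with GDAL
#    (the land-polygon URL is illustrative).
wget https://osmdata.openstreetmap.de/download/land-polygons-split-4326.zip
unzip -o land-polygons-split-4326.zip
ogr2ogr -clipsrc "$LEFT" "$BOTTOM" "$RIGHT" "$TOP" \
  land.shp land-polygons-split-4326/land_polygons.shp

# 3. Crop the OSM data to the extents with Osmosis.
osmosis --read-pbf file=area.osm.pbf \
  --bounding-box left="$LEFT" bottom="$BOTTOM" right="$RIGHT" top="$TOP" \
  --write-pbf file=cropped.osm.pbf

# 4. Convert to the on-device format. The real tooling is specific to the
#    device, so a hypothetical command stands in here.
# convert-to-device-format cropped.osm.pbf out.map

# 5. Upload the results and a small extents file to S3.
echo "$LEFT $BOTTOM $RIGHT $TOP" > extents.txt
aws s3 cp cropped.osm.pbf "s3://example-bucket/${AREA}/"
aws s3 cp extents.txt "s3://example-bucket/${AREA}/"
```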
After running a few multi-hour tests on single US states, it became clear it was going to take a week to process the entire world from my local machine… and we had to deliver in a couple of days. We weren’t thinking big enough! Obviously one computer wasn’t going to cut it; we needed 50, all more powerful than my stupid laptop. We had been working with Docker and DigitalOcean before, but mostly as a convenient way to avoid constantly rebuilding server dependencies. This seemed like a good opportunity to test their scalability and see how they could help us deal with a monster dataset.
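Docker is what makes the jump from one machine to 50 practical: once the script and its dependencies (GDAL, Osmosis, the AWS CLI) are baked into an image, every worker starts from an identical environment. A minimal sketch, with a hypothetical image name:

```bash
# Build an image that bundles the processing script with GDAL, Osmosis,
# and the AWS CLI (the image name is hypothetical).
docker build -t osm-processor .

# Process one area; AWS credentials are passed through from the host
# environment so the container can upload its results to S3.
docker run --rm \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  osm-processor north-america/us/colorado
```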
Running the Images
Going forward, this work represents a reusable strategy rather than a plug-and-play code library. Our big push in the last few years with our data work has been towards scriptability and repeatability, using code to handle every step of data processing, from the source all the way into the application. What we learned on this project extends that scriptability to cover not only the computer systems we run these projects on, but also the level of scalability required to quickly process monster data jobs.
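As a sketch of that strategy (assuming docker-machine’s DigitalOcean driver; the names, droplet size, and area list are hypothetical), the fan-out is just a loop: provision a droplet per area, point the Docker client at it, and start the job detached so the loop can move on:

```bash
#!/usr/bin/env bash
# Hypothetical fan-out: one DigitalOcean droplet per OSM area, each
# running the same processing image.
set -euo pipefail

AREAS=(north-america/us/colorado north-america/us/utah europe/france)

for AREA in "${AREAS[@]}"; do
  NAME="osm-worker-${AREA//\//-}"

  # Provision a Docker-ready droplet via docker-machine's DigitalOcean driver.
  docker-machine create --driver digitalocean \
    --digitalocean-access-token "$DO_TOKEN" \
    --digitalocean-size 8gb \
    "$NAME"

  # Point the local docker client at the new droplet and start the job,
  # detached, so the next droplet can be provisioned immediately.
  eval "$(docker-machine env "$NAME")"
  docker run -d \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY \
    osm-processor "$AREA"
done
```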