Although data portals offer a variety of tools for examining datasets, the ultimate “open” in “open data” is the ability to export the data to a file and do as one wishes with it. However, this can be challenging with very large datasets, particularly over slower Internet connections. It is also inefficient if one needs only a portion of the data.

There are several techniques for attacking this problem. A particularly good, yet frequently overlooked, one is to use the filtering at the core of a Data Lens page (often called a “Dashboard” on the Chicago Data Portal).

To take an example that comes up frequently in questions from our users: our Taxi Trips dataset contains over 107 million trips as of this writing. Downloading the full dataset in CSV format could take hours, even over a fast connection, and would produce a very large file.

Suppose that you need only the records from the last three months of 2016. You may know that you can easily filter the Data Lens view to show those records: hover over the October 2016 slice on any of the time-based cards, click, drag across November and December, and release. You have now selected approximately four million trips.

Using Data Lens to select three months of trips.

However, this is only the start. Not only can you view those trips within the page, including how they affect other cards, but you can also export just those records. To do so, click the Export button at the top of the page. It defaults to exporting all records, but you can change that option to export only the currently selected records.

Data Lens Export button, showing the option to download only selected records.

You can even apply multiple filters. If all you really need is trips from those three months that originated in Logan Square, apply that filter as well, on one of the map cards.

Filtering by Community Area

The download now becomes a very manageable 24,963 records, which download in seconds or minutes to a file under 10 MB in size.
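Once the export finishes, it can be worth confirming that the file holds the number of records the page reported. The following is a minimal sketch of such a check in Python, not an official tool. The function name `summarize_export` is hypothetical, and the only assumption about the exported file is that it is a CSV with a single header row.

```python
import csv
import os


def summarize_export(path):
    """Return (record_count, size_in_megabytes) for an exported CSV file.

    Assumes the first row of the file is a header row.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if the file has any rows
        record_count = sum(1 for _ in reader)
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return record_count, size_mb
```

Calling `summarize_export("taxi_trips_export.csv")` (a hypothetical file name) on the Logan Square export described above should report a record count matching the one shown on the page and a size under 10 MB.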

As mentioned above, there are other ways to filter a large dataset. They can be very powerful and may be helpful for more complicated needs. However, to make one or a few quick slices of a dataset and download the results, it is hard to beat the convenience of the graphical filters in a Data Lens page.
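For comparison, one of those other ways is to request a filtered extract through the portal's query API. The sketch below only builds the request URL and downloads nothing. It rests on several assumptions that should be verified on the dataset's own page before use: the dataset identifier `wrpv-gj62`, the column names `trip_start_timestamp` and `pickup_community_area`, that the community area column is numeric, and that Logan Square is community area 22. The helper `build_export_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base address for dataset resources on the Chicago Data Portal.
BASE_URL = "https://data.cityofchicago.org/resource"


def build_export_url(dataset_id, start, end, community_area=None, limit=50000):
    """Build a CSV query URL for trips starting in the range [start, end).

    Column names below are assumptions about the Taxi Trips dataset.
    """
    where = (
        f"trip_start_timestamp >= '{start}' "
        f"AND trip_start_timestamp < '{end}'"
    )
    if community_area is not None:
        # Assumes the community area column is numeric.
        where += f" AND pickup_community_area = {int(community_area)}"
    # Request an explicit page size instead of relying on the API default.
    query = urlencode({"$where": where, "$limit": limit})
    return f"{BASE_URL}/{dataset_id}.csv?{query}"


# Assumed dataset identifier and community area number; verify both.
url = build_export_url(
    "wrpv-gj62",
    "2016-10-01T00:00:00",
    "2017-01-01T00:00:00",
    community_area=22,
)
print(url)
```

Writing and checking a query like this takes more effort than dragging across a card, which is the trade-off described above.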

Feature image by luckey_sun, licensed under Creative Commons Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0).
