Last year the City’s data science team released the first version of RSocrata, an R package that gives R programmers an easy way to access and download data from Socrata open data portals. This week we released RSocrata 1.6.0, which adds several new features.
Users can now download a list of every dataset on a Socrata open data portal with ls.socrata (short for “list space Socrata”). Pass the domain of the portal, such as data.cityofchicago.org, data.hawaii.org, iranhumanrights.socrata.com, or any other portal hosted by Socrata:
library(RSocrata)
allSitesDataFrame <- ls.socrata("data.cityofchicago.org")
nrow(allSitesDataFrame) # Number of datasets
allSitesDataFrame$title # Names of each dataset
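Because ls.socrata returns an ordinary data frame, base R is enough to search it. The sketch below is a hypothetical helper, findDatasets (not part of RSocrata), that keeps only the datasets whose title or description matches a pattern. It assumes the catalog has title and description columns (both are required fields in data.json v1.1); check names(allSitesDataFrame) on your portal before relying on it.

```r
# Hypothetical helper (not part of RSocrata): keep catalog rows whose title or
# description matches a regular expression, ignoring case.
# Assumes the catalog has "title" and "description" columns, as in data.json v1.1.
findDatasets <- function(catalog, pattern) {
  text <- paste(catalog$title, catalog$description)  # one searchable string per dataset
  catalog[grepl(pattern, text, ignore.case = TRUE), , drop = FALSE]
}

# Usage (requires network access and the RSocrata package):
# allSitesDataFrame <- ls.socrata("data.cityofchicago.org")
# crimeDatasets <- findDatasets(allSitesDataFrame, "crime")
# crimeDatasets$title
```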
The result is a data frame listing every dataset on the portal, along with a brief description, download URLs, and other metadata. It is a quick way to see what data the portal offers, and it makes it easy to write scripts that download each dataset with the read.socrata function.
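To script bulk downloads, each catalog entry has to become a URL that read.socrata accepts. One way to do that is sketched below with a hypothetical helper, resourceUrl (not part of RSocrata). It assumes the landingPage column ends in the dataset's Socrata four-by-four ID (for example 4334-bgaj) and that the portal serves CSV at /resource/ID.csv, as in the earthquake example in this post. Entries that do not match return NA rather than a bad URL.

```r
# Hypothetical helper (not part of RSocrata): turn dataset landing pages into
# SODA resource URLs that read.socrata can download.
# Assumes each landing page ends in a four-by-four ID such as "4334-bgaj".
resourceUrl <- function(domain, landingPage, format = "csv") {
  pattern <- "^.*/([a-z0-9]{4}-[a-z0-9]{4})/?$"
  matched <- grepl(pattern, landingPage)
  ids <- sub(pattern, "\\1", landingPage)  # capture the trailing ID
  # Return NA for anything that does not look like a Socrata dataset page
  ifelse(matched, paste0("https://", domain, "/resource/", ids, ".", format), NA_character_)
}

resourceUrl("soda.demo.socrata.com", "https://soda.demo.socrata.com/d/4334-bgaj")
# returns "https://soda.demo.socrata.com/resource/4334-bgaj.csv"

# Usage (requires network access and the RSocrata package):
# allSitesDataFrame <- ls.socrata("data.cityofchicago.org")
# urls <- na.omit(resourceUrl("data.cityofchicago.org", allSitesDataFrame$landingPage))
# firstThree <- lapply(head(urls, 3), read.socrata)  # start small to avoid throttling
```

Starting with a few datasets is deliberate: a full portal can hold hundreds of datasets, and downloading them all in one loop is exactly the kind of heavy use that API tokens are meant for.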
This feature relies on the Project Open Data Metadata Schema (otherwise known as data.json) and is currently compatible with version 1.1 of that schema. Because data.json is becoming the de facto standard for transmitting dataset metadata, it is a natural basis for the ls.socrata function. The feature was conceived, written and submitted by Peter Schmiedeskamp from the University of Washington.
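Since ls.socrata builds on data.json, the same catalog can also be inspected directly. The sketch below assumes the catalog is published at /data.json on the portal's domain and uses jsonlite::fromJSON (jsonlite is not mentioned in this post and must be installed separately). catalogUrl and readCatalog are hypothetical helpers, not RSocrata functions.

```r
# Hypothetical helpers (not part of RSocrata); assumes the portal publishes
# its Project Open Data catalog at https://<domain>/data.json
catalogUrl <- function(domain) {
  paste0("https://", domain, "/data.json")
}

readCatalog <- function(domain) {
  # jsonlite simplifies the "dataset" array into a data frame by default
  jsonlite::fromJSON(catalogUrl(domain))
}

catalogUrl("data.cityofchicago.org")

# Usage (requires network access and the jsonlite package):
# catalog <- readCatalog("data.cityofchicago.org")
# catalog$conformsTo          # schema version the portal declares
# head(catalog$dataset$title)
```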
Heavy users of Socrata should use an API token (app token), which allows more API requests before throttling kicks in. RSocrata now accepts the token through a separate, optional app_token argument to read.socrata:
token <- "ew2rEMuESuzWPqMkyPfOSGJgE" # your app token
earthquakesDataFrame <- read.socrata("http://soda.demo.socrata.com/resource/4334-bgaj.csv", app_token = token)
Want to keep your API token out of a public project, such as one on GitHub? Because the token is now a separate argument, it no longer has to appear in your code. Create a new file in your project called token.txt containing only the token:
ew2rEMuESuzWPqMkyPfOSGJgE
Then read the token in with readLines:
token <- readLines("path/to/token.txt", n = 1) # the first line of the file holds the token
earthquakesDataFrame <- read.socrata("http://soda.demo.socrata.com/resource/4334-bgaj.csv", app_token = token)
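readLines returns whatever is on that line, including stray spaces, and returns an empty result for an empty file. A slightly more defensive version is sketched here as a hypothetical readToken helper (not part of RSocrata). It trims whitespace and stops with a clear message when the file is missing or blank.

```r
# Hypothetical helper (not part of RSocrata): read an app token from the first
# line of a file, trimming whitespace and failing early on a missing or empty file.
readToken <- function(path = "token.txt") {
  if (!file.exists(path)) stop("Token file not found: ", path)
  token <- trimws(readLines(path, n = 1, warn = FALSE))
  if (length(token) == 0 || !nzchar(token)) stop("Token file is empty: ", path)
  token
}

# Usage (requires network access and the RSocrata package):
# earthquakesDataFrame <- read.socrata("http://soda.demo.socrata.com/resource/4334-bgaj.csv",
#                                      app_token = readToken("path/to/token.txt"))
```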
To keep the token out of version control, add token.txt to your .gitignore file and, voilà, the token is hidden.
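If you prefer to update .gitignore from R, a small sketch using only base R follows. ignoreFile is a hypothetical helper (not part of RSocrata) that appends token.txt only when it is not already listed. It assumes it is run from the project's root directory, where .gitignore lives.

```r
# Hypothetical helper (not part of RSocrata): add an entry to .gitignore unless
# an identical line is already present. Returns TRUE (invisibly) if it added one.
ignoreFile <- function(entry = "token.txt", gitignore = ".gitignore") {
  existing <- if (file.exists(gitignore)) readLines(gitignore, warn = FALSE) else character(0)
  if (entry %in% trimws(existing)) return(invisible(FALSE))  # already ignored
  writeLines(c(existing, entry), gitignore)  # rewrite the file with the entry appended
  invisible(TRUE)
}

# Usage, from the project root:
# ignoreFile("token.txt")
```

Rewriting the file with writeLines, rather than appending, avoids gluing the new entry onto a last line that has no trailing newline. The check is an exact line match only, so a broader pattern such as *.txt would not be recognised.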
RSocrata 1.6.0 is available on CRAN for R 3.2.0 or greater and can be installed with:
install.packages("RSocrata")
Further development will take place on the project’s GitHub site. You can install the beta version with the devtools package:
install.packages("devtools")
library(devtools)
install_github("Chicago/RSocrata")
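After installing, you can confirm that R and the package meet the versions mentioned above. hasVersion below is a hypothetical helper (not part of RSocrata) built on base R's system.file and utils::packageVersion.

```r
# Hypothetical helper (not part of RSocrata): TRUE if `pkg` is installed at
# version `minimum` or later, FALSE otherwise (including when it is missing).
hasVersion <- function(pkg, minimum) {
  installed <- nzchar(system.file(package = pkg))  # "" when the package is absent
  installed && utils::packageVersion(pkg) >= minimum
}

getRversion() >= "3.2.0"          # RSocrata 1.6.0 needs R 3.2.0 or greater
hasVersion("RSocrata", "1.6.0")
```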