PMsquare

View Original

Pull U.S. Census Data Using Cognos Analytics via the Bureau's API

The 2020 US elections. Emergency responses to the ever-expanding hurricane and wildfire seasons. The COVID-19 pandemic. What do these newsworthy and oft-visualized subject areas all have in common? Integrated US Census Bureau data. The demographic, economic, housing, and social data collected from Census Bureau surveys and programs are often intertwined with other data to provide context. Knowing how to integrate this public data with your own is a skill worth honing.

Mandated by Article 1, Section 2 of the US Constitution, the decennial US census is meant to count every resident in the United States. This, of course, drives representation in the US House of Representatives and determines where federal funds are distributed. Though this particular and most well-known form of data collection happens only once every ten years, the Census Bureau has many other continuous cycle surveys, programs, and censuses going to enable more frequent collection of key data about the American people and the economy. A good starting point to understanding the various methods of collecting and analyzing this data about the United States can be found at the US Census Bureau’s program site. There are in-browser data exploration tools, visualization tools, and data download features available for many of the surveys. One great example is data.census.gov, where you can use natural language to interact with census data. For some use cases, using the interactive browser based exploration and download tools may be all that you need to manually get to the data you require.

When you need to integrate census data with your own data, you must either manually download data sets in piecemeal/adhoc fashion using Census Bureau tools (like data.census.gov), or you could use the Census Bureau API to predictably interact with data sets and pull data directly from the source in a repeatable, programmatic way. The Census Bureau has very good API documentation published and even has a Census Data API Discovery Tool available (HTML, XML, or JSON) to help get you started understanding what data exists and where. Any user may query small quantities of data from the Census Bureau with minimal restrictions (up to 50 variables in a single query and up to 500 queries per IP address per day). However, more than 500 queries per IP address per day requires that you register for an API key. To request an API key, click here

A common US Census data API target is the American Community Survey (ACS). The ACS is an ongoing survey that provides data every year. For example, if you wanted to know what percentage of people 24 years old and older in a particular city hold 4-year engineering or science degrees, the ACS can tell you that. It’s important to remember that all ACS data are estimates. The Census Bureau collects data from a sample of the population then provides estimates along with a published margin of error. When deciding which ACS data set to use, you must consider the balance between currency and sample size/reliability/precision. The summary below should help with initial decision making on which ACS data set to use, if the ACS is indeed relevant to your needs.

To pull this census data using the API into our Cognos Analytics environment, for future integration with our own corporate data, we’d ideally like to use the Progress DataDirect Autonomous REST connection that is available in Cognos Analytics version 11.1.7. This data server type live queries JSON responses that are returned from API endpoints accessed via HTTP. It is exactly what we need to connect and pull data from the Census Bureau APIs, except that there’s currently a defect with the Progress DataDirect driver that doesn’t allow us to work with the US Census API due to its nonstandard JSON format response (more on that later). With this connector, we generally have the ability to either automagically infer a tabular schema (hence the ‘autonomous’ in the name) or specify the configuration with a predefined schema using an input REST file. We can also filter or connect to multiple endpoints in a single connection and specify other customizations when using the explicit input REST file method.  For more info on how to specify a configuration rather than automagically sniffing out the schema (when possible), see the Progress DataDirect documentation along with the ACS Census Data API handbook

Concerning the current defect, it is present in 11.1.7 versions up to FP1 and it prevents us from effectively using the Progress DataDirect Autonomous REST connector for the US Census API in our Cognos environments. The defect is currently being addressed. We can expect this to get well very soon either in an upcoming fix pack, interim fix, or release.  PMsquare will keep an eye on this and will update our community as we learn more of Progress and IBM’s headway in implementing a fix. The crux of the problem is primarily due to the nonstandard JSON response from the US Census API (see image below). The Census API returns nonstandard JSON that uses nested arrays with values separated by a comma and headers on the first row rather than conventional JSON objects with key value pairs. Regardless, a fix is coming that will enable us to use the Progress DataDirect Autonomous REST connector to connect to this API, even given the nonstandard JSON response.

In a future post, once the defect is addressed, we’ll cover how to use the Progress DataDirect Autonomous REST connector in order to pull US Census data via the API into your Cognos environment in a simple, automated fashion. In the meantime, you could use the data discovery tools published by the US Census Bureau (referenced above) to find and manually download the data sets you need, then upload them into Cognos and create data modules from there. Not automated, but for use cases where infrequent census data is needed, it may work for now. There are also likely more automated ways that you could use Jupyter notebooks, some Python or R code (see Stanford’s Data Challenge Lab course’s approach to translating US Census data), and CaDataConnector.write_data to write translated US Census data results back to dataset in Cognos…but that’s too much heavy lifting for now. We’ll wait for the Progress DataDirect Autonomous REST connector fix, as that method is more sustainable and relatively simple.

To head start our next blog post on this topic, I’d like to share a little about the Sample method of the Progress DataDirect Autonomous REST connector in Cognos Analytics. When you can, you’ll often want to keep it simple and let the connector do its magic. Depending on the JSON response structure (complexity, headers, format, pagination, etc), sometimes you’ll be able to do just that. When able to keep it simple, you can use the Sample property in the Autonomous REST connector to specify a single endpoint and, at connection, allow the driver to sample the endpoint to infer a schema based on the results. Our connection URL would be constructed as follows:

jdbc:subprotocol:autorest:Sample=sample_path;[property=value[;...]];

Where…

  • subprotocol is the portion of the connection URL subprotocol that is determined by the publisher of your application. 

  • property=value specifies connection property settings. Multiple properties are separated by a semi-colon.

  • sample_path specifies the endpoint to sample when not using an input REST file (with special characters enclosed in quotes)

The Autonomous REST connector maps JSON objects to relational tables when using the Sample property in order to support SQL access to the REST service. It automagically generates a relational view of the data when the initial sampling has been completed. When generating the relational view, the driver decompounds JSON documents returned by the endpoints into parent-child tables. The driver handles the mapping in the following manner:

  • Simple and nested objects are flattened and mapped to a parent table

  • Arrays of objects and arrays of strings are mapped to related child tables

  • If a JSON map is detected, it is normalized into a child table

The only real way to see if this sample/autonomous method works for your API is to give it a shot. Set up your data server connection, load the AUTOREST schema, create a test data module, then view the results to see if the autonomously determined structures/results work for your purposes. For example, using the PLOS (The Public Library of Science) API, if we want to see information on all articles with ‘COVID’ in the title, we’d create a data source connection like this…

The autonomous REST connector does a pretty good job in this case of automagically turning the JSON response into a tabular schema that we can easily use for analytics….

For more depth on the Sample method using the autonomous REST connector, see my colleague Ryan Dolley’s recorded live stream on the PMsquare YouTube channel. If the Sample/autonomous method doesn’t work for your purposes (need to handle authentication more than API key, pagination, explicitly naming tables/columns, etc), you generally would want to specify an explicit input REST file to better dictate how the data is pulled and formed. To specify an input REST file, you would need to create one per the Progress DataDirect documentation and point to that file in the connection details. That connection string in Cognos for the Progress DataDirect Autonomous REST connector might look something like jdbc:ibmcognos:autorest:Config=d:\testinput.rest … with the testinput.rest file containing code like this to explicitly name table and columns:

{"test_table":{

"#path":[

"https://api.census.gov/data/2019/acs/acs1/subject?get=NAME,S1502_C02_003E&for=state:*"

],

"STATE_NAME":"VarChar(32)",

"PCT_4YR_SCIENG":"VarChar(32)",

"STATE_ID":"Integer,#key"

}

} 

…which would produce this…

Wrap-Up

That’s all for now. More to follow once that defect with the Progress DataDirect Autonomous REST connector is cured. PMsquare has helped customers successfully utilize data from many different sources using various methods. For help pulling US Census data into your Cognos Analytics implementation, contact one of our solutions architects at contact us.

Read Part 2 Now: Pull US Census Data Using Cognos Analytics via the Bureau’s API (Part 2: Manual Pull Boogaloo)

Next Steps

We hope you found this article informative. Be sure to subscribe to our newsletter for data and analytics news, updates, and insights delivered directly to your inbox.

If you have any questions or would like PMsquare to provide guidance and support for your analytics solution, contact us today.

See this gallery in the original post