Create IPUMS NHGIS Data Extracts

Below we provide examples in curl showing how to create and manage NHGIS data extracts with the IPUMS API.

If you use R or Python, we also provide client libraries that make it easier to work with the IPUMS API from those languages.

Get your key from https://account.ipums.org/api_keys. Make sure to replace ‘MY_KEY’ (all caps) in the snippet below with your key.

Set Key

export MY_KEY=MY_KEY # set the MY_KEY environment variable using bash shell
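
Because every request below sends the key in the Authorization header, it can help to confirm that the variable is set before making any calls. The following is a minimal sketch; check_api_key is a hypothetical helper name, not part of the IPUMS API.

```shell
# Hypothetical helper: fail early if MY_KEY is unset, empty,
# or still the literal placeholder from the snippet above.
check_api_key() {
  if [ -z "${MY_KEY:-}" ] || [ "$MY_KEY" = "MY_KEY" ]; then
    echo "MY_KEY is unset or still the placeholder; export your real API key first" >&2
    return 1
  fi
}
```

Run check_api_key once at the top of a script and stop if it returns a non-zero status.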

Submit a Data Extract Request

To submit a data extract request you need to pass a valid JSON-formatted extract request in the body of your POST. The names to use for values in the data extract request can be discovered via our metadata API endpoints.

Data Extract Request Fields

  • datasets: An object where each key is the name of the requested dataset and each value is another object describing your selections for that dataset.
    • dataTables: (Required) A list of selected data table names.
    • geogLevels: (Required) A list of selected geographic level names.
    • years: A list of selected years. To select all years use ["*"]. Only required when the dataset has multiple years.
    • breakdownValues: A list of selected breakdown values. Defaults to first breakdown value. If more than one is selected, then specify breakdownAndDataTypeLayout at the root of the request body.
  • timeSeriesTables: An object where each key is the name of the requested time series table and each value is another object describing your selections for that time series table.
    • geogLevels: (Required) A list of selected geographic level names.
    • years: A list of selected years.
  • shapefiles: A list of selected shapefiles.
  • description: A short description of your extract.
  • dataFormat: The requested format of your data. Valid choices are: csv_no_header, csv_header, and fixed_width. csv_header adds a second, more descriptive header row. Contrary to the name, csv_no_header still provides a minimal header in the first row. Required when any datasets or timeSeriesTables are selected.
  • breakdownAndDataTypeLayout: The layout of your dataset data when multiple data types or breakdown combos are present. Valid choices are: separate_files (split up each data type or breakdown combo into its own file) and single_file (keep all data types and breakdown combos in one file). Required when a dataset has multiple breakdowns or data types.
  • timeSeriesTableLayout: The layout of your time series table data. Valid choices are: time_by_column_layout, time_by_row_layout, and time_by_file_layout. Required when any time series tables are selected. See the NHGIS documentation for more information.
  • geographicExtents: A list of geographic_instances to use as extents for all datasets and/or time series tables in this request. Only applies to geographic levels where has_geog_extent_selection is true. If not specified, all extents are selected; you can also select all extents explicitly with ["*"].

curl -X POST \
  "https://api.ipums.org/extracts/?collection=nhgis&version=2" \
  -H "Content-Type: application/json" \
  -H "Authorization: $MY_KEY" \
  -d '
{
  "datasets": {
    "1988_1997_CBPa": {
      "years": ["1988", "1989", "1990", "1991", "1992", "1993", "1994"],
      "breakdownValues": ["bs30.si0762", "bs30.si2026"],
      "dataTables": [
        "NT001"
      ],
      "geogLevels": [
        "county"
      ]
    },
    "2000_SF1b": {
      "dataTables": [
        "NP001A"
      ],
      "geogLevels": [
        "blck_grp"
      ]
    }
  },
  "timeSeriesTables": {
    "A00": {
      "geogLevels": [
        "state"
      ],
      "years": [
        "1990"
      ]
    }
  },
  "shapefiles": [
    "us_state_1790_tl2000"
  ],
  "timeSeriesTableLayout": "time_by_file_layout",
  "geographicExtents": ["010"],
  "dataFormat": "csv_no_header",
  "description": "example extract request",
  "breakdownAndDataTypeLayout": "single_file"
}
'

# returns a JSON response like: 

{"number":6,"status":"queued","downloadLinks":null,"extractDefinition":{"dataFormat":"csv_no_header","description":"example extract request","datasets":{"1988_1997_CBPa":{"years":["1988","1989","1990","1991","1992","1993","1994"],"breakdownValues":["bs30.si0762","bs30.si2026"],"dataTables":["NT001"],"geogLevels":["county"]},"2000_SF1b":{"dataTables":["NP001A"],"geogLevels":["blck_grp"]}},"timeSeriesTables":{"A00":{"geogLevels":["state"],"years":["1990"]}},"timeSeriesTableLayout":"time_by_file_layout","shapefiles":["us_state_1790_tl2000"],"geographicExtents":["010"],"breakdownAndDataTypeLayout":"single_file"},"errors":{}}
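
The number field in this response is what the later status and download calls need, so a script will usually want to capture it. The sketch below assumes the request body shown above has been saved to a file; submit_extract and extract_number are hypothetical helper names. It uses grep rather than a JSON parser to avoid extra dependencies, so a JSON-aware tool such as jq would be more robust if you have one installed.

```shell
# Hypothetical helper: POST the JSON request body stored in file $1.
submit_extract() {
  curl -s -X POST \
    "https://api.ipums.org/extracts/?collection=nhgis&version=2" \
    -H "Content-Type: application/json" \
    -H "Authorization: $MY_KEY" \
    -d @"$1"
}

# Hypothetical helper: print the first "number" field found in a
# response read from stdin (pattern matching, not real JSON parsing).
extract_number() {
  grep -o '"number":[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}
```

For example, with the body saved as request.json (an assumed file name), `EXTRACT_NUMBER=$(submit_extract request.json | extract_number)` stores the new extract's number for later calls.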

Get a Request’s Status

After submitting your extract request, you can use the extract number to retrieve the request’s status. Here we’re retrieving status for extract number 6.

curl -X GET "https://api.ipums.org/extracts/6?collection=nhgis&version=2" -H "Authorization: $MY_KEY"

## response:

{
    "number": 6,
    "status": "completed",
    "email": "email@example.com",
    "downloadLinks": {
        "codebookPreview": {
            "url": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/9123456/nhgis0006_csv_PREVIEW.zip",
            "bytes": 18665,
            "sha256": null
        },
        "tableData": {
            "url": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/9123456/nhgis0006_csv.zip",
            "bytes": 124203,
            "sha256": null
        },
        "gisData": {
            "url": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/9123456/nhgis0006_shape.zip",
            "bytes": 11605209,
            "sha256": null
        }
    },
    "extractDefinition": {
        "dataFormat": "csv_no_header",
        "description": "example extract request",
        "datasets": {
            "1988_1997_CBPa": {
                "dataTables": ["NT001"],
                "geogLevels": ["county"],
                "years": ["1988", "1989", "1990", "1991", "1992", "1993", "1994"],
                "breakdownValues": ["bs30.si0762", "bs30.si2026"]
            },
            "2000_SF1b": {
                "dataTables": ["NP001A"],
                "geogLevels": ["blck_grp"]
            }
        },
        "timeSeriesTables": {
            "A00": {
                "geogLevels": ["state"],
                "years": ["1990"]
            }
        },
        "timeSeriesTableLayout": "time_by_file_layout",
        "shapefiles": ["us_state_1790_tl2000"],
        "geographicExtents": ["010"],
        "breakdownAndDataTypeLayout": "single_file",
        "version": 2,
        "collection": "nhgis"
    },
    "errors": {}
}

You will get a status such as queued, started, produced, canceled, failed, or completed.
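
Since an extract moves through several of these statuses before it is ready, a script typically repeats the status request until it sees a final one. The sketch below is one way to do that, under these assumptions: get_status and wait_for_extract are hypothetical helper names, the 30-second default interval is arbitrary, failed and canceled are treated as final, and an empty status (for example, a failed request) is treated as an error.

```shell
# Hypothetical helper: print the status string for extract number $1.
get_status() {
  curl -s -X GET \
    "https://api.ipums.org/extracts/$1?collection=nhgis&version=2" \
    -H "Authorization: $MY_KEY" |
    grep -o '"status":[[:space:]]*"[a-z_]*"' | head -n 1 | cut -d'"' -f4
}

# Hypothetical helper: poll extract $1 every $2 seconds (default 30).
# Returns 0 once the status is completed, 1 on failed, canceled, or no status.
wait_for_extract() {
  extract_id=$1
  interval=${2:-30}
  while :; do
    extract_state=$(get_status "$extract_id")
    echo "extract $extract_id: ${extract_state:-no status returned}" >&2
    case $extract_state in
      completed) return 0 ;;
      failed|canceled|"") return 1 ;;
    esac
    sleep "$interval"
  done
}
```

For example, `wait_for_extract 6 && echo "ready to download"` blocks until extract number 6 completes.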

Retrieving Your Extract

To retrieve a completed extract (using extract number 6 as the example again):

  1. Using the request status query above, wait until the status is completed.
  2. Extract the download URL from the response, which is in the downloadLinks attribute:

curl -X GET \
  "https://api.ipums.org/extracts/6?collection=nhgis&version=2" \
  -H "Authorization: $MY_KEY"

The downloadLinks portion of the response will look like:

"downloadLinks": {
    "codebookPreview": {
        "url": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/9123456/nhgis0006_csv_PREVIEW.zip",
        "bytes": 18665,
        "sha256": null
    },
    "tableData": {
        "url": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/9123456/nhgis0006_csv.zip",
        "bytes": 124203,
        "sha256": null
    },
    "gisData": {
        "url": "https://api.ipums.org/downloads/nhgis/api/v1/extracts/9123456/nhgis0006_shape.zip",
        "bytes": 11605209,
        "sha256": null
    }
},

Next, retrieve the file(s) from the URL. You will need to pass the Authorization header with your API key to the server in order to download the data.

curl -H "Authorization: $MY_KEY" "https://api.ipums.org/downloads/nhgis/api/v1/extracts/9123456/nhgis0006_csv.zip" > mydata.zip
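
An extract can have several download links (codebook preview, table data, GIS data), so it may be convenient to fetch them all in one pass. The sketch below assumes each link appears as a "url" field in the status response, as in the example above; download_urls and download_extract are hypothetical helper names, and files are saved under their remote file names in the current directory.

```shell
# Hypothetical helper: print every "url" value found in a status
# response read from stdin (pattern matching, not real JSON parsing).
download_urls() {
  grep -o '"url":[[:space:]]*"[^"]*"' | cut -d'"' -f4
}

# Hypothetical helper: download every file linked from extract number $1.
download_extract() {
  curl -s -X GET \
    "https://api.ipums.org/extracts/$1?collection=nhgis&version=2" \
    -H "Authorization: $MY_KEY" |
    download_urls |
    while IFS= read -r url; do
      # The Authorization header is required for the file download too.
      curl -s -H "Authorization: $MY_KEY" -o "$(basename "$url")" "$url"
    done
}
```

For example, `download_extract 6` would save nhgis0006_csv_PREVIEW.zip, nhgis0006_csv.zip, and nhgis0006_shape.zip for the response shown above.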

The downloaded files are now ready for further processing and analysis.

Get a Listing of Recent Extract Requests

You may also find it useful to get a historical listing of your extract requests. If you omit the extract number from your API call, the response lists your 10 most recent extract requests by default. To change how many are returned, you may optionally specify a limit=## query parameter to get the ## most recent extracts instead.

curl -X GET \
  "https://api.ipums.org/extracts?collection=nhgis&version=2" \
  -H "Authorization: $MY_KEY"
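
As a sketch of the limit parameter described above: because the URL already carries a query string, the parameter is appended with & rather than ?. The helper name list_recent_extracts is hypothetical, and the default of 10 simply mirrors the default stated in the text.

```shell
# Hypothetical helper: list the $1 most recent extract requests (default 10).
list_recent_extracts() {
  limit=${1:-10}
  curl -s -X GET \
    "https://api.ipums.org/extracts?collection=nhgis&version=2&limit=$limit" \
    -H "Authorization: $MY_KEY"
}
```

For example, `list_recent_extracts 5` requests the five most recent extracts.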