Create Extracts for IPUMS Microdata Collections
Below we provide curl examples showing how to use the IPUMS API to create and manage data extracts for the IPUMS microdata collections supported by the API (IPUMS USA, CPS, International, ATUS, AHTUS, MTUS, MEPS, and NHIS). Our examples use IPUMS CPS, but they work the same way for any microdata collection: update the collection= string in the API URL and substitute variable and sample names from the collection you are working with.
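As a small illustration of swapping the collection= string, the sketch below builds the extracts URL from a collection name. The function name ipums_extracts_url is hypothetical (it is not part of the IPUMS API), and only the cps and atus collection codes that appear in this guide are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the IPUMS API): build the extracts endpoint
# URL for a given collection, optionally for one specific extract number.
ipums_extracts_url() {
  local collection="$1"   # e.g. cps or atus, as used in this guide
  local number="${2:-}"   # optional extract number
  local base="https://api.ipums.org/extracts"
  if [ -n "$number" ]; then
    base="$base/$number"
  fi
  printf '%s?collection=%s&version=2\n' "$base" "$collection"
}

ipums_extracts_url cps      # https://api.ipums.org/extracts?collection=cps&version=2
ipums_extracts_url atus 3   # https://api.ipums.org/extracts/3?collection=atus&version=2
```

Quoting the result when passing it to curl (for example "$(ipums_extracts_url cps)") keeps the shell from interpreting the & in the query string.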
Get your key from your IPUMS user account management page at https://account.ipums.org/api_keys.
Load Libraries and Set Key
# set the IPUMS_API_KEY environment variable using bash shell
export IPUMS_API_KEY=YOUR_API_KEY_HERE
Submit a Data Extract Request
To submit a data extract request, construct a JSON payload by hand (unless you are using one of the R or Python client libraries). Once the request is formed, submit it to the API.
The names to use for samples and variables in the data extract request can be discovered on our website. See Explore IPUMS Microdata Collection Metadata for more information.
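Writing the payload by hand gets repetitive when many samples or variables are involved. The following is a minimal sketch of assembling the same kind of request body in bash. The helper json_empty_objects is a hypothetical name, and it assumes sample and variable names are plain identifiers that need no JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a list of names into a JSON object whose values
# are empty objects, the shape used for "samples" and for plain "variables".
# Assumes names contain no quotes or backslashes (no JSON escaping is done).
json_empty_objects() {
  local out="" name
  for name in "$@"; do
    out="${out:+$out, }\"$name\": {}"
  done
  printf '{%s}\n' "$out"
}

samples=$(json_empty_objects cps2018_03s cps2019_03s)
variables=$(json_empty_objects AGE SEX RACE STATEFIP)

# Assemble the request body; the description, structure, and format values
# mirror the rectangular CPS example in this guide.
payload=$(printf '{"description": "Example extract", "dataStructure": {"rectangular": {"on": "P"}}, "dataFormat": "fixed_width", "samples": %s, "variables": %s}' "$samples" "$variables")
printf '%s\n' "$payload"
```

The resulting string can be passed to curl with --data-raw "$payload" in place of the inline JSON. Variables that need options such as caseSelections still have to be written out explicitly.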
The example below uses the curl program in a bash command line environment to obtain a subset of variables from the 2018 and 2019 CPS ASEC data.
# construct the JSON payload manually and submit it
curl --location --request POST 'https://api.ipums.org/extracts?collection=cps&version=2' \
--header "Authorization: $IPUMS_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
"description": "Example extract",
"dataStructure": {
"rectangular": {
"on": "P"
}
},
"dataFormat": "fixed_width",
"samples": {
"cps2018_03s": {},
"cps2019_03s": {}
},
"variables":{
"AGE": {},
"SEX": {},
"RACE": {},
"STATEFIP": {},
"MARST": {
"caseSelections": {
"general": [
"1",
"2",
"3",
"5"
]
}
},
"EDUC": {
"dataQualityFlags": true
},
"BPL": {
"attachedCharacteristics": [ "mother", "father", "spouse", "head" ]
}
}
}'
# A successful request will return a response that includes an extract number in the number attribute:
{
"number": 1,
"status": "queued",
"email": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS",
"downloadLinks": {},
"extractDefinition": {
"version": 2,
"dataStructure": {
"rectangular": {
"on": "P"
}
},
"dataFormat": "fixed_width",
"caseSelectWho": "individuals",
"description": "Example extract",
"samples": {
"cps2018_03s": {},
"cps2019_03s": {}
},
"variables": {
"YEAR": {
"preselected": true
},
"SERIAL": {
"preselected": true
},
"MONTH": {
"preselected": true
},
"CPSID": {
"preselected": true
},
"ASECFLAG": {
"preselected": true
},
"ASECWTH": {
"preselected": true
},
"STATEFIP": {},
"PERNUM": {
"preselected": true
},
"CPSIDP": {
"preselected": true
},
"CPSIDV": {
"preselected": true
},
"ASECWT": {
"preselected": true
},
"AGE": {},
"SEX": {},
"RACE": {},
"MARST": {
"caseSelections": {
"general": [
"1",
"2",
"3",
"5"
]
}
},
"BPL": {
"attachedCharacteristics": [
"head",
"mother",
"mother2",
"father",
"father2",
"spouse"
]
},
"EDUC": {
"dataQualityFlags": true
}
},
"collection": "cps"
}
}
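Later calls need the extract number from this response. The sketch below pulls it out with grep as a plain text match. This is an assumption-laden shortcut rather than real JSON parsing (a JSON-aware tool would be more robust), and the function name extract_number is hypothetical.

```shell
#!/usr/bin/env bash
# Read a submit response on stdin and print the first "number" value.
# This is a text match, not a JSON parser; it relies on "number" being a
# top-level integer field that appears before any other "number" key.
extract_number() {
  grep -o '"number": *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Sample response trimmed from the one shown above.
response='{ "number": 1, "status": "queued", "downloadLinks": {} }'
printf '%s\n' "$response" | extract_number   # prints 1
```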
Hierarchical Extracts
You can also submit hierarchical extracts.
# construct the JSON payload manually and submit it
curl --location --request POST 'https://api.ipums.org/extracts?collection=cps&version=2' \
--header "Authorization: $IPUMS_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
"description": "Example hierarchical extract",
"dataStructure": {
"hierarchical": {}
},
"dataFormat": "fixed_width",
"samples": {
"cps2018_03s": {},
"cps2019_03s": {}
},
"variables": {
"AGE": {},
"SEX": {},
"RACE": {},
"STATEFIP": {}
}
}'
# A successful request will return a response that includes an extract number in the number attribute:
{
"number": 30,
"status": "queued",
"email": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS",
"downloadLinks": {},
"extractDefinition": {
"version": 2,
"dataStructure": {
"hierarchical": {}
},
"dataFormat": "fixed_width",
"caseSelectWho": "individuals",
"description": "Example hierarchical extract",
"samples": {
"cps2018_03s": {},
"cps2019_03s": {}
},
"variables": {
"RECTYPE": {
"preselected": true
},
"YEAR": {
"preselected": true
},
"SERIAL": {
"preselected": true
},
"MONTH": {
"preselected": true
},
"CPSID": {
"preselected": true
},
"ASECFLAG": {
"preselected": true
},
"ASECWTH": {
"preselected": true
},
"STATEFIP": {},
"PERNUM": {
"preselected": true
},
"CPSIDP": {
"preselected": true
},
"CPSIDV": {
"preselected": true
},
"ASECWT": {
"preselected": true
},
"AGE": {},
"SEX": {},
"RACE": {}
},
"collection": "cps"
}
}
Using Time Use Variables and Selecting Sample Members
For the Time Use data collections (ATUS, AHTUS, MTUS) you can reference previously-defined time use variables (the API doesn’t yet support the creation of new time use variables). This works for both system time use variables and user-defined time use variables.
IPUMS ATUS also allows users to select which sample members to include in their extract: not just Respondents but also Household Members and/or Non-Respondents. Additionally, IPUMS ATUS supports the inclusion of Eldercare records in hierarchical data extracts. To include Eldercare records in your IPUMS ATUS extract, include at least one Eldercare variable in the variables section of your extract request, and be sure to specify a “hierarchical” dataStructure.
The following example illustrates the ATUS-specific features (sample member selection and Eldercare records, requested here through the Eldercare variable “ECAGE”) as well as features common to all time use data collections (selection of system and user-defined time use variables).
# construct the JSON payload manually and submit it
curl --location --request POST 'https://api.ipums.org/extracts?collection=atus&version=2' \
--header "Authorization: $IPUMS_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
"description": "TUV Example",
"dataFormat": "fixed_width",
"dataStructure": {
"hierarchical": {}
},
"samples": {
"at2017": {}
},
"timeUseVariables": {
"BLS_PCARE": {},
"my_tuv_1": {
"owner": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS"
}
},
"variables": {
"ECAGE": {}
},
"sampleMembers": {
"includeNonRespondents": true,
"includeHouseholdMembers": true
},
"collection": "atus",
"version": 2
}'
Note that for your own user-defined time use variables, you also need to include the owner parameter, since multiple users could use the same name for their time use variables. The system does not currently allow you to request another owner’s time use variable; the owner requirement is in place to allow for possible future time use variable sharing. For system time use variables such as BLS_PCARE, the owner parameter is not necessary.
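The owner rule can be sketched as a tiny bash helper that emits one timeUseVariables entry, adding owner only when one is supplied. The name tuv_entry is hypothetical, and the helper assumes the variable name and email contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit one "timeUseVariables" entry.
# One argument: treated as a system variable, so no owner is written.
# Two arguments: the second is the owner's IPUMS account email.
tuv_entry() {
  local name="$1" owner="${2:-}"
  if [ -n "$owner" ]; then
    printf '"%s": {"owner": "%s"}' "$name" "$owner"
  else
    printf '"%s": {}' "$name"
  fi
}

# Rebuild the timeUseVariables fragment from the request above.
printf '"timeUseVariables": {%s, %s}\n' \
  "$(tuv_entry BLS_PCARE)" \
  "$(tuv_entry my_tuv_1 MY_IPUMS_ACCOUNT@EMAIL.ADDRESS)"
```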
A successful request will return a response that includes an extract number in the number attribute:
{
"number": 3,
"status": "queued",
"email": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS",
"downloadLinks": {},
"extractDefinition": {
"version": 2,
"dataStructure": {
"hierarchical": {}
},
"dataFormat": "fixed_width",
"caseSelectWho": "individuals",
"description": "TUV Example",
"samples": {
"at2017": {}
},
"variables": {
"YEAR": {
"preselected": true
},
"CASEID": {
"preselected": true
},
"SERIAL": {
"preselected": true
},
"PERNUM": {
"preselected": true
},
"LINENO": {
"preselected": true
},
"WT06": {
"preselected": true
},
"ECAGE": {}
},
"timeUseVariables": {
"BLS_PCARE": {},
"my_tuv_1": {
"owner": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS"
}
},
"sampleMembers": {
"includeNonRespondents": true,
"includeHouseholdMembers": true
},
"collection": "atus"
}
}
Checking a Request’s Status
After submitting your extract request, you can use the API to check its status using the extract’s number.
curl --request GET 'https://api.ipums.org/extracts/1?collection=cps&version=2' --header 'Content-Type: application/json' --header "Authorization: $IPUMS_API_KEY"
# A successful request will provide a response object like below. The exact fields may vary depending on how far along the extract is in processing.
# You will get a status such as `queued`, `started`, `produced`, `canceled`, `failed`, or `completed` in the status field.
{
"number": 1,
"status": "completed",
"email": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS",
"downloadLinks": {
"rCommandFile": {
"url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.R",
"bytes": 406,
"sha256": "ae5fc796b17d9c8bd95d6fd7ade5e4b27199f3e13e54c917102f96201a1d6998"
},
"basicCodebook": {
"url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.cbk",
"bytes": 55343,
"sha256": "3d4c1442da57d5ee03dadf963a4eaf1afceb62bf648f67cfb926da661df985af"
},
"data": {
"url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.dat.gz",
"bytes": 5878234,
"sha256": "de4914246b7066c6991119fe2080153c21cfda4b8533e4e544db2f44d54ceebf"
},
"stataCommandFile": {
"url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.do",
"bytes": 77351,
"sha256": "da4470612f86656d6683ddcb61c9c756a273a4fa97c6e0e5c552d7e8115308e3"
},
"sasCommandFile": {
"url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.sas",
"bytes": 36625,
"sha256": "d7979bb51665d71187dbec4cc3a28c1c51ba7c4dd0d3706da7c5e7d2f3db5ccb"
},
"spssCommandFile": {
"url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.sps",
"bytes": 38535,
"sha256": "fa3e8e72ff3df4542f38ef65d530c9ae532f7815d85a7d50810042121d65b11e"
},
"ddiCodebook": {
"url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.xml",
"bytes": 148369,
"sha256": "73ee163a565498c3187e8f8da5316f07c9a75cbd534dd2b794c2f8590784b5ae"
}
},
"extractDefinition": {
"version": 2,
"dataStructure": {
"rectangular": {
"on": "P"
}
},
"dataFormat": "fixed_width",
"caseSelectWho": "individuals",
"description": "Example extract",
"samples": {
"cps2018_03s": {},
"cps2019_03s": {}
},
"variables": {
"YEAR": {
"preselected": true
},
"SERIAL": {
"preselected": true
},
"MONTH": {
"preselected": true
},
"CPSID": {
"preselected": true
},
"ASECFLAG": {
"preselected": true
},
"ASECWTH": {
"preselected": true
},
"STATEFIP": {},
"PERNUM": {
"preselected": true
},
"CPSIDP": {
"preselected": true
},
"CPSIDV": {
"preselected": true
},
"ASECWT": {
"preselected": true
},
"AGE": {},
"SEX": {},
"RACE": {},
"BPL": {
"attachedCharacteristics": [
"head",
"mother",
"mother2",
"father",
"father2",
"spouse"
]
},
"EDUC": {
"dataQualityFlags": true
}
},
"collection": "cps"
}
}
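Because an extract can take a while to process, the status check is usually repeated. The following is a minimal polling sketch built on the same GET request. The function names, the POLL_SECONDS and MAX_POLLS variables, and the choice to treat only `completed` as success are assumptions of this sketch, not API features, and the status is read with a text match rather than a JSON parser.

```shell
#!/usr/bin/env bash
# Fetch the raw status JSON for one CPS extract.
# Kept as its own function so it can be replaced or stubbed.
get_extract_json() {
  curl --silent --request GET \
    "https://api.ipums.org/extracts/$1?collection=cps&version=2" \
    --header "Authorization: $IPUMS_API_KEY"
}

# Read JSON on stdin and print the first "status" value (text match only).
extract_status() {
  grep -o '"status": *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Poll until the extract reaches a terminal status.
# Returns 0 on completed, 1 on canceled or failed, 2 if polling gives up.
wait_for_extract() {
  local number="$1" status i=0
  while [ "$i" -lt "${MAX_POLLS:-60}" ]; do
    status=$(get_extract_json "$number" | extract_status)
    case "$status" in
      completed) return 0 ;;
      canceled|failed)
        echo "extract $number ended with status: $status" >&2
        return 1 ;;
    esac
    sleep "${POLL_SECONDS:-30}"
    i=$((i + 1))
  done
  echo "gave up waiting for extract $number" >&2
  return 2
}
```

A call such as wait_for_extract 1 && echo "ready to download" would then block until extract 1 finishes or the polling limit is reached.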
Retrieving Your Extract
To retrieve a completed extract, download each file from the URLs listed under downloadLinks in the status response, again authenticating with your API key.
# download the data file using the link returned in the extract status response once the extract has completed
curl -H "Authorization: $IPUMS_API_KEY" https://api.ipums.org/downloads/cps/api/v1/extracts/1234567/cps_00001.dat.gz > my_ipums_cps_extract_1_dat.gz
# repeat for the other files e.g. codebook etc...
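Each entry in downloadLinks also carries a sha256 value, which suggests a simple integrity check after downloading. The sketch below assumes that value is the SHA-256 digest of the file exactly as downloaded. The function name verify_sha256 is hypothetical, and it uses sha256sum where available, falling back to shasum -a 256.

```shell
#!/usr/bin/env bash
# Compare a file's SHA-256 digest with an expected hex string.
# Returns 0 when they match and non-zero otherwise.
verify_sha256() {
  local file="$1" expected="$2" actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  else
    # Fallback for systems without sha256sum, such as macOS.
    actual=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
  fi
  [ "$actual" = "$expected" ]
}

# Usage (the digest is a placeholder for the sha256 value of your own download):
# verify_sha256 my_ipums_cps_extract_1_dat.gz "<sha256 from downloadLinks.data>" && echo "checksum OK"
```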
Now you are ready for further processing and analysis as you desire.
Get a Listing of Recent Extract Requests
You may also find it useful to get a historical listing of your extract requests.
curl -X GET \
'https://api.ipums.org/extracts?collection=cps&version=2' \
-H 'Content-Type: application/json' \
-H "Authorization: $IPUMS_API_KEY"
# Because no extract number is given, this call returns the 10 most recent extract requests by default. To change how many are returned, you may optionally add a `limit=##` query parameter (appended to the URL above as `&limit=##`) to get the ## most recent extracts instead.