Create Extracts for IPUMS Microdata Collections

Below we provide examples in curl showing how to use the IPUMS API to create and manage data extracts for the IPUMS microdata collections supported by the API (IPUMS USA, CPS, International, ATUS, AHTUS, MTUS, MEPS, and NHIS). Our examples use IPUMS CPS, but they work the same way for any microdata collection: update the collection= string in the API URL and substitute variable and sample names from the collection you are using.

If you are an R or Python user, please note that we also provide language-native client libraries to make it easier to work with the IPUMS API using those languages.

Get your key from your IPUMS user account management page at https://account.ipums.org/api_keys.

Set Your API Key

# set the IPUMS_API_KEY environment variable using bash shell
export IPUMS_API_KEY=YOUR_API_KEY_HERE
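
Every request on this page sends the key in the Authorization header, so an unset variable would likely surface only as a rejected request. A minimal sketch of an up-front check follows, assuming a POSIX-style shell; `require_api_key` is a hypothetical helper name, not part of any IPUMS tooling.

```shell
# Hypothetical helper (not part of the IPUMS API or client libraries):
# return non-zero with a message if IPUMS_API_KEY is empty or unset.
require_api_key() {
  if [ -z "${IPUMS_API_KEY:-}" ]; then
    echo "IPUMS_API_KEY is not set; export it before calling the API" >&2
    return 1
  fi
}

# Usage: run the check before any curl call, for example
#   require_api_key && curl --header "Authorization: $IPUMS_API_KEY" ...
```

The check only confirms that the variable is non-empty; it cannot tell whether the key itself is valid.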

Submit a Data Extract Request

To submit a data extract request, construct a JSON payload by hand (unless you are using one of the R or Python client libraries mentioned above), then submit it to the API.

The names to use for samples and variables in the data extract request can be discovered on our website. See Explore IPUMS Microdata Collection Metadata for more information.

The following code uses the curl program in a bash command line environment to obtain a subset of variables from the 2018 and 2019 CPS ASEC data.

# construct the JSON payload manually and submit it
curl --location --request POST 'https://api.ipums.org/extracts?collection=cps&version=2' \
--header "Authorization: $IPUMS_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
  "description": "Example extract",
  "dataStructure": {
    "rectangular": {
      "on": "P"
    }
  },
  "dataFormat": "fixed_width",
  "samples": {
    "cps2018_03s": {},
    "cps2019_03s": {}
  },
  "variables":{
    "AGE": {},
    "SEX": {},
    "RACE": {},
    "STATEFIP": {},
    "MARST": {
      "caseSelections": {
        "general": [
          "1",
          "2",
          "3",
          "5"
        ]
      }
    },
    "EDUC": {
      "dataQualityFlags": true
    },
    "BPL": {
      "attachedCharacteristics": [ "mother", "father", "spouse", "head" ]
    }
  }
}'

# A successful request will return a response that includes an extract number in the number attribute:

{
  "number": 1,
  "status": "queued",
  "email": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS",
  "downloadLinks": {},
  "extractDefinition": {
    "version": 2,
    "dataStructure": {
      "rectangular": {
        "on": "P"
      }
    },
    "dataFormat": "fixed_width",
    "caseSelectWho": "individuals",
    "description": "Example extract",
    "samples": {
      "cps2018_03s": {},
      "cps2019_03s": {}
    },
    "variables": {
      "YEAR": {
        "preselected": true
      },
      "SERIAL": {
        "preselected": true
      },
      "MONTH": {
        "preselected": true
      },
      "CPSID": {
        "preselected": true
      },
      "ASECFLAG": {
        "preselected": true
      },
      "ASECWTH": {
        "preselected": true
      },
      "STATEFIP": {},
      "PERNUM": {
        "preselected": true
      },
      "CPSIDP": {
        "preselected": true
      },
      "CPSIDV": {
        "preselected": true
      },
      "ASECWT": {
        "preselected": true
      },
      "AGE": {},
      "SEX": {},
      "RACE": {},
      "MARST": {
        "caseSelections": {
          "general": [
            "1",
            "2",
            "3",
            "5"
          ]
        }
      },
      "BPL": {
        "attachedCharacteristics": [
          "head",
          "mother",
          "mother2",
          "father",
          "father2",
          "spouse"
        ]
      },
      "EDUC": {
        "dataQualityFlags": true
      }
    },
    "collection": "cps"
  }
}
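
Later calls need the extract number, so it is convenient to capture it from the submission response. The sketch below pulls the first "number" field out of JSON read from standard input using grep. It assumes, as in the response shown above, that the top-level number is the first key with that name; `extract_number` is a hypothetical helper name. If jq is installed, `jq '.number'` is a more robust alternative.

```shell
# Hypothetical helper: print the first "number" value found in JSON on stdin.
# Assumption: the top-level extract number is the first "number" key in the
# response, as in the example response in this document.
extract_number() {
  grep -o '"number"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | head -n 1 \
    | grep -o '[0-9][0-9]*$'
}

# Usage (response.json holds the body returned by the POST request):
#   extract_number < response.json
```

This is plain text matching rather than JSON parsing, so treat it as a convenience for simple scripts only.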

Hierarchical Extracts

You can also submit hierarchical extracts.

# construct the JSON payload manually and submit it
curl --location --request POST 'https://api.ipums.org/extracts?collection=cps&version=2' \
--header "Authorization: $IPUMS_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
  "description": "Example hierarchical extract",
  "dataStructure": {
    "hierarchical": {}
  },
  "dataFormat": "fixed_width",
  "samples": {
    "cps2018_03s": {},
    "cps2019_03s": {}
  },
  "variables": {
    "AGE": {},
    "SEX": {},
    "RACE": {},
    "STATEFIP": {}
  }
}'

# A successful request will return a response that includes an extract number in the number attribute:

{
  "number": 30,
  "status": "queued",
  "email": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS",
  "downloadLinks": {},
  "extractDefinition": {
    "version": 2,
    "dataStructure": {
      "hierarchical": {}
    },
    "dataFormat": "fixed_width",
    "caseSelectWho": "individuals",
    "description": "Example hierarchical extract",
    "samples": {
      "cps2018_03s": {},
      "cps2019_03s": {}
    },
    "variables": {
      "RECTYPE": {
        "preselected": true
      },
      "YEAR": {
        "preselected": true
      },
      "SERIAL": {
        "preselected": true
      },
      "MONTH": {
        "preselected": true
      },
      "CPSID": {
        "preselected": true
      },
      "ASECFLAG": {
        "preselected": true
      },
      "ASECWTH": {
        "preselected": true
      },
      "STATEFIP": {},
      "PERNUM": {
        "preselected": true
      },
      "CPSIDP": {
        "preselected": true
      },
      "CPSIDV": {
        "preselected": true
      },
      "ASECWT": {
        "preselected": true
      },
      "AGE": {},
      "SEX": {},
      "RACE": {}
    },
    "collection": "cps"
  }
}
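
The rectangular and hierarchical requests differ only in their JSON payload, so one option is to keep each payload in its own file and reuse a single submit command. This is a sketch under that assumption: `submit_extract` and the file name `payload.json` are hypothetical, while the URL and headers are the ones used in the examples above.

```shell
# Hypothetical helper: POST the JSON payload stored in file $2
# to the extracts endpoint for collection $1.
submit_extract() {
  curl --silent --show-error --request POST \
    "https://api.ipums.org/extracts?collection=$1&version=2" \
    --header "Authorization: $IPUMS_API_KEY" \
    --header 'Content-Type: application/json' \
    --data-binary "@$2"
}

# Usage: save the request body shown above as payload.json, then run
#   submit_extract cps payload.json
```

Keeping payloads in files also avoids shell quoting problems when a description or variable name contains a single quote.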

Using Time Use Variables and Selecting Sample Members

For the Time Use data collections (ATUS, AHTUS, MTUS) you can reference previously-defined time use variables (the API doesn’t yet support the creation of new time use variables). This works for both system time use variables and user-defined time use variables.

IPUMS ATUS also allows users to select sample members to include in their extract - not just Respondents but also Household Members and/or Non-Respondents. Additionally, IPUMS ATUS supports the inclusion of Eldercare records in hierarchical data extracts. To include Eldercare records in your IPUMS ATUS extract, include at least one Eldercare variable in the variables section of your extract request, and be sure to specify a “hierarchical” dataStructure.

The following example illustrates the ATUS-specific features (sample member selection and Eldercare variable inclusion, demonstrated in the example below by the Eldercare variable “ECAGE”) as well as features common to all time use data collections (support for selecting system and user-defined time use variables).

# construct the JSON payload manually and submit it
curl --location --request POST 'https://api.ipums.org/extracts?collection=atus&version=2' \
--header "Authorization: $IPUMS_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
  "description": "TUV Example",
  "dataFormat": "fixed_width",
  "dataStructure": {
    "hierarchical": {}
  },
  "samples": {
    "at2017": {}
  },
  "timeUseVariables": {
    "BLS_PCARE": {},
    "my_tuv_1": {
      "owner": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS"
    }
  },
  "variables": {
    "ECAGE": {}
  },  
  "sampleMembers": {
    "includeNonRespondents": true,
    "includeHouseholdMembers": true
  },
  "collection": "atus",
  "version": 2
}'

Note that for your own user-defined time use variables, you also need to include the owner parameter, since multiple users could use the same name for their time use variables. The system does not currently allow you to request another owner’s time use variable; the owner requirement is in place to allow for potential future time use variable sharing functionality. For system time use variables such as BLS_PCARE, the owner parameter is not necessary.

A successful request will return a response that includes an extract number in the number attribute:

{
  "number": 3,
  "status": "queued",
  "email": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS",
  "downloadLinks": {},
  "extractDefinition": {
    "version": 2,
    "dataStructure": {
      "hierarchical": {}
    },
    "dataFormat": "fixed_width",
    "caseSelectWho": "individuals",
    "description": "TUV Example",
    "samples": {
      "at2017": {}
    },
    "variables": {
      "YEAR": {
        "preselected": true
      },
      "CASEID": {
        "preselected": true
      },
      "SERIAL": {
        "preselected": true
      },
      "PERNUM": {
        "preselected": true
      },
      "LINENO": {
        "preselected": true
      },
      "WT06": {
        "preselected": true
      },
      "ECAGE": {}
    },
    "timeUseVariables": {
      "BLS_PCARE": {},
      "my_tuv_1": {
        "owner": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS"
      }
    },
    "sampleMembers": {
      "includeNonRespondents": true,
      "includeHouseholdMembers": true
    },
    "collection": "atus"
  }
}

Checking a Request’s Status

After submitting your extract request, you can use the API to check its status using the extract’s number.

curl --request GET 'https://api.ipums.org/extracts/1?collection=cps&version=2' --header 'Content-Type: application/json' --header "Authorization: $IPUMS_API_KEY"

# A successful request will return a response object like the one below. The exact fields may vary depending on how far along the extract is in processing.
# You will get a status such as `queued`, `started`, `produced`, `canceled`, `failed`, or `completed` in the status field.

{
  "number": 1,
  "status": "completed",
  "email": "MY_IPUMS_ACCOUNT@EMAIL.ADDRESS",
  "downloadLinks": {
    "rCommandFile": {
      "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.R",
      "bytes": 406,
      "sha256": "ae5fc796b17d9c8bd95d6fd7ade5e4b27199f3e13e54c917102f96201a1d6998"
    },
    "basicCodebook": {
      "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.cbk",
      "bytes": 55343,
      "sha256": "3d4c1442da57d5ee03dadf963a4eaf1afceb62bf648f67cfb926da661df985af"
    },
    "data": {
      "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.dat.gz",
      "bytes": 5878234,
      "sha256": "de4914246b7066c6991119fe2080153c21cfda4b8533e4e544db2f44d54ceebf"
    },
    "stataCommandFile": {
      "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.do",
      "bytes": 77351,
      "sha256": "da4470612f86656d6683ddcb61c9c756a273a4fa97c6e0e5c552d7e8115308e3"
    },
    "sasCommandFile": {
      "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.sas",
      "bytes": 36625,
      "sha256": "d7979bb51665d71187dbec4cc3a28c1c51ba7c4dd0d3706da7c5e7d2f3db5ccb"
    },
    "spssCommandFile": {
      "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.sps",
      "bytes": 38535,
      "sha256": "fa3e8e72ff3df4542f38ef65d530c9ae532f7815d85a7d50810042121d65b11e"
    },
    "ddiCodebook": {
      "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/590142/cps_00001.xml",
      "bytes": 148369,
      "sha256": "73ee163a565498c3187e8f8da5316f07c9a75cbd534dd2b794c2f8590784b5ae"
    }
  },
  "extractDefinition": {
    "version": 2,
    "dataStructure": {
      "rectangular": {
        "on": "P"
      }
    },
    "dataFormat": "fixed_width",
    "caseSelectWho": "individuals",
    "description": "Example extract",
    "samples": {
      "cps2018_03s": {},
      "cps2019_03s": {}
    },
    "variables": {
      "YEAR": {
        "preselected": true
      },
      "SERIAL": {
        "preselected": true
      },
      "MONTH": {
        "preselected": true
      },
      "CPSID": {
        "preselected": true
      },
      "ASECFLAG": {
        "preselected": true
      },
      "ASECWTH": {
        "preselected": true
      },
      "STATEFIP": {},
      "PERNUM": {
        "preselected": true
      },
      "CPSIDP": {
        "preselected": true
      },
      "CPSIDV": {
        "preselected": true
      },
      "ASECWT": {
        "preselected": true
      },
      "AGE": {},
      "SEX": {},
      "RACE": {},
      "BPL": {
        "attachedCharacteristics": [
          "head",
          "mother",
          "mother2",
          "father",
          "father2",
          "spouse"
        ]
      },
      "EDUC": {
        "dataQualityFlags": true
      }
    },
    "collection": "cps"
  }
}
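
Because an extract moves through several statuses before it is ready, scripts usually repeat the status call until it settles. The following is a rough sketch of such a polling loop. The helper names are hypothetical, the 30-second default delay is arbitrary, and treating `completed`, `failed`, and `canceled` as the final statuses is an assumption based on the status list above.

```shell
# Hypothetical helpers; the names are illustrative, not part of the IPUMS API.

# Fetch the status JSON for extract number $1 in collection $2.
fetch_extract() {
  curl --silent --show-error --request GET \
    "https://api.ipums.org/extracts/$1?collection=$2&version=2" \
    --header "Authorization: $IPUMS_API_KEY"
}

# Print the value of the first "status" field found in JSON on stdin.
extract_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[a-z_]*"' \
    | head -n 1 \
    | sed 's/.*"\([a-z_]*\)"$/\1/'
}

# Poll until the extract reaches a final status.
# $1 = extract number, $2 = collection, $3 = seconds between checks (default 30).
wait_for_extract() {
  poll_number=$1
  poll_collection=$2
  poll_delay=${3:-30}
  while :; do
    poll_state=$(fetch_extract "$poll_number" "$poll_collection" | extract_status)
    case "$poll_state" in
      completed) echo "$poll_state"; return 0 ;;
      failed|canceled) echo "$poll_state"; return 1 ;;
      '') echo "could not read a status from the response" >&2; return 2 ;;
      *) sleep "$poll_delay" ;;   # queued, started, produced: keep waiting
    esac
  done
}

# Usage: wait_for_extract 1 cps 60 && echo "ready to download"
```

The loop has no upper bound on waiting time; a production script would probably add a maximum number of attempts.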

Retrieving Your Extract

To retrieve a completed extract, use the API once more: request each file from the download links returned in the extract’s status response.

# download the data file using the link returned in the extract status response once the extract is completed
curl -H "Authorization: $IPUMS_API_KEY" https://api.ipums.org/downloads/cps/api/v1/extracts/1234567/cps_00001.dat.gz > my_ipums_cps_extract_1_dat.gz
# repeat for the other files (e.g. the codebook)
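
Each entry in downloadLinks carries a sha256 value alongside the url and bytes fields. Assuming that value is the SHA-256 digest of the file at that URL, a downloaded file can be checked before use. The sketch below relies on `sha256sum` (GNU coreutils) and falls back to `shasum -a 256`; `verify_sha256` is a hypothetical helper name.

```shell
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# $1 = path to the downloaded file, $2 = expected hex digest.
# Assumption: the "sha256" field in downloadLinks is the digest of that file.
verify_sha256() {
  sum_file=$1
  sum_expected=$2
  if command -v sha256sum >/dev/null 2>&1; then
    sum_actual=$(sha256sum "$sum_file" | cut -d ' ' -f 1)
  else
    sum_actual=$(shasum -a 256 "$sum_file" | cut -d ' ' -f 1)
  fi
  if [ "$sum_actual" = "$sum_expected" ]; then
    echo "checksum OK: $sum_file"
  else
    echo "checksum mismatch for $sum_file" >&2
    return 1
  fi
}

# Usage, with the digest copied from the "data" entry of the status response:
#   verify_sha256 my_ipums_cps_extract_1_dat.gz "$EXPECTED_SHA256"
```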

Now you are ready for further processing and analysis as you desire.

Get a Listing of Recent Extract Requests

You may also find it useful to get a historical listing of your extract requests.

curl -X GET \
  'https://api.ipums.org/extracts?collection=cps&version=2' \
  -H 'Content-Type: application/json' \
  -H "Authorization: $IPUMS_API_KEY"

# If you omit the extract number in your API call, the API returns your 10 most recent extract requests by default. To adjust the number returned, you may optionally add a `limit=##` query parameter (for example, `&limit=25` appended to the URL above) to get the ## most recent extracts instead.
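
Since the listing URL already has a query string, the limit has to be joined with `&`, and the whole URL must be quoted so the shell does not interpret the `&`. A small sketch of building that URL follows; `extracts_url` is a hypothetical helper name, and `limit` is the parameter described above.

```shell
# Hypothetical helper: print the listing URL for collection $1,
# adding the optional limit parameter when $2 is given.
extracts_url() {
  list_url="https://api.ipums.org/extracts?collection=$1&version=2"
  if [ -n "${2:-}" ]; then
    list_url="$list_url&limit=$2"
  fi
  printf '%s\n' "$list_url"
}

# Usage: list the 25 most recent CPS extract requests
#   curl --header "Authorization: $IPUMS_API_KEY" "$(extracts_url cps 25)"
```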