
Create Extracts for Household-Person Microdata Collections

Below we provide curl examples showing how to use the IPUMS API to create and manage data extracts for the household-person microdata collections it supports (IPUMS USA, IPUMS CPS and IPUMS International). Our examples use IPUMS CPS, but they work the same way for USA or International: simply change the collection= parameter in the API URL (to usa or ipumsi) and substitute sample and variable names from that collection.

If you are an R or Python user, please note that we also provide language-native client libraries to make it easier to work with the IPUMS API using those languages.

Get your key from your IPUMS user account management page at https://account.ipums.org/api_keys.

Set Your API Key

# set the IPUMS_API_KEY environment variable using bash shell
export IPUMS_API_KEY=YOUR_API_KEY_HERE
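Because every request below reads the key from IPUMS_API_KEY, it can help to fail early when the variable is missing rather than sending an unauthenticated request. The following is a minimal sketch for bash or another POSIX shell; the function name require_ipums_key is hypothetical and not part of any IPUMS tool.

```shell
# Hypothetical helper: return non-zero, with a message on stderr,
# if IPUMS_API_KEY is unset or empty.
require_ipums_key() {
  if [ -z "${IPUMS_API_KEY:-}" ]; then
    echo "IPUMS_API_KEY is not set; get a key at https://account.ipums.org/api_keys" >&2
    return 1
  fi
  return 0
}

# Example: stop a script before making any API call without a key.
# require_ipums_key || exit 1
```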

Submit a Data Extract Request

If you are not using one of the R or Python client libraries mentioned above, you will need to construct the JSON payload for your extract request manually and then submit it to the API in a POST request.

The names to use for samples and variables in the data extract request can be discovered on our website. See Explore IPUMS Household-Person Microdata Collection Metadata for more information.

The following code uses the curl program in a bash command line environment to obtain a subset of variables from the 2018 and 2019 CPS ASEC data.
# construct the JSON payload manually and submit it 
curl --location --request POST 'https://api.ipums.org/extracts?collection=cps&version=2' \
--header "Authorization: $IPUMS_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
    "description": "Example extract",
    "dataStructure": { 
        "rectangular": {
            "on": "P"
        }
    },
    "dataFormat": "fixed_width",
    "samples": {
      "cps2018_03s": {},
      "cps2019_03s": {}
    },
    "variables":{
      "AGE": {},
      "SEX": {},
      "RACE": {},
      "STATEFIP": {}
    }
}'
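If you submit many similar requests, you may prefer to generate the payload rather than edit JSON by hand. The sketch below assumes a POSIX-compatible shell such as bash; build_extract_payload and json_members are hypothetical helpers, sample and variable names are passed as space-separated lists, and the description is inserted without JSON escaping, so it must not contain double quotes or backslashes.

```shell
# Hypothetical helper: turn a space-separated list of names into
# JSON object members of the form "NAME": {}.
json_members() {
  local out="" name
  for name in $1; do
    if [ -n "$out" ]; then
      out="$out, "
    fi
    out="$out\"$name\": {}"
  done
  printf '%s' "$out"
}

# Hypothetical helper: print an extract request payload.
#   $1 description (not JSON-escaped; avoid double quotes and backslashes)
#   $2 "rectangular" (on persons) or "hierarchical"
#   $3 space-separated sample names
#   $4 space-separated variable names
build_extract_payload() {
  local description="$1" structure="$2" samples="$3" variables="$4" ds
  case "$structure" in
    hierarchical) ds='{"hierarchical": {}}' ;;
    *) ds='{"rectangular": {"on": "P"}}' ;;
  esac
  printf '{"description": "%s", "dataStructure": %s, "dataFormat": "fixed_width", "samples": {%s}, "variables": {%s}}\n' \
    "$description" "$ds" "$(json_members "$samples")" "$(json_members "$variables")"
}
```

For example, the payload above could be generated with `build_extract_payload "Example extract" rectangular "cps2018_03s cps2019_03s" "AGE SEX RACE STATEFIP"` and passed to the same curl command with `--data-raw "$(build_extract_payload ...)"`.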

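# A successful request returns a response that includes the extract number in the number attribute. The API also adds the collection's preselected variables (such as YEAR, SERIAL and PERNUM) to the extract definition, marked with "preselected": true: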

{
    "number": 1,
    "status": "queued",
    "downloadLinks": {},
    "extractDefinition": {
        "version": 2,
        "dataStructure": {
            "rectangular": {
                "on": "P"
            }
        },
        "dataFormat": "fixed_width",
        "caseSelectWho": "individuals",
        "description": "Example extract",
        "samples": {
            "cps2018_03s": {},
            "cps2019_03s": {}
        },
        "variables": {
            "YEAR": {
                "preselected": true
            },
            "SERIAL": {
                "preselected": true
            },
            "MONTH": {
                "preselected": true
            },
            "CPSID": {
                "preselected": true
            },
            "ASECFLAG": {
                "preselected": true
            },
            "ASECWTH": {
                "preselected": true
            },
            "STATEFIP": {},
            "PERNUM": {
                "preselected": true
            },
            "CPSIDP": {
                "preselected": true
            },
            "CPSIDV": {
                "preselected": true
            },
            "ASECWT": {
                "preselected": true
            },
            "AGE": {},
            "SEX": {},
            "RACE": {}
        },
        "collection": "cps"
    }
}
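To reuse the extract number in later requests, you can capture it from the response. The sketch below avoids extra dependencies by using grep; it assumes the first "number" field in the response is the top-level extract number, as in the response shown above, and extract_number and payload.json are hypothetical names. If jq is installed, `jq .number` is a more robust alternative.

```shell
# Hypothetical helper: read a JSON response on stdin and print the value
# of the first "number" field (the extract number in the responses above).
extract_number() {
  grep -o '"number": *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Example usage, assuming the payload is saved in payload.json:
# EXTRACT_NUMBER=$(curl --silent --request POST \
#   'https://api.ipums.org/extracts?collection=cps&version=2' \
#   --header "Authorization: $IPUMS_API_KEY" \
#   --header 'Content-Type: application/json' \
#   --data-binary @payload.json | extract_number)
```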

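You can also submit hierarchical extracts, which keep household and person records as separate record types instead of attaching household variables to each person record. To do so, set dataStructure to hierarchical: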

# construct the JSON payload manually and submit it
curl --location --request POST 'https://api.ipums.org/extracts?collection=cps&version=2' \
--header "Authorization: $IPUMS_API_KEY" \
--header 'Content-Type: application/json' \
--data-raw '{
    "description": "Example hierarchical extract",
    "dataStructure": { 
        "hierarchical": {}
    },
    "dataFormat": "fixed_width",
    "samples": {
      "cps2018_03s": {},
      "cps2019_03s": {}
    },
    "variables":{
      "AGE": {},
      "SEX": {},
      "RACE": {},
      "STATEFIP": {}
    }
}'

# A successful request again returns the extract number in the number attribute. For hierarchical extracts, the API also adds the RECTYPE variable, which identifies the type of each record:

{
    "number": 2,
    "status": "queued",
    "downloadLinks": {},
    "extractDefinition": {
        "version": 2,
        "dataStructure": {
            "hierarchical": {}
        },
        "dataFormat": "fixed_width",
        "caseSelectWho": "individuals",
        "description": "Example hierarchical extract",
        "samples": {
            "cps2018_03s": {},
            "cps2019_03s": {}
        },
        "variables": {
            "RECTYPE": {},
            "YEAR": {
                "preselected": true
            },
            "SERIAL": {
                "preselected": true
            },
            "MONTH": {
                "preselected": true
            },
            "CPSID": {
                "preselected": true
            },
            "ASECFLAG": {
                "preselected": true
            },
            "ASECWTH": {
                "preselected": true
            },
            "STATEFIP": {},
            "PERNUM": {
                "preselected": true
            },
            "CPSIDP": {
                "preselected": true
            },
            "CPSIDV": {
                "preselected": true
            },
            "ASECWT": {
                "preselected": true
            },
            "AGE": {},
            "SEX": {},
            "RACE": {}
        },
        "collection": "cps"
    }
}

Checking a Request’s Status

After submitting your extract request, you can use the API to check its status using the extract’s number.

curl --request GET 'https://api.ipums.org/extracts/1?collection=cps&version=2' --header 'Content-Type: application/json' --header "Authorization: $IPUMS_API_KEY"

# A successful request returns a response object like the one below. The exact fields vary depending on how far along the extract is in processing.
# The status field will contain a value such as `queued`, `started`, `produced`, `canceled`, `failed` or `completed`. The downloadLinks attribute, empty in the queued response above, is populated once the extract is completed.

{
    "number": 1,
    "status": "completed",
    "downloadLinks": {
        "basicCodebook": {
            "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/1234567/cps_00001.cbk",
            "bytes": 8492,
            "sha256": "37ce64df8300c73736e7fcfd6c4afb9faaddceddfe3a73bcaa435984ce3c2765"
        },
        "stataCommandFile": {
            "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/1234567/cps_00001.do",
            "bytes": 11741,
            "sha256": "fbad44930e54c3889a3e2e1eb3d04c48ed34743ceb70fcdda66168183ff1670b"
        },
        "data": {
            "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/1234567/cps_00001.dat.gz",
            "bytes": 4577233,
            "sha256": "68851b34fa1841a145403b033786f1bb125fbbc8f85297f776075df187e3a41f"
        },
        "rCommandFile": {
            "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/1234567/cps_00001.R",
            "bytes": 406,
            "sha256": "3009d74bdadd0fadab6b18e9deafc7833a8ef6e8117e8ab3b68008f2f7c64296"
        },
        "spssCommandFile": {
            "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/1234567/cps_00001.sps",
            "bytes": 5945,
            "sha256": "a29ff3ebc41a9bdc3ac65b71ceeef8179b12135607950aeee93ca362f7aa68b7"
        },
        "ddiCodebook": {
            "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/1234567/cps_00001.xml",
            "bytes": 44616,
            "sha256": "fea25a5b6af215142fa55cf3fd9ec2784532643021ebecac4d56e0d17ba1e935"
        },
        "sasCommandFile": {
            "url": "https://api.ipums.org/downloads/cps/api/v1/extracts/1234567/cps_00001.sas",
            "bytes": 5990,
            "sha256": "45f571f15fb17d9dd326db094e26f9ecc335b87629b00d4813cbb47fdc855a64"
        }
    },
    "extractDefinition": {
        "version": 2,
        "dataStructure": {
            "rectangular": {
                "on": "P"
            }
        },
        "dataFormat": "fixed_width",
        "caseSelectWho": "individuals",
        "description": "Example extract",
        "samples": {
            "cps2018_03s": {},
            "cps2019_03s": {}
        },
        "variables": {
            "YEAR": {
                "preselected": true
            },
            "SERIAL": {
                "preselected": true
            },
            "MONTH": {
                "preselected": true
            },
            "CPSID": {
                "preselected": true
            },
            "ASECFLAG": {
                "preselected": true
            },
            "ASECWTH": {
                "preselected": true
            },
            "STATEFIP": {},
            "PERNUM": {
                "preselected": true
            },
            "CPSIDP": {
                "preselected": true
            },
            "CPSIDV": {
                "preselected": true
            },
            "ASECWT": {
                "preselected": true
            },
            "AGE": {},
            "SEX": {},
            "RACE": {}
        },
        "collection": "cps"
    }
}
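Rather than checking by hand, you can poll the status endpoint until the extract is ready. This is a sketch under several assumptions: fetch_extract, extract_status and wait_for_extract are hypothetical helper names, IPUMS_COLLECTION is a hypothetical variable defaulting to cps, the status is parsed with grep and sed rather than a JSON parser, and the default of 60 tries 30 seconds apart is arbitrary.

```shell
# Hypothetical helper: print the status JSON for extract number $1.
# IPUMS_COLLECTION is a hypothetical variable; it defaults to cps.
fetch_extract() {
  curl --silent --show-error \
    --header "Authorization: $IPUMS_API_KEY" \
    "https://api.ipums.org/extracts/$1?collection=${IPUMS_COLLECTION:-cps}&version=2"
}

# Print the first "status" value in the JSON read from stdin.
extract_status() {
  grep -o '"status": *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Poll extract $1 up to $2 times (default 60), $3 seconds apart (default 30).
# Prints "completed" and returns 0 when ready; returns 1 if the extract
# failed, was canceled, or did not complete within the allowed tries.
wait_for_extract() {
  local number="$1" max_tries="${2:-60}" delay="${3:-30}" tries=0 status
  while [ "$tries" -lt "$max_tries" ]; do
    status=$(fetch_extract "$number" | extract_status)
    case "$status" in
      completed) echo "$status"; return 0 ;;
      failed|canceled) echo "extract $number $status" >&2; return 1 ;;
    esac
    tries=$((tries + 1))
    sleep "$delay"
  done
  echo "timed out waiting for extract $number" >&2
  return 1
}
```

For example, `wait_for_extract 1 && echo ready` blocks until extract 1 is completed. A generous delay between checks keeps the number of status requests modest.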

Retrieving Your Extract

To retrieve a completed extract, download its files using the URLs in the downloadLinks attribute of the status response. Each download request must include your API key, as in the example below.

# download the data file using its URL from the downloadLinks attribute of the completed extract's status response
curl --fail -H "Authorization: $IPUMS_API_KEY" -o my_ipums_cps_extract_1.dat.gz https://api.ipums.org/downloads/cps/api/v1/extracts/1234567/cps_00001.dat.gz
# repeat for any other files you need, e.g., the DDI codebook or a command file for your statistical package
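When an extract has several files, you can download all of them from the status response and check each against the sha256 value the API reports. A rough sketch: list_download_urls, download_extract_files and verify_sha256 are hypothetical helpers, URLs are pulled out with grep and sed rather than a JSON parser, and sha256sum comes from GNU coreutils (on macOS, `shasum -a 256` is the usual substitute).

```shell
# Hypothetical helper: print every "url" value in the status JSON on stdin.
list_download_urls() {
  grep -o '"url": *"[^"]*"' | sed 's/^"url": *"\(.*\)"$/\1/'
}

# Hypothetical helper: download every file listed in the status JSON on
# stdin into the current directory, keeping each file's original name.
download_extract_files() {
  list_download_urls | while read -r url; do
    curl --silent --show-error --fail \
      --header "Authorization: $IPUMS_API_KEY" \
      --output "$(basename "$url")" "$url" \
      || echo "download failed: $url" >&2
  done
}

# Hypothetical helper: succeed only if file $1 has the sha256 digest $2.
verify_sha256() {
  local actual
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  [ "$actual" = "$2" ]
}
```

For example, piping the status response for extract 1 into `download_extract_files` saves cps_00001.dat.gz and the other files, after which `verify_sha256 cps_00001.dat.gz 68851b34fa1841a145403b033786f1bb125fbbc8f85297f776075df187e3a41f` checks the data file against the digest shown in the status response above.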

You are now ready to process and analyze your data.

Get a Listing of Recent Extract Requests

You may also find it useful to get a historical listing of your extract requests.

# quote the URL: an unquoted & would send the command to the background
curl -X GET \
  'https://api.ipums.org/extracts?collection=cps&version=2' \
  -H 'Content-Type: application/json' \
  -H "Authorization: $IPUMS_API_KEY"

# If you omit an extract number from the URL, the API returns your 10 most recent extract requests by default. To change the number returned, append a `limit=##` parameter to the existing query string (e.g., `&limit=25`) to get the ## most recent extracts instead.
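The sketch below wraps the listing request and the `limit` parameter described above in small functions; extract_list_url, list_extracts and IPUMS_COLLECTION are hypothetical names. Building the URL in one place also guarantees it reaches curl as a single quoted word, so the shell never interprets the `&` characters.

```shell
# Hypothetical helper: print the extract-listing URL for the collection in
# IPUMS_COLLECTION (a hypothetical variable, default cps), requesting the
# $1 most recent extracts (default 10) via the limit parameter.
extract_list_url() {
  printf 'https://api.ipums.org/extracts?collection=%s&version=2&limit=%s\n' \
    "${IPUMS_COLLECTION:-cps}" "${1:-10}"
}

# Hypothetical helper: fetch the listing, passing the URL as one quoted word.
list_extracts() {
  curl --silent --show-error \
    --header "Authorization: $IPUMS_API_KEY" \
    "$(extract_list_url "$1")"
}

# Example: list_extracts 25
```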