!pip install cenpy
import cenpy
import numpy as np
np.random.seed(42)
For our analysis, we need to collect data on the following variables.
Dependent variables: 1. GDP per capita 2. Life expectancy 3. Median household income 4. Education levels (we use the percentage of the population with a bachelor's degree or higher)
The dependent variables were selected because they are key measures related to the Human Development Index.
Independent variables: 1. Unemployment rate 2. Labour force participation rate 3. Labour Productivity (private non-farm) 4. Population 5. Poverty rate 6. Net exports by state
I also use the dependent variables as independent variables in the analysis. For example, when modelling the relationship between GDP per capita and the socioeconomic variables, life expectancy, median household income, and education levels become independent variables. This enables us to explore which variables are more important in predicting the dependent variable.
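To make this rotating-roles setup concrete, here is a minimal sketch (with hypothetical variable names; the actual column names are created during data collection below) of how each HDI-related measure takes a turn as the response while the others join the socioeconomic predictors:

```python
# Hypothetical sketch: each HDI-related measure takes a turn as the
# response variable, while the rest join the socioeconomic predictors.
hdi_measures = ["GDPpercapita", "LifeExpectancy", "MedHHInc", "PctBach"]
socioeconomic = [
    "UnemploymentRate", "LabForParticipationRate", "LaborProductivity",
    "Population", "PovertyRate", "NetExports",
]

model_specs = {
    target: socioeconomic + [m for m in hdi_measures if m != target]
    for target in hdi_measures
}

# Four model specifications, each with 6 + 3 = 9 predictors
assert len(model_specs) == 4
assert all(len(preds) == 9 for preds in model_specs.values())
```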
From cenpy we can access American Community Survey 5-Year data to obtain/derive the following socioeconomic variables: 1. Median Household Income 2. Education Levels 3. Unemployment rate 4. Labour force participation rate 5. Population 6. Poverty rate
# Finding data set
available = cenpy.explorer.available()
available.head(n=5)
# We use data from ACS 5 Year, Monthly Export and Import
c_isTimeseries | temporal | spatial | c_isAggregate | publisher | references | programCode | modified | license | keyword | ... | description | bureauCode | accessLevel | title | c_isAvailable | c_isCube | c_isMicrodata | c_documentationLink | c_dataset | vintage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ABSCB2017 | NaN | 2017/2017 | United States | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:007 | 2020-04-30 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | ... | The Annual Business Survey (ABS) provides info... | 006:07 | public | Annual Business Survey: Characteristics of Bus... | True | NaN | NaN | https://www.census.gov/developer/ | (abscb,) | 2017.0 |
ABSCB2018 | NaN | 2018/2018 | United States | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:007 | 2020-10-26 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | ... | The Annual Business Survey (ABS) provides info... | 006:07 | public | Annual Business Survey: Characteristics of Bus... | True | NaN | NaN | https://www.census.gov/developer/ | (abscb,) | 2018.0 |
ABSCB2019 | NaN | 2019/2019 | US | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:007 | 2021-08-17 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | ... | The Annual Business Survey (ABS) provides info... | 006:07 | public | 2019 Annual Business Survey: Characteristics o... | True | NaN | NaN | https://www.census.gov/developer/ | (abscb,) | 2019.0 |
ABSCB2020 | NaN | 2020/2020 | US | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:007 | 2022-08-03 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | ... | The Annual Business Survey (ABS) provides info... | 006:07 | public | 2020 Annual Business Survey: Characteristics o... | True | NaN | NaN | https://www.census.gov/developer/ | (abscb,) | 2020.0 |
ABSCB2021 | NaN | 2021/2021 | United States | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:007 | 2023-07-24 10:30:52.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | ... | The Annual Business Survey (ABS) provides info... | 006:07 | public | 2021 Annual Business Survey: Characteristics o... | True | NaN | NaN | https://www.census.gov/developer/ | (abscb,) | 2021.0 |
5 rows × 23 columns
We identify the dataset we want to use from cenpy. In this case it is ACSDT5Y.
acs = available.filter(regex="^ACS", axis=0)
available.filter(regex="^ACSDT5Y", axis=0).head(n=5)
c_isTimeseries | temporal | spatial | c_isAggregate | publisher | references | programCode | modified | license | keyword | ... | description | bureauCode | accessLevel | title | c_isAvailable | c_isCube | c_isMicrodata | c_documentationLink | c_dataset | vintage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ACSDT5Y2009 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2019-08-27 13:11:18.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | ... | The American Community Survey (ACS) is an ongo... | 006:07 | public | American Community Survey: 5-Year Estimates: D... | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2009.0 |
ACSDT5Y2010 | NaN | NaN | United States | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-04 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | ... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2010.0 |
ACSDT5Y2011 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-04 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | ... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2011.0 |
ACSDT5Y2012 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-04 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | ... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2012.0 |
ACSDT5Y2013 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-04 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | ... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2013.0 |
5 rows × 23 columns
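The `filter(regex=...)` calls above match dataset identifiers (the index labels) against a regular expression. A standalone sketch of the same matching logic, using a few sample IDs from the tables shown here:

```python
import re

# Sample dataset identifiers, as seen in the available() index above
dataset_ids = ["ABSCB2017", "ACSDT5Y2009", "ACSDT5Y2023", "ACSSE2021"]

# "^ACS" keeps anything starting with ACS; "^ACSDT5Y" narrows this
# to the ACS 5-Year Detailed Tables
acs_all = [d for d in dataset_ids if re.match(r"^ACS", d)]
acs_5yr = [d for d in dataset_ids if re.match(r"^ACSDT5Y", d)]

assert acs_all == ["ACSDT5Y2009", "ACSDT5Y2023", "ACSSE2021"]
assert acs_5yr == ["ACSDT5Y2009", "ACSDT5Y2023"]
```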
cenpy.explorer.explain("ACSDT5Y2023")
acs = cenpy.remote.APIConnection('ACSDT5Y2023')
acs.varslike(
    pattern="B19019_001E",
    by="attributes",
).sort_index()
label | concept | predicateType | group | limit | predicateOnly | hasGeoCollectionSupport | attributes | required | |
---|---|---|---|---|---|---|---|---|---|
B19019_001E | Estimate!!Total: | Median Household Income in the Past 12 Months ... | int | B19019 | 0 | NaN | NaN | B19019_001EA,B19019_001M,B19019_001MA | NaN |
We then extract the following variables. Since they are returned in object format, we convert them to float and rename them for clarity.
acs_variables = [
    "NAME",
    "GEO_ID",
    "B19019_001E",  # Total Median Household Income
    "B06009_001E",  # Educational Attainment - Total
    "B06009_002E",  # Educational Attainment - Less than high school graduate
    "B06009_003E",  # Educational Attainment - High school graduate (or equivalent)
    "B06009_004E",  # Educational Attainment - Some college or associate's degree
    "B06009_005E",  # Educational Attainment - Bachelor's degree or higher
    "B01003_001E",  # Total Population
    "B23025_001E",  # Total Population 16 and over
    "B23025_002E",  # Labor Force - Total
    "B23025_005E",  # Labor Force - Unemployed
    "B17020_001E",  # Population for whom poverty status is determined
    "B17020_002E",  # Population below poverty level
]
# Filter the data for states
USA_acs_data = acs.query(
    cols=acs_variables,
    geo_unit="state:*",
)
USA_acs_data.head(n=5)
NAME | GEO_ID | B19019_001E | B06009_001E | B06009_002E | B06009_003E | B06009_004E | B06009_005E | B01003_001E | B23025_001E | B23025_002E | B23025_005E | B17020_001E | B17020_002E | state | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Alabama | 0400000US01 | 62027 | 3448302 | 409381 | 1046374 | 1035466 | 584986 | 5054253 | 4056609 | 2358667 | 112849 | 4913932 | 768185 | 01 |
1 | Alaska | 0400000US02 | 89336 | 487903 | 31854 | 141206 | 162411 | 95793 | 733971 | 574021 | 380935 | 20644 | 716703 | 72978 | 02 |
2 | Arizona | 0400000US04 | 76872 | 4990633 | 544528 | 1171693 | 1646950 | 999535 | 7268175 | 5862117 | 3547314 | 182184 | 7109159 | 907125 | 04 |
3 | Arkansas | 0400000US05 | 58773 | 2043779 | 233857 | 702778 | 593923 | 324625 | 3032651 | 2409758 | 1405609 | 71269 | 2944742 | 471783 | 05 |
4 | California | 0400000US06 | 96334 | 26941198 | 4149146 | 5496195 | 7461496 | 6035609 | 39242785 | 31545603 | 20144078 | 1282259 | 38529452 | 4610600 | 06 |
for variable in acs_variables:
    # Convert all variables EXCEPT NAME and GEO_ID to float
    if variable not in ("NAME", "GEO_ID"):
        USA_acs_data[variable] = USA_acs_data[variable].astype(float)

USA_acs_data = USA_acs_data.rename(
    columns={
        "B19019_001E": "MedHHInc",              # Total Median Household Income
        "B06009_001E": "EducTotal",             # Educational Attainment - Total
        "B06009_002E": "EducBelowHighSch",      # Less than high school graduate
        "B06009_003E": "EducHighSch",           # High school graduate (or equivalent)
        "B06009_004E": "EducAssoc",             # Some college or associate's degree
        "B06009_005E": "EducBach",              # Bachelor's degree or higher
        "B01003_001E": "TotalPop",              # Total Population
        "B23025_001E": "TotalPop16",            # Total Population 16 and over
        "B23025_002E": "LabForTotal",           # Labor Force - Total
        "B23025_005E": "Unemployed",            # Labor Force - Unemployed
        "B17020_001E": "PopPovertyDetermined",  # Population for whom poverty status is determined
        "B17020_002E": "PovertyPop",            # Population below poverty level
    }
)
Feature engineering:
We apply feature engineering to derive the following socioeconomic variables: 1) education levels, 2) poverty rate, 3) unemployment rate, and 4) labour force participation rate.
USA_acs_data['PctBach'] = USA_acs_data['EducBach'] / USA_acs_data['EducTotal']
USA_acs_data['PovertyRate'] = USA_acs_data['PovertyPop'] / USA_acs_data['PopPovertyDetermined']
USA_acs_data['UnemploymentRate'] = USA_acs_data['Unemployed'] / USA_acs_data['LabForTotal']
USA_acs_data['LabForParticipationRate'] = USA_acs_data['LabForTotal'] / USA_acs_data['TotalPop16']

# Remove Puerto Rico
USA_acs_data = USA_acs_data[USA_acs_data['NAME'] != 'Puerto Rico']
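As a sanity check on these derived shares, the arithmetic can be verified by hand against the Alabama row from the ACS query output shown earlier; each rate is a population share, so it should fall between 0 and 1:

```python
# Alabama values from the ACS query output shown earlier
row = {
    "EducBach": 584986, "EducTotal": 3448302,
    "PovertyPop": 768185, "PopPovertyDetermined": 4913932,
    "Unemployed": 112849, "LabForTotal": 2358667,
    "TotalPop16": 4056609,
}

pct_bach = row["EducBach"] / row["EducTotal"]
poverty_rate = row["PovertyPop"] / row["PopPovertyDetermined"]
unemployment_rate = row["Unemployed"] / row["LabForTotal"]
participation_rate = row["LabForTotal"] / row["TotalPop16"]

# All four are population shares, so they must lie in (0, 1)
for rate in (pct_bach, poverty_rate, unemployment_rate, participation_rate):
    assert 0.0 < rate < 1.0
```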
Exports:
From cenpy we can also access USA trade data to calculate the net exports by state.
cenpy.explorer.explain("ITMONTHLYEXPORTSSTATENAICS")
US_export = cenpy.remote.APIConnection('ITMONTHLYEXPORTSSTATENAICS')
The code below shows the list of variables in the export dataset and the geographic display level.
US_export.variables
label | concept | predicateType | group | limit | predicateOnly | hasGeoCollectionSupport | required | datetime | attributes | values | |
---|---|---|---|---|---|---|---|---|---|---|---|
for | Census API FIPS 'for' clause | Census API Geography Specification | fips-for | N/A | 0 | True | NaN | NaN | NaN | NaN | NaN |
in | Census API FIPS 'in' clause | Census API Geography Specification | fips-in | N/A | 0 | True | NaN | NaN | NaN | NaN | NaN |
ucgid | Uniform Census Geography Identifier clause | Census API Geography Specification | ucgid | N/A | 0 | True | True | NaN | NaN | NaN | NaN |
time | ISO-8601 Date/Time value | Census API Date/Time Specification | datetime | N/A | 0 | True | NaN | true | {'year': True, 'month': True} | NaN | NaN |
CNT_WGT_YR | Year-to-Date Containerized Vessel Shipping Weight | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
MONTH | Month | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
SUMMARY_LVL | Detail or Country Grouping indicator | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
COMM_LVL | 4-character aggregation levels for commodity code | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | COMM_LVL_LABEL | NaN |
CNT_VAL_MO | Containerized Vessel Value | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
AIR_WGT_MO | Air Shipping Weight | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
VES_WGT_MO | Vessel Shipping Weight | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
USITC | USITC Standard Countries and Areas | NaN | NaN | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
NAICS | NAICS-based International Trade code | Monthly Exports by State of Origin and NAICS code | string | IT00EXPORTSTATENAICS | 0 | NaN | NaN | default displayed | NaN | NAICS_LABEL | {'item': {'-': 'Total For All North American I... |
CNT_VAL_YR | Year-to-Date Containerized Vessel Value | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
GEOCOMP | GEO_ID Component | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
NAICS_LDESC | 150-character NAICS-based International Trade ... | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
YEAR | Year | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
SUMMARY_LVL2 | Variables being summarized | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
AIR_WGT_YR | Year-to-Date Air Shipping Weight | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
CNT_WGT_MO | Containerized Vessel Shipping Weight | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
VES_WGT_YR | Year-to-Date Vessel Shipping Weight | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
USITCHISTORY | USITC Standard Historical Countries and Areas | NaN | NaN | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
WORLD | World | NaN | NaN | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
USITCREG | USITC Standard International Regions | NaN | NaN | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
ALL_VAL_MO | Total Value | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
AIR_VAL_MO | Air Value | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
ALL_VAL_YR | Year-to-Date Total Value | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
US_STATE | State of Origin of Movement | Monthly Exports by State of Origin and NAICS code | string | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | US_STATE_LABEL | NaN |
GEO_ID | Geographic identifier code | Monthly Exports by State of Origin and NAICS code | string | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NAME | NaN |
SUMLEVEL | Summary Level code | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
LAST_UPDATE | Date of Last Update | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
NAICS_SDESC | 50-character NAICS-based International Trade C... | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
VES_VAL_MO | Vessel Value | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
CTY_CODE | Country Code | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | CTY_NAME | NaN |
VES_VAL_YR | Year-to-Date Vessel Value | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
AIR_VAL_YR | Year-to-Date Air Value | Monthly Exports by State of Origin and NAICS code | int | IT00EXPORTSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
US_export.geographies['fips']
name | geoLevelDisplay | referenceDate | |
---|---|---|---|
0 | world | W01 | 2022-01-01 |
1 | usitc standard international regions | W04 | 2022-01-01 |
2 | usitc standard countries and areas | W16 | 2022-01-01 |
3 | usitc standard historical countries and areas | W20 | 2022-01-01 |
We run the code below to extract the year-to-date total export value for each state and month. We set the geographic display level to world since we are interested only in aggregate exports.
US_export_data = US_export.query(
    cols=["US_STATE", "GEO_ID", "YEAR", "MONTH", "ALL_VAL_YR"],
    geo_unit="world:*",
)

for variable in ["YEAR", "MONTH", "ALL_VAL_YR"]:
    US_export_data[variable] = US_export_data[variable].astype(float)

US_export_data.head(n=5)
US_STATE | GEO_ID | YEAR | MONTH | ALL_VAL_YR | world | |
---|---|---|---|---|---|---|
0 | - | W0100Y1WO | 2013.0 | 1.0 | 1.230319e+11 | 1 |
1 | AK | W0100Y1WO | 2013.0 | 1.0 | 7.526427e+07 | 1 |
2 | AL | W0100Y1WO | 2013.0 | 1.0 | 1.300844e+09 | 1 |
3 | AR | W0100Y1WO | 2013.0 | 1.0 | 4.963303e+08 | 1 |
4 | AZ | W0100Y1WO | 2013.0 | 1.0 | 1.510832e+09 | 1 |
Finally, we filter for December to get aggregate exports in 2023.
# Filter data for December 2023
US_export_2023 = US_export_data[(US_export_data['YEAR'] == 2023) & (US_export_data['MONTH'] == 12)]
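Note that `ALL_VAL_YR` is a year-to-date cumulative value, which is why the December observation alone gives the full-year total. A toy illustration with made-up monthly values:

```python
# Made-up monthly export values for one state
monthly_exports = [100, 120, 90, 110, 130, 105, 95, 115, 125, 100, 110, 140]

# Build the year-to-date series the way ALL_VAL_YR accumulates
year_to_date = []
running_total = 0
for value in monthly_exports:
    running_total += value
    year_to_date.append(running_total)

# The December (12th) year-to-date value equals the annual total
assert year_to_date[-1] == sum(monthly_exports)
```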
Imports:
The code below shows the list of variables in the import dataset and the geographic display level.
cenpy.explorer.explain("ITMONTHLYIMPORTSSTATENAICS")
US_import = cenpy.remote.APIConnection('ITMONTHLYIMPORTSSTATENAICS')
US_import.variables
label | concept | predicateType | group | limit | predicateOnly | hasGeoCollectionSupport | required | datetime | attributes | values | |
---|---|---|---|---|---|---|---|---|---|---|---|
for | Census API FIPS 'for' clause | Census API Geography Specification | fips-for | N/A | 0 | True | NaN | NaN | NaN | NaN | NaN |
in | Census API FIPS 'in' clause | Census API Geography Specification | fips-in | N/A | 0 | True | NaN | NaN | NaN | NaN | NaN |
ucgid | Uniform Census Geography Identifier clause | Census API Geography Specification | ucgid | N/A | 0 | True | True | NaN | NaN | NaN | NaN |
time | ISO-8601 Date/Time value | Census API Date/Time Specification | datetime | N/A | 0 | True | NaN | true | {'month': True, 'year': True} | NaN | NaN |
CNT_WGT_YR | Year-to-Date Containerized Vessel Shipping Weight | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
MONTH | Month | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
SUMMARY_LVL | Detail or Country Grouping indicator | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
COMM_LVL | 4-character aggregation levels for commodity code | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | COMM_LVL_LABEL | NaN |
CON_VAL_MO | Imports for Consumption, Total Value | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
CNT_VAL_MO | Containerized Vessel Value | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
VES_WGT_MO | Vessel Shipping Weight | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
AIR_WGT_MO | Air Shipping Weight | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
USITC | USITC Standard Countries and Areas | NaN | NaN | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
NAICS | NAICS-based International Trade Code | Monthly Imports by State and NAICS | string | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | default displayed | NaN | NAICS_LABEL | {'item': {'-': 'Total For All North American I... |
CON_VAL_YR | Year-to-Date Imports for Consumption, Total Value | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
CNT_VAL_YR | Year-to-Date Containerized Vessel Value | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
GEOCOMP | GEO_ID Component | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
GEN_VAL_YR | Year-to-Date General Imports, Total Value | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
NAICS_LDESC | 150-character NAICS-based International Trade ... | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
YEAR | Year | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
SUMMARY_LVL2 | Variables being summarized | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
AIR_WGT_YR | Year-to-Date Air Shipping Weight | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
CNT_WGT_MO | Containerized Vessel Shipping Weight | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
VES_WGT_YR | Year-to-Date Vessel Shipping Weight | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
USITCHISTORY | USITC Standard Historical Countries and Areas | NaN | NaN | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
GEN_VAL_MO | General Imports, Total Value | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
WORLD | World | NaN | NaN | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
USITCREG | USITC Standard International Regions | NaN | NaN | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
AIR_VAL_MO | Air Value | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
US_STATE | State of Destination | Monthly Imports by State and NAICS | string | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | US_STATE_LABEL | NaN |
GEO_ID | Geographic identifier code | Monthly Imports by State and NAICS | string | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NAME | NaN |
SUMLEVEL | Summary Level code | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
LAST_UPDATE | Date of Last Update | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
NAICS_SDESC | 50-character NAICS-based International Trade C... | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
VES_VAL_MO | Vessel Value | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
CTY_CODE | Country Code | NaN | string | N/A | 0 | NaN | NaN | NaN | NaN | CTY_NAME | NaN |
VES_VAL_YR | Year-to-Date Vessel Value | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
AIR_VAL_YR | Year-to-Date Air Value | Monthly Imports by State and NAICS | int | IT00IMPORTSSTATENAICS | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
We run the code below to extract the year-to-date total import value for each state and month. We set the geographic display level to world since we are interested only in aggregate imports.
US_import_data = US_import.query(
    cols=["US_STATE", "GEO_ID", "YEAR", "MONTH", "GEN_VAL_YR"],
    geo_unit="world:*",
)

for variable in ["YEAR", "MONTH", "GEN_VAL_YR"]:
    US_import_data[variable] = US_import_data[variable].astype(float)

US_import_data.head(n=5)
US_STATE | GEO_ID | YEAR | MONTH | GEN_VAL_YR | world | |
---|---|---|---|---|---|---|
0 | - | W0100Y1WO | 2013.0 | 1.0 | 1.851686e+11 | 1 |
1 | AK | W0100Y1WO | 2013.0 | 1.0 | 7.511713e+07 | 1 |
2 | AL | W0100Y1WO | 2013.0 | 1.0 | 1.525358e+09 | 1 |
3 | AR | W0100Y1WO | 2013.0 | 1.0 | 6.097465e+08 | 1 |
4 | AZ | W0100Y1WO | 2013.0 | 1.0 | 1.782983e+09 | 1 |
Finally, we filter for December to get aggregate imports in 2023.
# Filter data for December 2023
US_import_2023 = US_import_data[(US_import_data['YEAR'] == 2023) & (US_import_data['MONTH'] == 12)]
Net Export (Export-Import)
Net exports are then calculated as exports minus imports. We also map the state abbreviations to full names for joining later.
# Join export and import data
US_netexport_2023 = US_export_2023.merge(
    US_import_2023[['US_STATE', 'GEN_VAL_YR']], on='US_STATE', how='left'
)

# Net export
US_netexport_2023["netexport"] = US_netexport_2023["ALL_VAL_YR"] - US_netexport_2023["GEN_VAL_YR"]
# Create new column with state names
state_names = {
    'AL': 'Alabama', 'AK': 'Alaska', 'AZ': 'Arizona', 'AR': 'Arkansas', 'CA': 'California',
'CO': 'Colorado', 'CT': 'Connecticut', 'DE': 'Delaware', 'FL': 'Florida', 'GA': 'Georgia',
'HI': 'Hawaii', 'ID': 'Idaho', 'IL': 'Illinois', 'IN': 'Indiana', 'IA': 'Iowa',
'KS': 'Kansas', 'KY': 'Kentucky', 'LA': 'Louisiana', 'ME': 'Maine', 'MD': 'Maryland',
'MA': 'Massachusetts', 'MI': 'Michigan', 'MN': 'Minnesota', 'MS': 'Mississippi', 'MO': 'Missouri',
'MT': 'Montana', 'NE': 'Nebraska', 'NV': 'Nevada', 'NH': 'New Hampshire', 'NJ': 'New Jersey',
'NM': 'New Mexico', 'NY': 'New York', 'NC': 'North Carolina', 'ND': 'North Dakota', 'OH': 'Ohio',
'OK': 'Oklahoma', 'OR': 'Oregon', 'PA': 'Pennsylvania', 'RI': 'Rhode Island', 'SC': 'South Carolina',
'SD': 'South Dakota', 'TN': 'Tennessee', 'TX': 'Texas', 'UT': 'Utah', 'VT': 'Vermont',
'VA': 'Virginia', 'WA': 'Washington', 'WV': 'West Virginia', 'WI': 'Wisconsin', 'WY': 'Wyoming',
'DC': 'District of Columbia'
}
US_netexport_2023['STATE_NAME'] = US_netexport_2023['US_STATE'].map(state_names)

# Filter states only
US_netexport_2023 = US_netexport_2023.dropna(subset=['STATE_NAME'])
From the Bureau of Economic Analysis, we can access real GDP data to calculate real GDP per capita for each state. Real GDP per capita is total real GDP divided by total population, with total population data coming from the ACS 5-Year data collected earlier.
First, import the package beaapi.
!pip install beaapi-0.0.2-py3-none-any.whl
import beaapi
Set up your API key. If you do not have one, you will need to request one from the BEA website and validate it: https://www.bea.gov/resources/for-developers
import config
beakey = config.bea_api_key  # API key
Review the list of datasets and parameters available. For this analysis we will be using the 'Regional' dataset and the SAGDP9N table.
# List of data sets available
list_of_sets = beaapi.get_data_set_list(beakey)
# List of parameters
list_of_params = beaapi.get_parameter_list(beakey, 'Regional')
# List of parameter values
list_of_param_vals = beaapi.get_parameter_values(beakey, 'Regional', 'LineCode')
list_of_param_vals.head(n=5)
Key | Desc | |
---|---|---|
0 | 1 | [CAGDP1] Real Gross Domestic Product (GDP) |
1 | 1 | [CAGDP11] Contributions to percent change in r... |
2 | 1 | [CAGDP2] Gross Domestic Product (GDP): All ind... |
3 | 1 | [CAGDP8] Chain-type quantity indexes for real ... |
4 | 1 | [CAGDP9] Real GDP: All industry total |
Extract the real GDP for each state using the code below.
bea_tbl = beaapi.get_data(beakey, datasetname='Regional', GeoFips='STATE', LineCode='1', TableName='SAGDP9N', Year='2023')
display(bea_tbl.head(5))
Code | GeoFips | GeoName | TimePeriod | CL_UNIT | UNIT_MULT | DataValue | |
---|---|---|---|---|---|---|---|
0 | SAGDP9N-1 | 00000 | United States | 2023 | Millions of chained 2017 dollars | 6 | 22671096.0 |
1 | SAGDP9N-1 | 01000 | Alabama | 2023 | Millions of chained 2017 dollars | 6 | 245354.7 |
2 | SAGDP9N-1 | 02000 | Alaska | 2023 | Millions of chained 2017 dollars | 6 | 54059.7 |
3 | SAGDP9N-1 | 04000 | Arizona | 2023 | Millions of chained 2017 dollars | 6 | 422399.6 |
4 | SAGDP9N-1 | 05000 | Arkansas | 2023 | Millions of chained 2017 dollars | 6 | 142860.6 |
bea_state_tbl = bea_tbl[bea_tbl['GeoName'].isin(state_names.values())]
bea_state_tbl.rename(columns={'DataValue': 'REALGDP'}, inplace=True)
The most recent life expectancy data available are for 2021, published in 2024. We assume for our analysis that life expectancy has remained similar since 2021.
Install and import the packages pandas and Socrata.
!pip install sodapy
import pandas as pd
from sodapy import Socrata
Get the life expectancy data for each state (both sexes combined) using the code below, which accesses the data at https://data.cdc.gov/NCHS/U-S-State-Life-Expectancy-by-Sex-2021/it4f-frdc/about_data
cdc = Socrata("data.cdc.gov", None)
life = cdc.get("it4f-frdc", limit=2000)
life_df = pd.DataFrame.from_records(life)
life_df = life_df[life_df["sex"] == "Total"]
life_df["leb"] = life_df["leb"].astype(float)
life_df.rename(columns={'leb': 'life_expectancy'}, inplace=True)
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
I downloaded the CSV file of labour productivity from the Bureau of Labor Statistics website https://www.bls.gov/productivity/tables/. The data are indexed to 2017, with 2017 = 100 for every state. This means that by comparing index values in 2023, we can see which states have become relatively more productive since 2017. The code below loads the labor productivity data.
lab_pdt = pd.read_csv('https://raw.githubusercontent.com/JiaYue-Ong/Python-Final-Project/refs/heads/main/labor-productivity.csv')
lab_pdt.head()
Sector | Area | Basis | Measure | Units | 2007 | 2008 | 2009 | 2010 | 2011 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Private nonfarm | Alabama | All workers | Labor productivity | Index (2017=100) | 89.822 | 93.420 | 97.944 | 100.873 | 102.157 | ... | 99.612 | 98.811 | 98.859 | 100.0 | 101.089 | 102.665 | 106.346 | 107.076 | 106.151 | 107.992 |
1 | Private nonfarm | Alaska | All workers | Labor productivity | Index (2017=100) | 92.779 | 92.841 | 106.275 | 99.772 | 94.378 | ... | 95.217 | 97.048 | 98.582 | 100.0 | 95.548 | 92.722 | 100.851 | 99.282 | 93.051 | 100.303 |
2 | Private nonfarm | Arizona | All workers | Labor productivity | Index (2017=100) | 94.611 | 94.863 | 92.812 | 95.264 | 98.879 | ... | 98.800 | 97.511 | 98.976 | 100.0 | 101.491 | 103.499 | 108.574 | 113.423 | 109.700 | 109.179 |
3 | Private nonfarm | Arkansas | All workers | Labor productivity | Index (2017=100) | 94.132 | 94.271 | 96.365 | 100.418 | 100.516 | ... | 103.521 | 102.602 | 102.102 | 100.0 | 100.728 | 101.374 | 107.309 | 107.508 | 105.726 | 106.947 |
4 | Private nonfarm | California | All workers | Labor productivity | Index (2017=100) | 83.296 | 85.382 | 89.426 | 91.852 | 91.820 | ... | 93.341 | 95.603 | 96.438 | 100.0 | 102.159 | 107.356 | 117.005 | 120.095 | 115.262 | 118.074 |
5 rows × 22 columns
lab_pdt_2023 = lab_pdt[lab_pdt['Area'].isin(state_names.values())]
lab_pdt_2023 = lab_pdt_2023[["Area", "2023"]]
lab_pdt_2023.rename(columns={'Area': 'State', '2023': 'Labor_Productivity_2023'}, inplace=True)
We now join all our datasets into one dataframe, keyed on the individual states. After joining, we calculate the real GDP per capita for each state.
# All datasets
USA_acs_data
US_netexport_2023
bea_state_tbl
life_df
lab_pdt_2023
# Join ACS and netexport
df1 = USA_acs_data.merge(
    US_netexport_2023[['STATE_NAME', 'netexport']],
    left_on='NAME', right_on='STATE_NAME', how='left'
).drop(columns=["STATE_NAME"])

# Join bea_state_tbl
df2 = df1.merge(
    bea_state_tbl[['GeoName', 'REALGDP']],
    left_on='NAME', right_on='GeoName', how='left'
).drop(columns=["GeoName"])

# Join life_df
df3 = df2.merge(
    life_df[['area', 'life_expectancy']],
    left_on='NAME', right_on='area', how='left'
).drop(columns=["area"])

# Join lab_pdt_2023
df4 = df3.merge(
    lab_pdt_2023, left_on='NAME', right_on='State', how='left'
).drop(columns=["State"])

# Create Real GDP per capita
df4["REALGDPpercapita"] = (df4["REALGDP"] * 10**6) / df4["TotalPop"]
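The scaling in the per-capita step matters: `REALGDP` comes from the BEA in millions of chained 2017 dollars (UNIT_MULT = 6 in the table above), hence the `* 10**6` before dividing by population. Checking the arithmetic with the Alabama figures from the tables above:

```python
# Alabama: real GDP in millions of chained 2017 dollars (BEA SAGDP9N)
real_gdp_millions = 245354.7
# Alabama: total population (ACS B01003_001E)
total_pop = 5054253

gdp_per_capita = (real_gdp_millions * 10**6) / total_pop

# Roughly $48.5k of real GDP per resident
assert 48000 < gdp_per_capita < 49000
```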
We then obtain a geographic map of the USA and join the dataframe to the geographies. This allows us to conduct spatial analysis.
# Get geographies of US States
import pygris
from pygris import states
from pygris.utils import shift_geometry
us = states(cb=True, resolution="20m", year=2023)
us_rescaled = shift_geometry(us)
us_rescaled.plot()
# Join the data to geography
us_rescaled.head()
STATEFP | STATENS | GEOIDFQ | GEOID | STUSPS | NAME | LSAD | ALAND | AWATER | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 48 | 01779801 | 0400000US48 | 48 | TX | Texas | 00 | 676686238592 | 18982083586 | POLYGON ((-998043.807 -568739.971, -997956.109... |
1 | 06 | 01779778 | 0400000US06 | 06 | CA | California | 00 | 403673296401 | 20291770234 | MULTIPOLYGON (((-2066284.899 -204542.622, -205... |
2 | 21 | 01779786 | 0400000US21 | 21 | KY | Kentucky | 00 | 102266598312 | 2384223544 | POLYGON ((571924.530 -84268.109, 577745.348 -8... |
3 | 13 | 01705317 | 0400000US13 | 13 | GA | Georgia | 00 | 149485311347 | 4419673221 | POLYGON ((939223.082 -230281.864, 951008.202 -... |
4 | 55 | 01779806 | 0400000US55 | 55 | WI | Wisconsin | 00 | 140292627460 | 29343084365 | MULTIPOLYGON (((708320.068 919586.190, 715683.... |
us_rescaled_final = us_rescaled.merge(
    df4,
    left_on=["GEOID"],
    right_on=["state"],
).drop(columns=["state"])

# Convert CRS
us_rescaled_final = us_rescaled_final.to_crs("EPSG:4326")
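One detail worth noting: the join key here is the two-digit state FIPS code, carried by pygris's `GEOID` column on one side and the ACS `state` column on the other. Because `merge` defaults to an inner join, any geography without a matching ACS row is dropped. A minimal sketch of that behaviour with sample codes:

```python
# Sample FIPS-keyed rows: geographies on the left, ACS values on the right
geo_rows = {"01": "Alabama", "06": "California", "48": "Texas"}
acs_rows = {"01": 62027, "06": 96334}  # e.g. median household income

# Inner-join semantics: keep only keys present on both sides
joined = {
    fips: (geo_rows[fips], acs_rows[fips])
    for fips in geo_rows
    if fips in acs_rows
}

assert "48" not in joined  # unmatched rows are dropped
assert joined["06"] == ("California", 96334)
```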
us_rescaled_final.columns
Index(['STATEFP', 'STATENS', 'GEOIDFQ', 'GEOID', 'STUSPS', 'NAME_x', 'LSAD',
'ALAND', 'AWATER', 'geometry', 'NAME_y', 'GEO_ID', 'MedHHInc',
'EducTotal', 'EducBelowHighSch', 'EducHighSch', 'EducAssoc', 'EducBach',
'TotalPop', 'TotalPop16', 'LabForTotal', 'Unemployed',
'PopPovertyDetermined', 'PovertyPop', 'PctBach', 'PovertyRate',
'UnemploymentRate', 'LabForParticipationRate', 'netexport', 'REALGDP',
'life_expectancy', 'Labor_Productivity_2023', 'REALGDPpercapita'],
dtype='object')