Assignment 6: Predictive Modeling of Housing Prices in Philadelphia

Due date: Wednesday, 12/6 by the end of the day

Lectures 12B and 13A will cover predictive modeling of housing prices in Philadelphia. We’ll extend that analysis in this section, as outlined in Part 2 below.

Part 2: Modeling Philadelphia’s Housing Prices and Algorithmic Fairness

2.1 Load data from the Office of Property Assessment

Use the requests package to query the CARTO API for single-family property assessment data in Philadelphia for properties that had their last sale during 2022.

Sources:

  • OpenDataPhilly
  • Metadata

import requests
import pandas as pd
import geopandas as gpd
import numpy as np
carto_url = "https://phl.carto.com/api/v2/sql"
where = "sale_date >= '2022-01-01' and sale_date <= '2022-12-31'"
where = where + " and category_code_description IN ('SINGLE FAMILY', 'Single Family')"
# Create the query
query = f"SELECT * FROM opa_properties_public WHERE {where}"

# Make the request (the WHERE clause is already embedded in the SQL query)
params = {"q": query, "format": "geojson"}
response = requests.get(carto_url, params=params)

# Make the GeoDataFrame
sales2022 = gpd.GeoDataFrame.from_features(response.json(), crs="EPSG:4326")
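A quick sanity check is useful here to confirm the request succeeded and to see how many 2022 sales came back; a minimal, optional sketch:

# Optional sanity check (sketch): raise if the request failed, then peek at the result
response.raise_for_status()
print(len(sales2022), "single-family sales from 2022")
sales2022[["sale_date", "sale_price"]].head()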

2.2 Load data for census tracts and neighborhoods

Load various Philadelphia-based regions that we will use in our analysis.

tracts = gpd.read_file('2010_Tracts.geojson')
url = 'https://raw.githubusercontent.com/azavea/geo-data/master/Neighborhoods_Philadelphia/Neighborhoods_Philadelphia.geojson'
neighbor = gpd.read_file(url).to_crs('EPSG:4326')
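Before joining, it is worth confirming that all three layers share the same CRS (the tracts GeoJSON should already be in EPSG:4326); a minimal check:

# Quick check (sketch): the layers should all report EPSG:4326 before the spatial join
print(sales2022.crs, tracts.crs, neighbor.crs)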

2.3 Spatially join the sales data and neighborhoods/census tracts.

Perform a spatial join, such that each sale has an associated neighborhood and census tract.

Note: After performing the first spatial join, you will need to use the drop() function to remove the index_right column; otherwise an error will be raised on the second spatial join about duplicate columns.

joined = gpd.sjoin(sales2022, tracts, predicate="within", how="left")
joined = gpd.sjoin(joined.drop("index_right", axis=1), neighbor, predicate="within", how="left")
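Because these are left joins, sales that fall outside any tract or neighborhood are kept with missing values; an optional check of how many such sales exist:

# Optional check (sketch): sales that did not match a tract or a neighborhood
print(joined["GEOID10"].isna().sum(), "sales without a census tract")
print(joined["listname"].isna().sum(), "sales without a neighborhood")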

2.4 Train a Random Forest on the sales data

In this step, you should follow the steps outlined in lecture to preprocess and train your model. We’ll extend our analysis with a hyperparameter grid search to find the best model configuration. As you train your model, follow these steps:

Preprocessing requirements

  • Trim the sales data to sales with prices between $3,000 and $1 million.
  • Set up a pipeline that includes both numerical columns and categorical columns.
  • Include a one-hot encoded variable for the neighborhood of the sale, instead of the ZIP code. We don’t want to include multiple location-based categories, since they encode much of the same information.

Training requirements

  • Use a 70/30 training/test split and predict the log of the sale price.
  • Use GridSearchCV to perform a k-fold cross-validation that optimizes at least 2 hyperparameters of the RandomForestRegressor.
  • After fitting your model and finding the optimal hyperparameters, evaluate the score (R-squared) on the test set (the original 30% sample withheld).

Note: You don’t need to include additional features (such as spatial distance features) or perform any extra feature engineering beyond what is required above to receive full credit. Of course, you are always welcome to experiment!

# model
from sklearn.ensemble import RandomForestRegressor

# pipeline
from sklearn.pipeline import make_pipeline

# model selection
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV

# Preprocessing
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
# trim data to between $3000 and $1 million
valid = (joined['sale_price'] > 3000) & (joined['sale_price'] < 1e6)
joined = joined.loc[valid]
pd.set_option('display.max_columns', None)
joined.head()
[Output of joined.head(): the first five rows of the joined sales, showing the full set of OPA property columns (sale_date, sale_price, market_value, total_livable_area, etc.) plus the joined census tract fields (GEOID10, NAME10) and neighborhood fields (listname, mapname).]
# categorical columns
cat_cols = ['exterior_condition', 'listname']

# numeric columns
num_cols = ['fireplaces', 'garage_spaces', 'number_of_bathrooms', 'number_of_bedrooms', 'number_stories', 'total_area', 'total_livable_area']

features = cat_cols + num_cols + ['sale_price']
joined_trimmed = joined[features].dropna()
joined_trimmed
[Output: joined_trimmed, the trimmed sales table (17,430 rows × 15 columns as displayed).]

joined_trimmed['sale_price'] = joined_trimmed['sale_price'].astype('int32')
joined_trimmed["log_price"] = np.log(joined_trimmed["sale_price"])
joined_trimmed
[Output: joined_trimmed with the new log_price column (17,461 rows × 14 columns as displayed).]

# 70/30 training / test split 
train_set, test_set = train_test_split(joined_trimmed, test_size=0.3, random_state=42)

# the target labels
y_train = train_set["log_price"]
y_test = test_set["log_price"]

# features
feature_cols = cat_cols + num_cols
X_train = train_set[feature_cols]
X_test = test_set[feature_cols]
# column transformers + pipeline for random forest
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

transformer = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ]
)

pipe = make_pipeline(
    transformer,
    RandomForestRegressor(n_estimators=100, random_state=42),
)
# Fit the training set
pipe.fit(X_train, y_train);
# Test Score
pipe.score(X_test, y_test)
0.5646447325116657
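Before tuning, the already-imported cross_val_score can give a cross-validated R² for this baseline pipeline on the training set, which is a more stable point of comparison for the grid search than the single test-set score above; an optional sketch:

# Optional (sketch): 3-fold cross-validated R² of the untuned pipeline on the training set
scores = cross_val_score(pipe, X_train, y_train, cv=3)
print(scores.mean())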
# Setup for GridSearchCV

model_step = "randomforestregressor"
param_grid = {
    f"{model_step}__n_estimators": [5, 10, 15, 20, 30, 50, 100, 200],
    f"{model_step}__max_depth": [9, 13, 21, 33, 51],
}

# Create the grid and use 3-fold CV
grid = GridSearchCV(pipe, param_grid, cv=3, verbose=1)

# Run the search
grid.fit(X_train, y_train)
Fitting 3 folds for each of 40 candidates, totalling 120 fits
GridSearchCV(cv=3,
             estimator=Pipeline(steps=[('columntransformer',
                                        ColumnTransformer(transformers=[('num',
                                                                         StandardScaler(),
                                                                         ['fireplaces',
                                                                          'garage_spaces',
                                                                          'number_of_bathrooms',
                                                                          'number_of_bedrooms',
                                                                          'number_stories',
                                                                          'total_area',
                                                                          'total_livable_area']),
                                                                        ('cat',
                                                                         OneHotEncoder(handle_unknown='ignore'),
                                                                         ['exterior_condition',
                                                                          'listname'])])),
                                       ('randomforestregressor',
                                        RandomForestRegressor(random_state=42))]),
             param_grid={'randomforestregressor__max_depth': [9, 13, 21, 33,
                                                              51],
                         'randomforestregressor__n_estimators': [5, 10, 15, 20,
                                                                 30, 50, 100,
                                                                 200]},
             verbose=1)
# The best estimator
grid.best_estimator_
Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('num', StandardScaler(),
                                                  ['fireplaces',
                                                   'garage_spaces',
                                                   'number_of_bathrooms',
                                                   'number_of_bedrooms',
                                                   'number_stories',
                                                   'total_area',
                                                   'total_livable_area']),
                                                 ('cat',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['exterior_condition',
                                                   'listname'])])),
                ('randomforestregressor',
                 RandomForestRegressor(max_depth=33, n_estimators=200,
                                       random_state=42))])
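The winning hyperparameter combination and its mean cross-validated R² can also be read directly off the fitted grid; a minimal sketch:

# Best hyperparameters and mean cross-validated R² from the search (sketch)
print(grid.best_params_)
print(grid.best_score_)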
# Refit a pipeline using the best hyperparameters found by the grid search
new_pipe = make_pipeline(
    transformer,
    RandomForestRegressor(max_depth=33, n_estimators=200, random_state=42),
)
# Fit the training set
new_pipe.fit(X_train, y_train);
# Test Score
new_pipe.score(X_test, y_test)
0.5675150349542478

2.5 Calculate the percent error of your model predictions for each sale in the test set

Fit your best model and use it to make predictions on the test set.

Note: This should be the percent error in terms of sale price. You’ll need to convert if your model predicted the log of sales price!

# this step is not necessary, just curious what the MAPE is. 
# X_test = test_set[feature_cols]
# y_test = test_set["log_price"].values
# y_test_no_log = np.exp(y_test)

def evaluate_mape(model, X_test, y_test):
    """
    Given a model (trained on log prices) and test features/targets in dollars,
    print out the average absolute error and the accuracy
    (100 minus the mean absolute percentage error).
    """
    # Make the predictions and convert from log(price) back to dollars
    predictions = np.exp(model.predict(X_test))

    # Absolute errors (in dollars)
    abs_errors = abs(predictions - y_test)
    avg_error = np.mean(abs_errors)

    # Mean absolute percentage error
    mape = 100 * np.mean(abs_errors / y_test)

    # Accuracy
    accuracy = 100 - mape

    print("Model Performance")
    print(f"Average Absolute Error: {avg_error:0.4f}")
    print(f"Accuracy = {accuracy:0.2f}%.")

    return accuracy
y_test_no_log = np.exp(y_test)
base_accuracy = evaluate_mape(pipe, X_test, y_test_no_log)
Model Performance
Average Absolute Error: 69104.9786
Accuracy = 51.06%.
# This is the actual percent error calculation part 

def percent_error(model, X_test, y_test):
    """Return the signed percent error of the model's price predictions."""
    # Convert predictions from log(price) back to dollars
    predictions = np.exp(model.predict(X_test))

    # Signed percent error (not the absolute percent error)
    p_errors = (predictions - y_test) / y_test

    return p_errors
p_errors = percent_error(pipe, X_test, y_test_no_log)
pipe.predict(X_test)
array([10.79687142, 12.01202048, 12.44467831, ..., 11.82508308,
       11.50655521, 12.76111153])
y_test
8682     11.245046
45       11.336188
6987     11.580584
16121    12.449019
8673     11.695247
           ...    
2252     12.254863
13927    13.422468
17434    11.695247
12885    11.289782
3753     13.060488
Name: log_price, Length: 5230, dtype: float64
p_errors
7209     0.095504
52      -0.068308
10900    3.547680
22227   -0.448610
18079    8.437723
           ...   
12894   -0.101753
15565   -0.107182
21955   -0.495361
3723    -0.141809
18913    0.245689
Name: log_price, Length: 5229, dtype: float64

2.6 Make a data frame with percent errors and census tract info for each sale in the test set

Create a data frame that has the property geometries, census tract data, and percent errors for all of the sales in the test set.

Notes

  • When using the “train_test_split()” function, the index of the test data frame includes the labels from the original sales data frame
  • You can use this index to slice out the test data from the original sales data frame, which should include the census tract info and geometries
  • Add a new column to this data frame holding the percent error data
  • Make sure to use the percent error and not the absolute percent error
test_df = joined.loc[test_set.index]
test_df['percent_error'] = p_errors
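A quick look confirms each test-set sale now carries its geometry, tract identifiers, and percent error; an optional sketch:

# Quick look (sketch): tract IDs and percent errors for the test-set sales
test_df[["GEOID10", "NAME10", "percent_error"]].head()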

2.8 Plot a map of the median percent error by census tract

  • You’ll want to group your data frame of test sales by the GEOID10 column and take the median of your percent error column
  • Merge the census tract geometries back in and use geopandas to plot.
median_p_errors = test_df.groupby('GEOID10')['percent_error'].median()
median_p_errors = pd.DataFrame(median_p_errors).reset_index()
geo_median = tracts.merge(median_p_errors, on='GEOID10')
geo_median.explore(
    column="percent_error",
    tooltip="percent_error",
    popup=True,
    cmap="coolwarm",
    tiles="CartoDB positron",
    legend=True
    )
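If the interactive map will not render (for example in an untrusted notebook or on GitHub), a static geopandas plot of the same column is a reasonable fallback; an optional sketch:

# Optional static alternative (sketch) to the interactive map above
ax = geo_median.plot(column="percent_error", cmap="coolwarm", legend=True, figsize=(8, 8))
ax.set_axis_off()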

2.9 Compare the percent errors in Qualifying Census Tracts and other tracts

Qualifying Census Tracts (QCTs) are a poverty-based designation that HUD uses to allocate housing tax credits.

  • I’ve included a list of the census tract names that qualify in Philadelphia
  • Add a new column to your dataframe of test set sales that is True/False depending on whether the tract is a QCT
  • Then, group by this new column and calculate the median percent error

You should find that the algorithm’s accuracy is significantly worse in these low-income, qualifying census tracts.

qct = ['5',
 '20',
 '22',
 '28.01',
 '30.01',
 '30.02',
 '31',
 '32',
 '33',
 '36',
 '37.01',
 '37.02',
 '39.01',
 '41.01',
 '41.02',
 '56',
 '60',
 '61',
 '62',
 '63',
 '64',
 '65',
 '66',
 '67',
 '69',
 '70',
 '71.01',
 '71.02',
 '72',
 '73',
 '74',
 '77',
 '78',
 '80',
 '81.01',
 '81.02',
 '82',
 '83.01',
 '83.02',
 '84',
 '85',
 '86.01',
 '86.02',
 '87.01',
 '87.02',
 '88.01',
 '88.02',
 '90',
 '91',
 '92',
 '93',
 '94',
 '95',
 '96',
 '98.01',
 '100',
 '101',
 '102',
 '103',
 '104',
 '105',
 '106',
 '107',
 '108',
 '109',
 '110',
 '111',
 '112',
 '113',
 '119',
 '121',
 '122.01',
 '122.03',
 '131',
 '132',
 '137',
 '138',
 '139',
 '140',
 '141',
 '144',
 '145',
 '146',
 '147',
 '148',
 '149',
 '151.01',
 '151.02',
 '152',
 '153',
 '156',
 '157',
 '161',
 '162',
 '163',
 '164',
 '165',
 '167.01',
 '167.02',
 '168',
 '169.01',
 '169.02',
 '170',
 '171',
 '172.01',
 '172.02',
 '173',
 '174',
 '175',
 '176.01',
 '176.02',
 '177.01',
 '177.02',
 '178',
 '179',
 '180.02',
 '188',
 '190',
 '191',
 '192',
 '195.01',
 '195.02',
 '197',
 '198',
 '199',
 '200',
 '201.01',
 '201.02',
 '202',
 '203',
 '204',
 '205',
 '206',
 '208',
 '239',
 '240',
 '241',
 '242',
 '243',
 '244',
 '245',
 '246',
 '247',
 '249',
 '252',
 '253',
 '265',
 '267',
 '268',
 '271',
 '274.01',
 '274.02',
 '275',
 '276',
 '277',
 '278',
 '279.01',
 '279.02',
 '280',
 '281',
 '282',
 '283',
 '284',
 '285',
 '286',
 '287',
 '288',
 '289.01',
 '289.02',
 '290',
 '291',
 '293',
 '294',
 '298',
 '299',
 '300',
 '301',
 '302',
 '305.01',
 '305.02',
 '309',
 '311.01',
 '312',
 '313',
 '314.01',
 '314.02',
 '316',
 '318',
 '319',
 '321',
 '325',
 '329',
 '330',
 '337.01',
 '345.01',
 '357.01',
 '376',
 '377',
 '380',
 '381',
 '382',
 '383',
 '389',
 '390']
# Flag each tract as a QCT and compare the median percent error across the two groups
# (note: this aggregates the tract-level medians, so each tract is weighted equally)
geo_median['qct'] = geo_median['NAME10'].isin(qct)

geo_median.groupby('qct')['percent_error'].median()
qct
False   -0.059781
True     0.039906
Name: percent_error, dtype: float64
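As a cross-check, the same comparison can be run directly on the test-set sales, weighting every sale equally instead of every tract; this is a sketch, and its numbers will differ slightly from the tract-level grouping above:

# Sale-level variant (sketch): flag each test-set sale by whether its tract is a QCT,
# then take the median percent error within each group
test_df["is_qct"] = test_df["NAME10"].isin(qct)
print(test_df.groupby("is_qct")["percent_error"].median())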