Code
import pandas as pdIn this assignment, we will practice our pandas skills and explore the “Donut Effect” within Philadelphia. The “Donut Effect” describes the following phenomenon: with more flexible working options and pandemic-driven density fears, people left urban dense cores and opted for more space in city suburbs, driving home and rental prices up in the suburbs relative to city centers.
We will be working with Zillow data for the Zillow Home Value Index (ZHVI) for Philadelphia ZIP codes. The goal will be to calculate home price appreciation in Philadelphia, comparing those ZIP codes in Center City (the central business district) to those not in Center City.
I’ve already downloaded the relevant data file and put in the data/ folder. Let’s load it using pandas.
Note: Be sure to use a relative file path to make it easier to load your data when grading. See this guide for more info.
import pandas as pdimport numpy as npzip_df = pd.read_csv("data/Zip_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv")Select the subset of the dataframe for Philadelphia, PA.
# or use isin to trim data
state_trim = zip_df["StateName"].isin(["PA"])
state_trim_df= zip_df.loc[state_trim]
philly_df = state_trim_df.loc[state_trim_df["City"] == "Philadelphia"]
philly_df.head()| RegionID | SizeRank | RegionName | RegionType | StateName | State | City | Metro | CountyName | 2000-01-31 | ... | 2021-10-31 | 2021-11-30 | 2021-12-31 | 2022-01-31 | 2022-02-28 | 2022-03-31 | 2022-04-30 | 2022-05-31 | 2022-06-30 | 2022-07-31 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 125 | 65810 | 126 | 19143 | Zip | PA | PA | Philadelphia | Philadelphia-Camden-Wilmington | Philadelphia County | 60701.0 | ... | 173114.0 | 172087.0 | 171445.0 | 171542.0 | 171680.0 | 171878.0 | 171607.0 | 171333.0 | 171771.0 | 172611.0 |
| 247 | 65779 | 249 | 19111 | Zip | PA | PA | Philadelphia | Philadelphia-Camden-Wilmington | Philadelphia County | 85062.0 | ... | 257911.0 | 260104.0 | 262257.0 | 263715.0 | 264809.0 | 265684.0 | 267222.0 | 269460.0 | 272201.0 | 274446.0 |
| 338 | 65791 | 340 | 19124 | Zip | PA | PA | Philadelphia | Philadelphia-Camden-Wilmington | Philadelphia County | 47155.0 | ... | 156225.0 | 157780.0 | 159029.0 | 159274.0 | 159886.0 | 160780.0 | 161929.0 | 163625.0 | 165020.0 | 166009.0 |
| 423 | 65787 | 426 | 19120 | Zip | PA | PA | Philadelphia | Philadelphia-Camden-Wilmington | Philadelphia County | 59285.0 | ... | 161167.0 | 161807.0 | 162634.0 | 162972.0 | 163597.0 | 164008.0 | 164887.0 | 165860.0 | 167321.0 | 168524.0 |
| 509 | 65772 | 512 | 19104 | Zip | PA | PA | Philadelphia | Philadelphia-Camden-Wilmington | Philadelphia County | 74255.0 | ... | 220270.0 | 221454.0 | 222006.0 | 220760.0 | 217933.0 | 216447.0 | 216424.0 | 218663.0 | 220453.0 | 223443.0 |
5 rows × 280 columns
Let’s transform the data from wide to tidy using the pd.melt() function. Create a new column in your data called “ZHVI” that holds the ZHVI values.
def looks_like_a_date(col):
"""A function that tests if a string starts with '20'"""
return col.startswith("20")philly_tidy = philly_df.melt(
id_vars = ["RegionName","RegionID"],
value_vars = list(filter(looks_like_a_date, philly_df.columns)),
var_name = "Date",
value_name = "ZHVI",
)
philly_tidy| RegionName | RegionID | Date | ZHVI | |
|---|---|---|---|---|
| 0 | 19143 | 65810 | 2000-01-31 | 60701.0 |
| 1 | 19111 | 65779 | 2000-01-31 | 85062.0 |
| 2 | 19124 | 65791 | 2000-01-31 | 47155.0 |
| 3 | 19120 | 65787 | 2000-01-31 | 59285.0 |
| 4 | 19104 | 65772 | 2000-01-31 | 74255.0 |
| ... | ... | ... | ... | ... |
| 12461 | 19153 | 65820 | 2022-07-31 | 247560.0 |
| 12462 | 19118 | 65785 | 2022-07-31 | 746009.0 |
| 12463 | 19102 | 65770 | 2022-07-31 | 347614.0 |
| 12464 | 19127 | 65794 | 2022-07-31 | 318732.0 |
| 12465 | 19137 | 65804 | 2022-07-31 | 222107.0 |
12466 rows × 4 columns
To compare home appreciation in Center City vs. outside Center City, we’ll need to split the data into two dataframes, one that holds the Center City ZIP codes and one that holds the data for the rest of the ZIP codes in Philadelphia.
To help with this process, I’ve included a list of ZIP codes that make up the “greater Center City” region of Philadelphia. Use this list to split the melted data into two dataframes.
greater_center_city_zip_codes = [
19123,
19102,
19103,
19106,
19107,
19109,
19130,
19146,
19147,
]# CENTER CITY
center_city_zip = philly_tidy["RegionName"].isin(greater_center_city_zip_codes)
center_city_zip
center_city = philly_tidy.loc[center_city_zip].copy()
center_city| RegionName | RegionID | Date | ZHVI | |
|---|---|---|---|---|
| 9 | 19146 | 65813 | 2000-01-31 | 97460.0 |
| 12 | 19147 | 65814 | 2000-01-31 | 119919.0 |
| 14 | 19103 | 65771 | 2000-01-31 | 183436.0 |
| 18 | 19130 | 65797 | 2000-01-31 | 128477.0 |
| 33 | 19107 | 65775 | 2000-01-31 | 128049.0 |
| ... | ... | ... | ... | ... |
| 12438 | 19130 | 65797 | 2022-07-31 | 431501.0 |
| 12453 | 19107 | 65775 | 2022-07-31 | 330958.0 |
| 12457 | 19123 | 65790 | 2022-07-31 | 443152.0 |
| 12458 | 19106 | 65774 | 2022-07-31 | 407423.0 |
| 12463 | 19102 | 65770 | 2022-07-31 | 347614.0 |
2168 rows × 4 columns
# Out_of_Center_City
out_of_center_city_zip = ~philly_tidy["RegionName"].isin(greater_center_city_zip_codes)
out_of_center_city_zip
out_of_center_city = philly_tidy.loc[out_of_center_city_zip].copy()
out_of_center_city| RegionName | RegionID | Date | ZHVI | |
|---|---|---|---|---|
| 0 | 19143 | 65810 | 2000-01-31 | 60701.0 |
| 1 | 19111 | 65779 | 2000-01-31 | 85062.0 |
| 2 | 19124 | 65791 | 2000-01-31 | 47155.0 |
| 3 | 19120 | 65787 | 2000-01-31 | 59285.0 |
| 4 | 19104 | 65772 | 2000-01-31 | 74255.0 |
| ... | ... | ... | ... | ... |
| 12460 | 19129 | 65796 | 2022-07-31 | 302177.0 |
| 12461 | 19153 | 65820 | 2022-07-31 | 247560.0 |
| 12462 | 19118 | 65785 | 2022-07-31 | 746009.0 |
| 12464 | 19127 | 65794 | 2022-07-31 | 318732.0 |
| 12465 | 19137 | 65804 | 2022-07-31 | 222107.0 |
10298 rows × 4 columns
In this step, we’ll calculate the average percent increase in ZHVI from March 2020 to March 2022 for ZIP codes in/out of Center City. We’ll do this by:
You should see much larger growth for ZIP codes outside of Center City…the Donut Effect!
def calculate_percent_increase(group_df):
march_20sel = group_df["Date"] == "2020-03-31"
march_22sel = group_df["Date"] == "2022-03-31"
march_20 = group_df.loc[march_20sel].squeeze()
march_22 = group_df.loc[march_22sel].squeeze()
columns = ["ZHVI"]
return (march_22[columns] / march_20[columns] - 1) * 100# Center City Grouped
grouped_center_city = center_city.groupby("RegionName")
result_center_city = grouped_center_city.apply(calculate_percent_increase)
result_center_city| ZHVI | |
|---|---|
| RegionName | |
| 19102 | -1.716711 |
| 19103 | -1.696176 |
| 19106 | 2.520802 |
| 19107 | 2.883181 |
| 19123 | 5.212747 |
| 19130 | 6.673031 |
| 19146 | 6.480788 |
| 19147 | 6.139806 |
# Outside Center City Grouped
grouped_outside = out_of_center_city.groupby("RegionName")
result_outside = grouped_outside.apply(calculate_percent_increase)
result_outside| ZHVI | |
|---|---|
| RegionName | |
| 19104 | 14.539797 |
| 19111 | 28.690446 |
| 19114 | 21.074312 |
| 19115 | 22.455454 |
| 19116 | 23.079842 |
| 19118 | 17.585001 |
| 19119 | 17.478667 |
| 19120 | 26.927423 |
| 19121 | 26.228643 |
| 19122 | 10.275804 |
| 19124 | 28.743474 |
| 19125 | 11.007135 |
| 19126 | 20.819254 |
| 19127 | 20.023926 |
| 19128 | 21.887555 |
| 19129 | 15.598565 |
| 19131 | 23.363129 |
| 19132 | 72.218386 |
| 19133 | 36.143992 |
| 19134 | 23.936841 |
| 19135 | 28.115259 |
| 19136 | 26.487833 |
| 19137 | 23.248505 |
| 19138 | 24.662626 |
| 19139 | 37.008969 |
| 19140 | 57.150847 |
| 19141 | 26.441684 |
| 19142 | 44.564396 |
| 19143 | 23.951077 |
| 19144 | 21.094020 |
| 19145 | 7.634693 |
| 19148 | 6.963237 |
| 19149 | 24.916458 |
| 19150 | 18.735248 |
| 19151 | 19.651429 |
| 19152 | 21.993528 |
| 19153 | 38.240461 |
| 19154 | 17.930932 |
# Center City Growth
result_cc_mean = result_center_city.mean().squeeze()# Outside of Center City Growth
result_ot_mean = result_outside.mean().squeeze()HV_growth = pd.Series([result_cc_mean,result_ot_mean])
Location = pd.Series(["Center City", "Outside Center City"])
final_result_df = pd.DataFrame({"Location": Location, "Home Value growth (%)": HV_growth})
final_result_df| Location | Home Value growth (%) | |
|---|---|---|
| 0 | Center City | 3.312183 |
| 1 | Outside Center City | 25.022864 |
Donut Effect it is!