1 - Philly Donut Effect

In this assignment, we will practice our pandas skills and explore the “Donut Effect” within Philadelphia. The “Donut Effect” describes the following phenomenon: with more flexible working options and pandemic-driven density fears, people left urban dense cores and opted for more space in city suburbs, driving home and rental prices up in the suburbs relative to city centers.

We will be working with Zillow data for the Zillow Home Value Index (ZHVI) for Philadelphia ZIP codes. The goal will be to calculate home price appreciation in Philadelphia, comparing those ZIP codes in Center City (the central business district) to those not in Center City.

1. Load the data

I’ve already downloaded the relevant data file and put in the data/ folder. Let’s load it using pandas.

Note: Be sure to use a relative file path to make it easier to load your data when grading. See this guide for more info.

Code

import pandas as pd

Code

import numpy as np

Code

zip_df = pd.read_csv("data/Zip_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv")

2. Trim the data to just Philadelphia

Select the subset of the dataframe for Philadelphia, PA.

Code

# or use isin to trim data

state_trim = zip_df["StateName"].isin(["PA"])
state_trim_df= zip_df.loc[state_trim]

philly_df = state_trim_df.loc[state_trim_df["City"] == "Philadelphia"]
philly_df.head()

	RegionID	SizeRank	RegionName	RegionType	StateName	State	City	Metro	CountyName	2000-01-31	...	2021-10-31	2021-11-30	2021-12-31	2022-01-31	2022-02-28	2022-03-31	2022-04-30	2022-05-31	2022-06-30	2022-07-31
125	65810	126	19143	Zip	PA	PA	Philadelphia	Philadelphia-Camden-Wilmington	Philadelphia County	60701.0	...	173114.0	172087.0	171445.0	171542.0	171680.0	171878.0	171607.0	171333.0	171771.0	172611.0
247	65779	249	19111	Zip	PA	PA	Philadelphia	Philadelphia-Camden-Wilmington	Philadelphia County	85062.0	...	257911.0	260104.0	262257.0	263715.0	264809.0	265684.0	267222.0	269460.0	272201.0	274446.0
338	65791	340	19124	Zip	PA	PA	Philadelphia	Philadelphia-Camden-Wilmington	Philadelphia County	47155.0	...	156225.0	157780.0	159029.0	159274.0	159886.0	160780.0	161929.0	163625.0	165020.0	166009.0
423	65787	426	19120	Zip	PA	PA	Philadelphia	Philadelphia-Camden-Wilmington	Philadelphia County	59285.0	...	161167.0	161807.0	162634.0	162972.0	163597.0	164008.0	164887.0	165860.0	167321.0	168524.0
509	65772	512	19104	Zip	PA	PA	Philadelphia	Philadelphia-Camden-Wilmington	Philadelphia County	74255.0	...	220270.0	221454.0	222006.0	220760.0	217933.0	216447.0	216424.0	218663.0	220453.0	223443.0

5 rows × 280 columns

3. Melt the data into tidy format

Let’s transform the data from wide to tidy using the pd.melt() function. Create a new column in your data called “ZHVI” that holds the ZHVI values.

Code

def looks_like_a_date(col):
    """A function that tests if a string starts with '20'"""
    
    return col.startswith("20")

Code

philly_tidy = philly_df.melt(
    id_vars = ["RegionName","RegionID"],
    value_vars = list(filter(looks_like_a_date, philly_df.columns)),
    var_name = "Date",
    value_name = "ZHVI",
)

Code

philly_tidy

	RegionName	RegionID	Date	ZHVI
0	19143	65810	2000-01-31	60701.0
1	19111	65779	2000-01-31	85062.0
2	19124	65791	2000-01-31	47155.0
3	19120	65787	2000-01-31	59285.0
4	19104	65772	2000-01-31	74255.0
...	...	...	...	...
12461	19153	65820	2022-07-31	247560.0
12462	19118	65785	2022-07-31	746009.0
12463	19102	65770	2022-07-31	347614.0
12464	19127	65794	2022-07-31	318732.0
12465	19137	65804	2022-07-31	222107.0

12466 rows × 4 columns

4. Split the data for ZIP codes in/outside Center City

To compare home appreciation in Center City vs. outside Center City, we’ll need to split the data into two dataframes, one that holds the Center City ZIP codes and one that holds the data for the rest of the ZIP codes in Philadelphia.

To help with this process, I’ve included a list of ZIP codes that make up the “greater Center City” region of Philadelphia. Use this list to split the melted data into two dataframes.

Code

greater_center_city_zip_codes = [
    19123,
    19102,
    19103,
    19106,
    19107,
    19109,
    19130,
    19146,
    19147,
]

Code

# CENTER CITY
center_city_zip = philly_tidy["RegionName"].isin(greater_center_city_zip_codes)
center_city_zip
center_city = philly_tidy.loc[center_city_zip].copy()
center_city

	RegionName	RegionID	Date	ZHVI
9	19146	65813	2000-01-31	97460.0
12	19147	65814	2000-01-31	119919.0
14	19103	65771	2000-01-31	183436.0
18	19130	65797	2000-01-31	128477.0
33	19107	65775	2000-01-31	128049.0
...	...	...	...	...
12438	19130	65797	2022-07-31	431501.0
12453	19107	65775	2022-07-31	330958.0
12457	19123	65790	2022-07-31	443152.0
12458	19106	65774	2022-07-31	407423.0
12463	19102	65770	2022-07-31	347614.0

2168 rows × 4 columns

Code

# Out_of_Center_City
out_of_center_city_zip = ~philly_tidy["RegionName"].isin(greater_center_city_zip_codes)
out_of_center_city_zip
out_of_center_city = philly_tidy.loc[out_of_center_city_zip].copy()
out_of_center_city

	RegionName	RegionID	Date	ZHVI
0	19143	65810	2000-01-31	60701.0
1	19111	65779	2000-01-31	85062.0
2	19124	65791	2000-01-31	47155.0
3	19120	65787	2000-01-31	59285.0
4	19104	65772	2000-01-31	74255.0
...	...	...	...	...
12460	19129	65796	2022-07-31	302177.0
12461	19153	65820	2022-07-31	247560.0
12462	19118	65785	2022-07-31	746009.0
12464	19127	65794	2022-07-31	318732.0
12465	19137	65804	2022-07-31	222107.0

10298 rows × 4 columns

5. Compare home value appreciation in Philadelpia

In this step, we’ll calculate the average percent increase in ZHVI from March 2020 to March 2022 for ZIP codes in/out of Center City. We’ll do this by:

Writing a function (see the template below) that will calculate the percent increase in ZHVI from March 31, 2020 to March 31, 2022
Group your data and apply this function to calculate the ZHVI percent change for each ZIP code in Philadelphia. Do this for both of your dataframes from the previous step.
Calculate the average value across ZIP codes for both sets of ZIP codes and then compare

You should see much larger growth for ZIP codes outside of Center City…the Donut Effect!

Code

def calculate_percent_increase(group_df):
    
    march_20sel = group_df["Date"] == "2020-03-31"
    march_22sel = group_df["Date"] == "2022-03-31"
    
    march_20 = group_df.loc[march_20sel].squeeze()
    march_22 = group_df.loc[march_22sel].squeeze()
    
    columns = ["ZHVI"]
    
    return (march_22[columns] / march_20[columns] - 1) * 100

Code

# Center City Grouped
grouped_center_city = center_city.groupby("RegionName")
result_center_city = grouped_center_city.apply(calculate_percent_increase)
result_center_city

	ZHVI
RegionName
19102	-1.716711
19103	-1.696176
19106	2.520802
19107	2.883181
19123	5.212747
19130	6.673031
19146	6.480788
19147	6.139806

Code

# Outside Center City Grouped
grouped_outside = out_of_center_city.groupby("RegionName")
result_outside = grouped_outside.apply(calculate_percent_increase)
result_outside

	ZHVI
RegionName
19104	14.539797
19111	28.690446
19114	21.074312
19115	22.455454
19116	23.079842
19118	17.585001
19119	17.478667
19120	26.927423
19121	26.228643
19122	10.275804
19124	28.743474
19125	11.007135
19126	20.819254
19127	20.023926
19128	21.887555
19129	15.598565
19131	23.363129
19132	72.218386
19133	36.143992
19134	23.936841
19135	28.115259
19136	26.487833
19137	23.248505
19138	24.662626
19139	37.008969
19140	57.150847
19141	26.441684
19142	44.564396
19143	23.951077
19144	21.094020
19145	7.634693
19148	6.963237
19149	24.916458
19150	18.735248
19151	19.651429
19152	21.993528
19153	38.240461
19154	17.930932

Code

# Center City Growth
result_cc_mean = result_center_city.mean().squeeze()

Code

# Outside of Center City Growth
result_ot_mean = result_outside.mean().squeeze()

Code

HV_growth = pd.Series([result_cc_mean,result_ot_mean])
Location = pd.Series(["Center City", "Outside Center City"])
final_result_df = pd.DataFrame({"Location": Location, "Home Value growth (%)": HV_growth})
final_result_df

	Location	Home Value growth (%)
0	Center City	3.312183
1	Outside Center City	25.022864

Donut Effect it is!