Part III - Museum Collections - MET, MoMA, British Museum

The last part of this project zooms in on the collections of famous museums worldwide, including the MET, MoMA, the British Museum, the V&A, and the Philadelphia Museum of Art (used as an example for web scraping, though it is not located in New York or London). The part is broken into four sections: an overview of large museums worldwide, followed by three different ways of acquiring information about museum collections online.

The first group is the MET and MoMA, which have the most well-constructed digital galleries and open-access data hosted on GitHub. With these existing datasets, it is easy to dive straight into analysis.

The second group is the British Museum, which does not have a GitHub page but has a mature database for the abundant objects in its collections. Its website can export search results as a CSV file, as long as the results are under 20,000 items. Compared with web scraping, this query and open-access system lets visitors download a far more detailed table of all the search results, although the export is capped at 20,000 items (so to download more than 20,000, you have to split the search and download the parts separately).

The last group is museums that showcase digital collections on their websites but have not yet given the general public open data access. With their digital galleries, users can scrape the web pages to acquire basic information about objects. There are two downsides: 1) scraping many items is very slow, and 2) the information acquired is too basic (usually only the name of the artwork, the name of the artist, the year, and the geography are displayed on the search result page). As section 2 of this document shows, the three approaches differ clearly in the level of detail available about each item.
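
To put the first downside in rough numbers, here is a back-of-the-envelope sketch. It assumes 48 objects per results page and about 15 seconds of waiting per page (the 5-second load wait plus the 10-second pause used by the scraper in section 2.3). The function name and the 20,000-object figure are illustrative only, and real runs will be slower because page loading and parsing are ignored.

```python
import math

def estimated_scrape_minutes(n_objects, per_page=48, seconds_per_page=15):
    """Rough lower bound on scraping time; ignores page-load and parsing time."""
    pages = math.ceil(n_objects / per_page)
    return pages * seconds_per_page / 60

print(estimated_scrape_minutes(480))     # 10 pages  -> 2.5 minutes
print(estimated_scrape_minutes(20_000))  # 417 pages -> 104.25 minutes
```

Even under these optimistic assumptions, collecting as many objects as one British Museum export allows would take well over an hour and a half, and would still return only the basic fields.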

Ultimately, this part of the project advocates for giving large, impactful museums more resources to establish online open-access databases. The public will likely enjoy them and may be able to generate important insights for the museums’ future curation.

Code
import pandas as pd
import geopandas as gpd
import plotly.express as px
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objects as go
import numpy as np
pd.set_option('display.max_columns', None)

0. World Museums at a Glance

To offer some context, I have picked some of the largest and best-known museums in the world. In terms of collection size, the British Museum is the clear champion among this group, followed by the Palace Museum in Beijing. Size, reputation, management, and the larger political agenda (e.g., whether it is a national museum or a museum representing important aspects of local culture) are important factors in digital accessibility and in the choice of whether to open up digitized items. The funding received from governments and private donors also determines to what extent museums are able to digitize some or all of their items. Many national museums in non-English-speaking countries also prioritize their local languages for description and research (as they should), so access in English differs. For ease of research, I have picked museums in the U.S. and U.K. so that I don’t have to translate.

Code
world_museums = pd.read_csv("./Final_Data/Museum Collection Numbers.csv").dropna()
world_museums_location = gpd.read_file("./Final_Data/world_museum_location.geojson")
world_museums['Quantity'] = world_museums['Quantity'].astype(int)
world_museums['Museum'] = world_museums['Museum'].astype(object)
Code
location = world_museums_location.merge(world_museums, on="Museum")
location.explore(
     tiles="cartodbpositron",
)
Code
fig = px.bar(world_museums, x='Museum', y='Quantity', 
             title='Size of Collections at World Famous Museums (by Works)', 
             height=700, width=1000,
             template="plotly_white")
fig.show()

1. MET & MoMA’s Open Access Digital Collection

I was surprised to find that the MET and MoMA in NYC each have a GitHub account hosting an online database, containing detailed information on more than 400,000 and 100,000 items respectively. They might be the only two museums in the world so far to offer such transparency for their digitized collections.

https://github.com/MuseumofModernArt

https://github.com/metmuseum/openaccess

This section showcases the kinds of interesting exploration one can do with such open-access data, and in doing so advocates for more museums to join these two in this wave of digitization and open access. In the next section (Section 2), you will see how much the depth and complexity of analysis can differ depending on how the data is accessed.

Code
# Load Data
# low_memory=False lets pandas infer each column's dtype from the whole file,
# which avoids the mixed-type DtypeWarning on the MET table
met = pd.read_csv("./Final_Data/MetObjects.txt", low_memory=False)
moma_artist = pd.read_csv("./Final_Data/Moma_Artists.txt")
moma_artwork = pd.read_csv("./Final_Data/Moma_artworks.txt")

1.1 Department Breakdown

The first quick glance goes to the departmental breakdown. It is apparent that both museums hold their largest collections in Drawings and Paintings, followed by Photography. But the museums take different approaches to organizing their holdings: the MET manages its collection and departments by genre (a comprehensive way of distinguishing the temporal, geographical, and thematic features of artworks), given the wide variety of artworks it holds; MoMA, on the other hand, organizes by medium, as many other modern art museums do.

Code
met_dept = pd.DataFrame(met.groupby(['Department']).size()).reset_index()
met_dept = met_dept.rename(columns={met_dept.columns[1]: 'Counts'})
moma_dept = pd.DataFrame(moma_artwork.groupby(['Department']).size()).reset_index()
moma_dept = moma_dept.rename(columns={moma_dept.columns[1]: 'Counts'})
Code
fig = go.Figure()


fig.add_trace(
    go.Bar(x=met_dept['Department'], 
           y=met_dept['Counts'], 
           name='Met Dept',
          marker=dict(color="#E81D2E")))


fig.add_trace(
    go.Bar(x=moma_dept['Department'], 
           y=moma_dept['Counts'], 
           name='MoMA Dept',
          marker=dict(color="Black")))

#add dropdown

fig.update_layout(
    updatemenus=[
        dict(
            active=0,
            buttons=list([
                dict(label="Met",
                    method="update",
                    args=[{"visible": [True, False]},
                        {"title": "Metropolitan Museum of Art Department Breakdown",
                         "annotations": []}]),
                dict(label="MoMA",
                    method="update",
                    args=[{"visible": [False, True]},
                        {"title": "Museum of Modern Art Department Breakdown",
                         "annotations": []}])
            ]))])

fig.update_layout(title_text="Number of Objects Held by Departments at Museums", 
                  height=700,
                 template='plotly_white')
fig.show()

1.2 Top Artists

After departments, I was curious about the top 30 artists, that is, those with the most artworks owned by each museum. So I counted their artworks, ranked them, and grouped their work by the departmental categories explored in section 1.1. The results are shown in the bar charts below. As expected, most top artists predominantly produce photography or paintings, which make up the largest collections at both museums.
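
The count, rank, and group steps described above can be sketched compactly on a toy table. This is only an illustration, not the code used for the charts: the `artworks` frame and its values are made up, and `pd.crosstab` is swapped in for the groupby-and-pivot steps used in the cells below.

```python
import pandas as pd

# Toy stand-in for an artwork table; the real data has many more columns.
artworks = pd.DataFrame({
    "Artist": ["A", "A", "A", "B", "B", "C"],
    "Department": ["Photography", "Photography", "Drawings & Prints",
                   "Photography", "Painting & Sculpture", "Film"],
})

top_n = 2
# value_counts sorts in descending order, so the first top_n labels
# are the artists with the most works.
top_artists = artworks["Artist"].value_counts().head(top_n).index

# Cross-tabulate artist by department for the selected artists only,
# then order the rows by rank.
subset = artworks[artworks["Artist"].isin(top_artists)]
breakdown = pd.crosstab(subset["Artist"], subset["Department"]).reindex(top_artists)
print(breakdown)
```

Each row of `breakdown` holds one artist's works per department, which is the shape a stacked horizontal bar chart needs.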

Code
met_top_artist = pd.DataFrame(met.groupby(['Artist Display Name']).size()).reset_index()
met_top_artist = met_top_artist.rename(columns={met_top_artist.columns[1]: 'Counts'})
met_top_artist_with_Co = met_top_artist[(met_top_artist['Artist Display Name'] != 'Unknown') & 
                                ~met_top_artist['Artist Display Name'].str.contains('Anonymous', case=False) &
                                (met_top_artist['Artist Display Name'] != 'Unidentified artist')]

met_top_artist = met_top_artist[(met_top_artist['Artist Display Name'] != 'Unknown') & 
                                ~met_top_artist['Artist Display Name'].str.contains('Anonymous', case=False) &
                                ~met_top_artist['Artist Display Name'].str.contains('company', case=False) &
                                # match the literal abbreviation "Co." (an unescaped '.' matches any character)
                                ~met_top_artist['Artist Display Name'].str.contains(r'\bCo\.', case=False) &
                                (met_top_artist['Artist Display Name'] != 'Unidentified artist')]
 
moma_top_artist = pd.DataFrame(moma_artwork.groupby(['Artist']).size()).reset_index()
moma_top_artist = moma_top_artist.rename(columns={moma_top_artist.columns[1]: 'Counts'})
moma_top_artist = moma_top_artist[(moma_top_artist['Artist'] != 'Unknown') & 
                                  (moma_top_artist['Artist'] != 'Anonymous') &
                                  ~moma_top_artist['Artist'].str.contains('Unidentified', case=False)]
Code
met_top_30 = met_top_artist.loc[met_top_artist['Counts'].nlargest(30).index]
moma_top_30 = moma_top_artist.loc[moma_top_artist['Counts'].nlargest(30).index]
met_30_artworks = met[met['Artist Display Name'].isin(met_top_30['Artist Display Name'])] 
moma_30_artworks = moma_artwork[moma_artwork['Artist'].isin(moma_top_30['Artist'])] 
moma_30_work_breakdown = moma_30_artworks.groupby(['Artist', 'Department']).size().reset_index()
moma_30_work_breakdown = moma_30_work_breakdown.rename(columns={moma_30_work_breakdown.columns[2]: 'Counts'})

met_30_work_breakdown = met_30_artworks.groupby(['Artist Display Name', 'Department']).size().reset_index()
met_30_work_breakdown = met_30_work_breakdown.rename(columns={met_30_work_breakdown.columns[2]: 'Counts'})

moma_30_work_pivot = moma_30_work_breakdown.pivot(index='Artist', columns='Department', values="Counts").fillna(0)

met_30_work_pivot = met_30_work_breakdown.pivot(index='Artist Display Name', columns='Department', values="Counts").fillna(0)
Code
order = moma_top_30['Artist'].values
moma_30_work_pivot.index = pd.CategoricalIndex(moma_30_work_pivot.index, categories=order, ordered=True)
moma_30_work_pivot = moma_30_work_pivot.sort_index()

order = met_top_30['Artist Display Name'].values
met_30_work_pivot.index = pd.CategoricalIndex(met_30_work_pivot.index, categories=order, ordered=True)
met_30_work_pivot = met_30_work_pivot.sort_index()
Code
fig = go.Figure()

colors = ['#3A6C8C', '#0F3D3F','#B3D8EB','#728F4C','#CEE0C6','#242545','#EF819C','#F4B8D4']

headings = ['Architecture & Design', 'Architecture & Design - Image Archive', 'Drawings & Prints', 'Film', 'Fluxus Collection', 'Media and Performance', 'Painting & Sculpture', 'Photography']



# One row of x_data per department, one y entry per artist
x_data = np.transpose(moma_30_work_pivot.values)
y_data = moma_30_work_pivot.index.values

# Add one horizontal bar trace per department
for heading, xd, color in zip(headings, x_data, colors):
    fig.add_trace(go.Bar(
            x=xd, 
            y=y_data,
            name=heading,
            orientation='h',
            marker=dict(
                color=color,
                line=dict(color='rgb(248, 248, 249)', width=1)
            )
        ))

fig.update_layout(
    height=800,
    width=1500,
    yaxis=dict(autorange="reversed"),
    barmode='stack',
    margin=dict(l=120, r=10, t=140, b=80),
    showlegend=True,
    template='plotly_white',
    autosize=True,
    title='Top 30 Artists whom MoMA Holds Most Works of'
)



fig.show()
Code
fig = go.Figure()

colors = ['#3A6C8C','#dc596d','#B3D8EB','#949EC3', '#8B7099', '#242545','#EF819C','#F4B8D4','#728F4C','#CEE0C6', '#ffbb93', '#fa958f' ]

headings = met_30_work_pivot.columns.to_numpy()



# One row of x_data per department, one y entry per artist
x_data = np.transpose(met_30_work_pivot.values)
y_data = met_30_work_pivot.index.values

# Add one horizontal bar trace per department
for heading, xd, color in zip(headings, x_data, colors):
    fig.add_trace(go.Bar(
            x=xd, 
            y=y_data,
            name=heading,
            orientation='h',
            marker=dict(
                color=color,
                line=dict(color='rgb(248, 248, 249)', width=1)
            )
        ))

fig.update_layout(
    height=800,
    width=1500,
    yaxis=dict(autorange="reversed"),
    barmode='stack',
    margin=dict(l=120, r=10, t=140, b=80),
    showlegend=True,
    template='plotly_white',
    autosize=True,
    title='Top 30 Artists whom the Met Holds Most Works of'
)



fig.show()

1.3 Top Nationalities of Artists

Along the same lines, I analyzed the top nationalities of artists whose work is held at the MET and MoMA. The results show that both museums hold the most artwork from American artists, though at MoMA the American collection is significantly larger than that of any other nationality. Additionally, British, French, Japanese, Italian, German, and Dutch artists are among the leading groups. One caveat is that the two analyses use different kinds of data: the MET figure counts the number of occurrences of each nationality in the list of artworks, while the MoMA figure uses MoMA’s separate list of artists. So the MET analysis counts an artist multiple times if the museum holds several of their works, while the MoMA count covers unique artists only. Despite the potential inaccuracy in absolute values, this is still useful for identifying the leading nationalities of artists within each museum.
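
The difference between the two counting methods can be seen on a small made-up example. This is a sketch under simplifying assumptions: the `works` table is invented, and each row is assumed to list a single artist, which is not always true of the real MET data.

```python
import pandas as pd

# Invented artwork list: one row per artwork, one artist per row (assumption).
works = pd.DataFrame({
    "Artist Display Name": ["Ann", "Ann", "Ann", "Bo", "Cy"],
    "Artist Nationality": ["American", "American", "American", "French", "French"],
})

# Per-artwork count: an artist with many works is counted many times.
per_artwork = works["Artist Nationality"].value_counts()

# Per-artist count: keep one row per artist before counting.
per_artist = (
    works.drop_duplicates(subset="Artist Display Name")["Artist Nationality"]
    .value_counts()
)

print(per_artwork.to_dict())  # American counted 3 times, French 2
print(per_artist.to_dict())   # American counted once, French 2
```

In this toy case the ranking flips between the two methods, which is why the MET and MoMA charts are best read separately rather than compared in absolute terms.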

Code
filtered = met.dropna(subset=['Artist Nationality'])
# Raw string so that \s is passed to the regex engine unchanged
word_counts = filtered['Artist Nationality'].str.split(r'[\s,|]', expand=True).stack().value_counts()
word_counts = pd.DataFrame(word_counts)
Code
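# Keep the 12 most frequent tokens, then drop two hand-picked rows (labels 1 and 6),
# which leaves the 10 entries plotted below
nationality = word_counts.iloc[0:12].reset_index()
nationality = nationality.drop(index=[1,6])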
nationality = nationality.rename(columns={nationality.columns[0]: 'Nationalities',nationality.columns[1]: 'Counts' })
Code
fig = px.bar(x=nationality['Counts'].values, y=nationality['Nationalities'].values,
            orientation='h')

fig.update_layout(
    height=800,
    width=1500,
    yaxis=dict(autorange="reversed"),
    barmode='stack',
    margin=dict(l=120, r=10, t=140, b=80),
    showlegend=True,
    template='plotly_white',
    autosize=True,
    title='Top 10 Nationalities of Artists at the MET'
)



fig.show()
Code
moma_nationality = pd.DataFrame(moma_artist.groupby(['Nationality']).size().reset_index())
moma_nationality = moma_nationality.rename(columns={moma_nationality.columns[1]: 'Counts' })
moma_nationality = moma_nationality.loc[moma_nationality['Counts'].nlargest(10).index]
Code
fig = px.bar(x=moma_nationality['Counts'].values, y=moma_nationality['Nationality'].values,
            orientation='h')

fig.update_layout(
    height=800,
    width=1500,
    yaxis=dict(autorange="reversed"),
    barmode='stack',
    margin=dict(l=120, r=10, t=140, b=80),
    showlegend=True,
    template='plotly_white',
    autosize=True,
    title='Top 10 Nationalities of Artists at MoMA'
)



fig.show()

2. Digital Access to Collection Comparison - Chinese Art as an Example

This section compares three approaches to acquiring data from museums’ online galleries and digital collections, using Chinese artworks as an example (because the Chinese collection is usually smaller than other genres and has more uncertain information, such as unknown artists or dates). It aims to show the different levels of analysis each kind of data makes possible.

2.1 MET - Open Access CSV / JSON file

Using the MET’s open data, I analyzed the top 30 tags (recurring themes) of Chinese artworks. This is the kind of data analytics made possible by large-scale digitization of an in-house collection. It is quite interesting to see which themes emerge once the objects are in a database. The data can also be used for other types of analysis, as already shown in section 1.
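
As a small illustration of tag counting, the sketch below splits only on the `|` separator visible in the Tags column, so that multi-word tags such as "Musical Instruments" stay whole. This is an alternative to the whitespace-and-punctuation split used in the cells below, not the method behind the chart; `count_tags` is a hypothetical helper and the sample strings are taken from the table preview in this section.

```python
from collections import Counter

def count_tags(tag_strings, sep="|"):
    """Count individual tags in an iterable of separator-joined tag strings."""
    counts = Counter()
    for raw in tag_strings:
        if not isinstance(raw, str):
            continue  # skip missing values such as None or NaN
        counts.update(t.strip() for t in raw.split(sep) if t.strip())
    return counts

sample = [
    "Musical Instruments|Men|Elephants|Flowers",
    "Maps|Houses|Cities|Boats|Ships",
    None,
    "Men",
]
counts = count_tags(sample)
print(counts.most_common(3))
```

With a whitespace split, "Musical" and "Instruments" would be counted as two separate themes, so the choice of separator affects which themes appear in a top-30 list.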

Code
met_china_art = met.loc[(met['Culture'] == "China")]
Code
met_china_art.head()
Object Number Is Highlight Is Timeline Work Is Public Domain Object ID Gallery Number Department AccessionYear Object Name Title Culture Period Dynasty Reign Portfolio Constituent ID Artist Role Artist Prefix Artist Display Name Artist Display Bio Artist Suffix Artist Alpha Sort Artist Nationality Artist Begin Date Artist End Date Artist Gender Artist ULAN URL Artist Wikidata URL Object Date Object Begin Date Object End Date Medium Dimensions Credit Line Geography Type City State County Country Region Subregion Locale Locus Excavation River Classification Rights and Reproduction Link Resource Object Wikidata URL Metadata Date Repository Tags Tags AAT URL Tags Wikidata URL
6933 13.31.15 False False True 7411 774 The American Wing 1913.0 Shaving mug Shaving Mug China NaN NaN NaN NaN 188 Maker E. & W. Bennett Pottery American, Baltimore, Maryland 1847–1857 Bennett, E. & W., Pottery American 1847 1857 NaN http://vocab.getty.edu/page/ulan/500524602 https://www.wikidata.org/wiki/Q98446707 ca. 1853 1850 1853 Mottled brown earthenware H. 4 3/8 in. (11.1 cm) Rogers Fund, 1913 Made in Baltimore NaN NaN United States NaN NaN NaN NaN NaN NaN NaN NaN http://www.metmuseum.org/art/collection/search... https://www.wikidata.org/wiki/Q116342297 NaN Metropolitan Museum of Art, New York, NY Men http://vocab.getty.edu/page/aat/300025928 https://www.wikidata.org/wiki/Q8441
6979 33.120.164 False False True 7457 774 The American Wing 1933.0 Buckle Shoe Buckle China NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ca. 1800 1797 1800 Silver 2 3/8 x 1 3/4 in. (6 x 4.4 cm) Bequest of Alphonso T. Clearwater, 1933 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN http://www.metmuseum.org/art/collection/search... https://www.wikidata.org/wiki/Q116341420 NaN Metropolitan Museum of Art, New York, NY NaN NaN NaN
30296 96.14.1896 False False True 35967 NaN Asian Art 1896.0 Panel NaN China NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 18th century or earlier 1650 1799 Paint; on leather 9 1/4 x 5 3/8 in. (23.5 x 13.7 cm) Gift of Mr. and Mrs. H. O. Havemeyer, 1896 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Leatherwork NaN http://www.metmuseum.org/art/collection/search... NaN NaN Metropolitan Museum of Art, New York, NY Musical Instruments|Men|Elephants|Flowers http://vocab.getty.edu/page/aat/300041620|http... https://www.wikidata.org/wiki/Q34379|https://w...
30297 09.3 False False True 35968 NaN Asian Art 1909.0 Pictorial map 清 佚名 台南地區荷蘭城堡|Forts Zeelandia and Provintia ... China NaN NaN NaN NaN 3750 Artist Unidentified artist Chinese, active 19th century Unidentified artist NaN NaN NaN 19th century 1800 1899 Wall hanging; ink and color on deerskin Image: 59 1/4 × 80 3/4 in. (150.5 × 205.1 cm)\... Gift of J. Pierpont Morgan, 1909 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Paintings NaN http://www.metmuseum.org/art/collection/search... https://www.wikidata.org/wiki/Q79003782 NaN Metropolitan Museum of Art, New York, NY Maps|Houses|Cities|Boats|Ships http://vocab.getty.edu/page/aat/300028094|http... https://www.wikidata.org/wiki/Q4006|https://ww...
30298 12.37.135 False False False 35969 NaN Asian Art 1912.0 Hanging scroll NaN China Qing dynasty (1644–1911) NaN NaN NaN 1214 Artist Jin Zunnian Chinese, active early 18th century Jin Zunnian Chinese 1700 1800 NaN NaN NaN dated 1732 1732 1732 Hanging scroll; ink and color on silk 67 x 38 in. (170.2 x 96.5 cm) Rogers Fund, 1912 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Paintings NaN http://www.metmuseum.org/art/collection/search... NaN NaN Metropolitan Museum of Art, New York, NY NaN NaN NaN
Code
met_china_art = met_china_art.dropna(subset=['Tags'])
Code
# Raw string so that \s is passed to the regex engine unchanged
tags_counts = met_china_art['Tags'].str.split(r'[\s,|]', expand=True).stack().value_counts()
tags_counts = pd.DataFrame(tags_counts).reset_index()
tags_counts = tags_counts.rename(columns={tags_counts.columns[1]: 'Counts' })

tags_counts_met = tags_counts.loc[tags_counts['Counts'].nlargest(30).index]
Code
fig = px.bar(y=tags_counts_met['index'], x=tags_counts_met['Counts'])
fig.update_layout(
    height=800,
    width=1500,
    yaxis=dict(autorange="reversed"),
    barmode='stack',
    margin=dict(l=120, r=10, t=140, b=80),
    showlegend=True,
    template='plotly_white',
    autosize=True,
    title='Top 30 Tags / Themes of Chinese artwork at the MET'
)


fig.show()

2.2 British Museum - Access Search Result Download

A similarly deep analysis of the content and themes of artworks can be conducted on the dataset downloaded from the British Museum. While the British Museum allows downloading all search results (capped at 20,000 items), some other museums, such as the Victoria and Albert Museum (V&A), only allow downloading one page at a time (15 or 50 items), which is not efficient for large-scale analysis. The V&A states that it offers an API, but that is not explored as part of this project.

In addition to recurring themes, I also took a quick look at the most used materials in Chinese artworks. A quick look at such data might interest lay visitors who don’t have much background in Chinese history or art history in general.
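
When a search exceeds the 20,000-item cap and has to be downloaded in several parts, the parts can be stitched back together before analysis. The following is a minimal sketch, assuming each export shares the same columns and contains some identifier column; `combine_exports` and the column name `object_id` are hypothetical, and the in-memory CSV text merely stands in for downloaded files.

```python
import io
import pandas as pd

def combine_exports(sources, id_column):
    """Concatenate several CSV exports and drop objects that appear twice."""
    frames = [pd.read_csv(src) for src in sources]
    combined = pd.concat(frames, ignore_index=True)
    return combined.drop_duplicates(subset=id_column).reset_index(drop=True)

# Two overlapping toy exports (object 2 appears in both).
part_a = io.StringIO("object_id,Materials\n1,jade\n2,bronze\n")
part_b = io.StringIO("object_id,Materials\n2,bronze\n3,porcelain\n")

combined = combine_exports([part_a, part_b], id_column="object_id")
print(len(combined))  # 3 unique objects
```

Dropping duplicates matters because split searches (for example by date range) can overlap at the boundaries.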

Code
BM_Result = pd.read_csv("./Final_Data/3/British_Museum_Result.csv")
Code
BM_Materials = BM_Result.dropna(subset=['Materials'])
BM_Subjects = BM_Result.dropna(subset=['Subjects'])
Code
# Raw string so that \s is passed to the regex engine unchanged
BM_Subjects_counts = BM_Subjects['Subjects'].str.split(r'[\s,|;]', expand=True).stack().value_counts()
BM_Subjects_counts = pd.DataFrame(BM_Subjects_counts).reset_index()
BM_Subjects_counts = BM_Subjects_counts.rename(columns={BM_Subjects_counts.columns[0]: 'Subjects', BM_Subjects_counts.columns[1]: 'Counts' })

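# Keep the 34 most frequent tokens, then drop four hand-picked rows,
# which leaves the 30 entries plotted below
BM_Subjects_counts = BM_Subjects_counts.loc[BM_Subjects_counts['Counts'].nlargest(34).index]
BM_Subjects_counts = BM_Subjects_counts.drop(index=[0,5,16,17])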
Code

fig = px.bar(y=BM_Subjects_counts['Subjects'], x=BM_Subjects_counts['Counts'])
fig.update_layout(
    height=800,
    width=1500,
    yaxis=dict(autorange="reversed"),
    barmode='stack',
    margin=dict(l=120, r=10, t=140, b=80),
    showlegend=True,
    template='plotly_white',
    autosize=True,
    title='Top 30 Tags / Themes of Chinese artwork at the British Museum'
)


fig.show()
Code
# Raw string so that \s is passed to the regex engine unchanged
BM_Materials_counts = BM_Materials['Materials'].str.split(r'[\s,|;]', expand=True).stack().value_counts()
BM_Materials_counts = pd.DataFrame(BM_Materials_counts).reset_index()
BM_Materials_counts = BM_Materials_counts.rename(columns={BM_Materials_counts.columns[0]: 'Materials', BM_Materials_counts.columns[1]: 'Counts' })

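# Keep the 31 most frequent tokens, then drop one hand-picked row,
# which leaves the 30 entries plotted below
BM_Materials_counts = BM_Materials_counts.loc[BM_Materials_counts['Counts'].nlargest(31).index]
BM_Materials_counts = BM_Materials_counts.drop(index=[1])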
Code

fig = px.bar(y=BM_Materials_counts['Materials'], x=BM_Materials_counts['Counts'])
fig.update_layout(
    height=800,
    width=1500,
    yaxis=dict(autorange="reversed"),
    barmode='stack',
    margin=dict(l=120, r=10, t=140, b=80),
    showlegend=True,
    template='plotly_white',
    autosize=True,
    title='Top 30 Materials of Chinese artwork on display at the British Museum'
)


fig.show()

2.3 Web Scraping - Philadelphia Museum of Art as an example

The last approach is to scrape the online galleries. The example here is the Philadelphia Museum of Art’s Chinese art collection. Compared with the earlier two approaches, the information scraped from the web is much less detailed. Particularly for genres like Chinese art, where many objects have an unknown artist or date of production, the scraped information is of limited use, since we cannot efficiently scrape the details of many objects. Hence, the potential analyses are limited.
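
To show how little structure the scraped text carries, here is a minimal sketch of turning one search card's combined "artist, geography, time" string into separate fields. `split_card_text` is a hypothetical helper, and the two sample strings are taken from the scraped results shown at the end of this section; cards with fewer than three parts are padded with None.

```python
def split_card_text(text, n_fields=3):
    """Split 'artist, geography, time' into n_fields parts, padding with None."""
    # Split at most n_fields - 1 times so extra commas stay in the last field.
    parts = [p.strip() for p in text.split(",", n_fields - 1)]
    parts += [None] * (n_fields - len(parts))
    return tuple(parts)

print(split_card_text("Artist/maker unknown, Chinese"))
print(split_card_text("Mangguli, Chinese (Manchu), 1672 - 1736"))
```

For most Chinese objects the first call is the typical case: no named artist and no date, which leaves very little to analyze.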

Code
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
from time import sleep
Code
driver = webdriver.Chrome()
url = "https://philamuseum.org/search/collections?from=0&size=48&filters=%7B%22department%22%3A%5B%22East%20Asian%20Art%22%5D%2C%22place%22%3A%5B%22China%22%5D%7D"
# driver.get() returns None; the rendered HTML is read from driver.page_source
driver.get(url)
html_content = driver.page_source
Code
soup = BeautifulSoup(html_content, 'html.parser')
Code
selector = ".searchcard"

tables = soup.select(selector)
Code
results = []


max_pages = 10

# The base URL we will be using
base_url = "https://philamuseum.org/search/collections?"

# loop over each page of search results
for page_num in range(1, max_pages + 1):
    print(f"Processing page {page_num}...")

    obj_num = (page_num-1)*48
    
    # Update the URL hash for this page number and make the combined URL
    url_hash = f"from={obj_num}&size=48&filters=%7B%22department%22%3A%5B%22East%20Asian%20Art%22%5D%2C%22place%22%3A%5B%22China%22%5D%7D"
    url = base_url + url_hash

    # Go to the driver and wait for 5 seconds
    driver.get(url)
    sleep(5)

    # Parse the freshly loaded page and select all object cards
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    objects = soup.select(".searchcard")
    print("Number of Objects = ", len(objects))

    # Loop over each artwork card on the page
    page_results = []
    for artwork in objects:

        #artwork name
        artwork_name = artwork.select_one(".card-title").text

        #artist, Geography, Time 
        artist_geo_time = artwork.select_one(".card-body").text
              
        # Save the result
        page_results.append([artwork_name, artist_geo_time])

    # Create a dataframe and save
    col_names = ["artwork_name", "artist_geo_time"]
    df = pd.DataFrame(page_results, columns=col_names)
    results.append(df)

    print("sleeping for 10 seconds between calls")
    sleep(10)

# Finally, concatenate all the results
results = pd.concat(results, axis=0).reset_index(drop=True)
Processing page 1...
Number of Objects =  48
sleeping for 10 seconds between calls
Processing page 2...
Number of Objects =  48
sleeping for 10 seconds between calls
Processing page 3...
Number of Objects =  48
sleeping for 10 seconds between calls
Processing page 4...
Number of Objects =  48
sleeping for 10 seconds between calls
Processing page 5...
Number of Objects =  48
sleeping for 10 seconds between calls
Processing page 6...
Number of Objects =  48
sleeping for 10 seconds between calls
Processing page 7...
Number of Objects =  48
sleeping for 10 seconds between calls
Processing page 8...
Number of Objects =  48
sleeping for 10 seconds between calls
Processing page 9...
Number of Objects =  48
sleeping for 10 seconds between calls
Processing page 10...
Number of Objects =  48
sleeping for 10 seconds between calls
Code
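# Split "artist, geography, time" into at most three parts; cards with fewer parts get missing values
parts = results['artist_geo_time'].str.split(r',\s*', n=2, expand=True).reindex(columns=range(3))
results[['Artist', 'Geography', 'Time']] = parts.to_numpy()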
  
Code
results.head(10)
   artwork_name                                  artist_geo_time                          Artist                Geography         Time
0  Reception Hall                                Artist/maker unknown, Chinese            Artist/maker unknown  Chinese           None
1  Portrait of a Manchu Lady                     Mangguli, Chinese (Manchu), 1672 - 1736  Mangguli              Chinese (Manchu)  1672 - 1736
2  Jar                                           Artist/maker unknown, Chinese            Artist/maker unknown  Chinese           None
3  Covered Cup                                   Artist/maker unknown, Chinese            Artist/maker unknown  Chinese           None
4  Cup                                           Artist/maker unknown, Chinese            Artist/maker unknown  Chinese           None
5  Teapot                                        Artist/maker unknown, Chinese            Artist/maker unknown  Chinese           None
6  Bowl                                          Artist/maker unknown, Chinese            Artist/maker unknown  Chinese           None
7  Vase in the form of an Archaic Bronze Vessel  Artist/maker unknown, Chinese            Artist/maker unknown  Chinese           None
8  Wall Vase (P'ing)                             Artist/maker unknown, Chinese            Artist/maker unknown  Chinese           None
9  Vase (P'ing)                                  Artist/maker unknown, Chinese            Artist/maker unknown  Chinese           None