As mentioned in the instructions, all materials can be opened in Colab as Jupyter notebooks, so users can run the code in the cloud. It is highly recommended to follow the tutorials in the right order.

plotly.express

Hi there! In the last tutorial, we began to explore the potential of plotly.express, a high-level wrapper around Plotly.py that makes it easy to build interactive graphics. Last time we made a simple scatter plot/bubble chart. This time we will continue with a variation of the bubble chart that shows how UNESCO inscriptions developed over time in different countries. In this timeline, bubbles of varying size are placed along the x-axis, which makes it easy to compare temporal trends across multiple categories.

To create the timeline, we first have to import the libraries we need and read the data into a pandas data frame.

import pandas as pd

# read data
url = 'https://examples.opendatasoft.com/explore/dataset/world-heritage-unesco-list/download/?format=csv&timezone=Europe/Berlin&lang=en&use_labels_for_header=true&csv_separator=%3B'

df = pd.read_csv(url, sep=";")

df.head()
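
The sep=";" argument matters because the export uses semicolons rather than commas. The following is a minimal offline sketch of the same call; the two-row sample text and the name sample_df are made up for illustration and are not part of the real data set.

```python
import io

import pandas as pd

# Hypothetical two-row sample that mimics the semicolon-separated export.
sample_csv = (
    "Name (EN);Date inscribed;Country (EN)\n"
    "Mount Etna;2013-01-01;Italy\n"
    "The Great Wall;1987-01-01;China\n"
)

# sep=";" tells pandas to split fields on semicolons instead of commas.
sample_df = pd.read_csv(io.StringIO(sample_csv), sep=";")

print(sample_df.shape)
print(list(sample_df.columns))
```

Without sep=";" pandas would read each line as a single column, which is a quick way to spot a wrong separator.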

Data Cleaning

We need to start with some preprocessing and data cleaning. We first subset and rename the columns, then count the UNESCO sites per country using groupby() and keep the "top 10 countries" (we do not need this data frame for the plot, only the list of the top 10 countries). We sort the values using sort_values(by=['name'], ascending=False) to order the countries from the most to the fewest UNESCO sites.

df = df[["Name (EN)","Date inscribed","Category","Country (EN)","Continent (EN)"]] # select multiple columns in a list []
df = df.rename(columns={"Name (EN)": "name", "Date inscribed": "date", "Category": "type", "Country (EN)": "country", "Continent (EN)": "continent"}) # rename the columns for easy reading

top_10 = df.groupby(df["country"]).count().sort_values(by=['name'], ascending=False).head(10)
top_10

Get the top 10 countries as a numpy array.

sub_cnty = top_10.index.values
sub_cnty

array(['China', 'Italy', 'Spain', 'France', 'Germany', 'Mexico', 'India',
       'United Kingdom of Great Britain and Northern Ireland',
       'Russian Federation', 'Iran (Islamic Republic of)'], dtype=object)
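
As a side note, the same kind of ranking can be sketched with value_counts(), which counts and sorts in one step. This is only an alternative under the assumption that one row equals one site; the toy data frame and the names toy, top_2 and top_countries are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one row per inscribed site.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "country": ["Italy", "China", "Italy", "Spain", "China", "Italy"],
})

# value_counts() counts the rows per country and sorts in descending order.
top_2 = toy["country"].value_counts().head(2)

# The index holds the country names, just like top_10.index above.
top_countries = top_2.index.to_numpy()
print(top_countries)
```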

With the list of the top 10 countries, we can now drop all the rows from other countries using isin(sub_cnty). We then group the rows by country and date, count the rows for every country and every date, and reset the index.

top_df = df[df['country'].isin(sub_cnty)].groupby(['country','date']).count()['name'].reset_index()
top_df.head(5)
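
To see the filter and the count in isolation, here is a small sketch on made-up rows. It uses size() with reset_index(name="count") instead of count()['name'], which is an equivalent choice when no names are missing; the list keep stands in for sub_cnty.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the cleaned df.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "country": ["Italy", "Italy", "China", "Spain", "Italy"],
    "date": ["2013-01-01", "2013-01-01", "1987-01-01", "1993-01-01", "1997-01-01"],
})
keep = ["Italy", "China"]  # stands in for sub_cnty

counts = (
    toy[toy["country"].isin(keep)]   # drop the rows from other countries
    .groupby(["country", "date"])
    .size()                          # number of rows per (country, date)
    .reset_index(name="count")       # turn the index back into columns
)
print(counts)
```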

As we need only the year, not the full date, we will create a new column year. We can extract the year by first interpreting the date column as datetime values and then taking the year (simply with .year).

top_df['year'] = pd.DatetimeIndex(top_df['date']).year # set up a new year column

top_df.head()
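
A common alternative to pd.DatetimeIndex is pd.to_datetime() followed by the .dt accessor. The sketch below, on two made-up dates, checks that both routes give the same years.

```python
import pandas as pd

# Hypothetical toy dates in the same format as the date column.
toy = pd.DataFrame({"date": ["1987-01-01", "2013-01-01"]})

# Parse the strings as datetimes, then read the year component.
toy["year"] = pd.to_datetime(toy["date"]).dt.year

# Compare with the DatetimeIndex approach used above.
via_index = pd.DatetimeIndex(toy["date"]).year.to_numpy()
same = bool((toy["year"].to_numpy() == via_index).all())
print(toy)
```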

Now we group again by country and year and take the sum (the number of inscriptions per year).

group_df = top_df.groupby(["country","year"]).sum()

import numpy as np
country_list = np.array(group_df.index.get_level_values(0))
year_list = np.array(group_df.index.get_level_values(1))

As we need country and year as regular columns and not only as the index, we assign them as columns again.

group_df['country'] = country_list
group_df['year'] = year_list

Rename the name column to count.

group_df = group_df.rename(columns={"name": "count"})
group_df.head()
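
The last few steps (group, restore the key columns, rename) can also be sketched more compactly with as_index=False, which keeps the grouping keys as ordinary columns. This is an alternative shown on made-up numbers, not a replacement for the cells above.

```python
import pandas as pd

# Hypothetical toy data: "name" already holds a count per date, as in top_df.
toy = pd.DataFrame({
    "country": ["China", "China", "Italy"],
    "year": [1987, 1987, 2013],
    "name": [2, 4, 1],
})

summary = (
    toy.groupby(["country", "year"], as_index=False)["name"]
    .sum()                               # total inscriptions per country and year
    .rename(columns={"name": "count"})   # clearer column name
)
print(summary)
```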

To improve the visuals, we will simplify the name of the UK.

group_df['country'] = group_df['country'].str.replace('United Kingdom of Great Britain and Northern Ireland','United Kingdom')

Data Visualization

To make an interactive scatter plot in plotly.express, we only need to use px.scatter(). It is highly compatible with pandas, so we can pass in a pandas data frame and specify x and y (as well as size and color, which are optional) with the column names.

Any change to the layout can be made with update_layout(). All the options are listed in the Plotly documentation.

import plotly.express as px
fig = px.scatter(group_df, x="year", y="country", size="count", color="country")
fig.update_layout(showlegend=False)
fig.show()

Almost Done!

Good job! Let's look at our plot. It is interactive, so you can pan around and zoom in/out. If you hover over a bubble, you will also see information such as the country name and the count for a specific year. This is the default Plotly behaviour.

However, we can also control what information goes into the hover labels, as well as their layout (such as the font, font size and so on). Isn't it much cooler if we can show the names of all UNESCO sites instead of the count?!

We can also display hover labels for a whole position on the x-axis instead of an individual bubble, which means we can display all UNESCO sites inscribed in a given year! Let's say we also want a vertical line that follows the cursor along the x-axis.

Let's do all the adjustments mentioned above.

Customization

df.head()

Adjust Data Frame

As we need the names of the UNESCO sites this time, we go back to df, make a subset for the top 10 countries, and merge it with our group_df. Let's start with some cleaning of df. First, we add the year column to df too. Then we group by country and year and apply a transformation.

It is a bit tricky. The transformation takes all the rows with the same country and year and joins all the values from ['name'], separated with a comma (,). This transformation is only applied to the top 10 countries, df[df['country'].isin(sub_cnty)]. As the joined string is written to every row of a group, we end up with repeated values, so we will remove duplicates.

df['year'] = pd.DatetimeIndex(df['date']).year

# join the site names
df['site'] = df[df['country'].isin(sub_cnty)].groupby(['country','year'])['name'].transform(lambda x: ', '.join(x))

# remove duplicates
df = df.drop_duplicates()  # drop_duplicates() returns a new data frame, so assign the result

# look at the rows for China
df[df["country"] == "China"].head(5)
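
To make the transform step easier to follow, here is a minimal sketch on three rows. The site names are taken from the China rows shown in this notebook, but the toy frame itself is reduced for illustration and does not contain every site of those years.

```python
import pandas as pd

# Reduced toy frame: two sites share (China, 1987), one has (China, 2015).
toy = pd.DataFrame({
    "name": ["The Great Wall", "Mausoleum of the First Qin Emperor", "Tusi Sites"],
    "country": ["China", "China", "China"],
    "year": [1987, 1987, 2015],
})

# transform() returns one value per input row, so every row of a group
# receives the same joined string.
toy["site"] = toy.groupby(["country", "year"])["name"].transform(
    lambda x: ", ".join(x)
)
print(toy[["name", "year", "site"]])
```

Note that the number of rows does not change: transform() keeps the original shape, which is why repeated site values appear.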

Make sure only top 10 countries are included.

df_sub = df[df['country'].isin(sub_cnty)]
df_sub.head(1)

group_df.head(1)

group_df.reset_index(drop=True, inplace=True)

Now, we have the name information from df_sub. We can merge it to our group_df data frame using the keys "country" and "year". We select only the relevant columns [["country","year","site","count"]], and call the new data frame final.

final = df_sub.merge(group_df, left_on=["country","year"], right_on=["country","year"])

final = final[["country","year","site","count"]]
final.head()
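
After the merge, each country and year can still appear once per site, because df_sub has one row per site. If one row per bubble is preferred, a possible approach (an assumption on our side, not a step of the cells above) is to drop duplicates after selecting the four columns. The toy frames sites and counts below are hypothetical stand-ins for df_sub and group_df.

```python
import pandas as pd

# Stand-in for df_sub: one row per site, same joined "site" string per group.
sites = pd.DataFrame({
    "name": ["Mount Etna", "Medici Villas and Gardens in Tuscany"],
    "country": ["Italy", "Italy"],
    "year": [2013, 2013],
    "site": ["Mount Etna, Medici Villas and Gardens in Tuscany"] * 2,
})
# Stand-in for group_df: one row per country and year.
counts = pd.DataFrame({"country": ["Italy"], "year": [2013], "count": [2]})

merged = sites.merge(counts, on=["country", "year"])
merged = merged[["country", "year", "site", "count"]]

# Once "name" is gone, the rows of a group are identical and can be collapsed.
deduped = merged.drop_duplicates().reset_index(drop=True)
print(len(merged), len(deduped))
```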

Great! Almost everything is ready. We only need to replace each comma separator with <br> so that every item is put on a new line in the hover labels.

final.site = final.site.apply(lambda x: x.replace(', ', '<br>'))
final.site.head()

0    Rock Paintings of the Sierra de San Francisco<...
1    Rock Paintings of the Sierra de San Francisco<...
2    Rock Paintings of the Sierra de San Francisco<...
3    Mount Etna<br>Medici Villas and Gardens in Tus...
4    Mount Etna<br>Medici Villas and Gardens in Tus...
Name: site, dtype: object
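
One caveat of replacing ", " afterwards is that a site name which itself contains a comma is split across lines as well. A possible way around this, sketched below with hypothetical site names, is to join with "<br>" directly when building the label.

```python
import pandas as pd

# Hypothetical names; the first one contains ", " inside the name itself.
toy = pd.DataFrame({
    "name": ["Site A, North Section", "Site B"],
    "country": ["X", "X"],
    "year": [2000, 2000],
})

# Two-step route: join with ", " and replace ", " afterwards.
two_step = ", ".join(toy["name"]).replace(", ", "<br>")

# One-step route: join with "<br>" so commas inside names are left alone.
labels = (
    toy.groupby(["country", "year"])["name"]
    .agg("<br>".join)
    .reset_index(name="site")
)
print(two_step)
print(labels.loc[0, "site"])
```

The two-step string contains two line breaks for two sites, while the one-step label contains exactly one.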

Plotting

Now, let's do our plot again using px.scatter().

fig = px.scatter(final, x="year", y="country", size="count", color="country",
                 custom_data=['year', 'site'])
# remove legend
fig.update_layout(showlegend=False)

# show labels for whole x axis
fig.update_layout(hovermode='x')

# change layout for hover labels
fig.update_layout(
    hoverlabel=dict(
        bgcolor="white",
        font_size=12,
        font_family="Rockwell"
    )
)

# control info for hover labels using custom_data we specified above in px.scatter()
# join items with new line <br>
fig.update_traces(
    hovertemplate="<br>".join([
        "%{y}",
        "Site: %{customdata[1]}"
    ])
)

# add title, x- and y- labels, and a moving line along x axis
# change font styles for the texts inside plot (y ticks and so on)
fig.update_layout(
    title={
        'text': "Timeline of UNESCO Inscriptions",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Year of Inscription",
    yaxis_title="Top 10 Countries",
    xaxis={'showspikes': True,
        'spikemode': 'across',
        'spikesnap': 'cursor',
        'showline': True,
        'showgrid': True},
    font=dict(
        family="Rockwell",
        size=15,
        color="black"
    )
)

# display our plot
fig.show()

Cool! That's it!

Now we have an interactive plot with enhanced visuals and all information we need in the labels. Not only can we clearly see the trends of inscriptions in different countries, we can also clearly see the "inscription peak" of some countries (such as 1997 in Italy). We can tell, for example, countries like Russia and China are late players in the field.

Previous Lesson: Simple Bubble Chart

Next Lesson: Coming soon...

Additional information

This notebook is provided for educational purposes. Feel free to report any issue on GitHub.

Author: Ka Hei, Chow

License: The code in this notebook is licensed under the Creative Commons Attribution 4.0 license.

Last modified: December 2021

References:

Plotly

Output of df.head() after reading the data:
	Name (EN)	Name (FR)	Short description (EN)	Short Description (FR)	Justification (EN)	Justification (FR)	Date inscribed	Danger list	Longitude	Latitude	Area hectares	Category	Country (EN)	Country (FR)	Continent (EN)	Continent (FR)	Geographical coordinates
0	Architectural, Residential and Cultural Comple...	Ensemble architectural, résidentiel et culture...	The Architectural, Residential and Cultural Co...	L’ensemble architectural, résidentiel et cultu...	Criterion (ii): The architectural, residential...	Critère (ii) : L’ensemble architectural, résid...	2005-01-01	NaN	26.691390	53.222780	0.00	Cultural	Belarus	Bélarus	Europe and North America	Europe et Amérique du nord	53.22278,26.69139
1	Rock Paintings of the Sierra de San Francisco	Peintures rupestres de la Sierra de San Francisco	From c. 100 B.C. to A.D. 1300, the Sierra de S...	Dans la réserve d'El Vizcaíno, en Basse-Califo...	NaN	NaN	1993-01-01	NaN	-112.916110	27.655560	182600.00	Cultural	Mexico	Mexique	Latin America and the Caribbean	Amérique latine et Caraïbes	27.65556,-112.91611
2	Monastery of Horezu	Monastère de Horezu	Founded in 1690 by Prince Constantine Brancova...	Fondé en 1690 par le prince Constantin Brancov...	NaN	NaN	1993-01-01	NaN	24.016667	45.183333	22.48	Cultural	Romania	Roumanie	Europe and North America	Europe et Amérique du nord	45.18333333,24.01666667
3	Mount Etna	Mont Etna	Mount Etna is an iconic site encompassing 19,2...	Ce site emblématique recouvre une zone inhabit...	NaN	NaN	2013-01-01	NaN	14.996667	37.756111	19237.00	Natural	Italy	Italie	Europe and North America	Europe et Amérique du nord	37.7561111111,14.9966666667
4	Belfries of Belgium and France	Beffrois de Belgique et de France	Twenty-three belfries in the north of France a...	Vingt-trois beffrois, situés dans le nord de l...	NaN	NaN	1999-01-01	NaN	3.231390	50.174440	0.00	Cultural	Belgium,France	Belgique,France	Europe and North America	Europe et Amérique du nord	50.17444,3.23139

Output of top_10:
	name	date	type	continent
country
China	49	49	49	49
Italy	47	47	47	47
Spain	41	41	41	41
France	38	38	38	38
Germany	35	35	35	35
Mexico	34	34	34	34
India	33	33	33	33
United Kingdom of Great Britain and Northern Ireland	27	27	27	27
Russian Federation	21	21	21	21
Iran (Islamic Republic of)	21	21	21	21

Output of top_df.head() after adding the year column:
	country	date	name	year
0	China	1987-01-01	6	1987
1	China	1990-01-01	1	1990
2	China	1992-01-01	3	1992
3	China	1994-01-01	4	1994
4	China	1996-01-01	2	1996

Output of df.head() after selecting and renaming the columns:
	name	date	type	country	continent
0	Architectural, Residential and Cultural Comple...	2005-01-01	Cultural	Belarus	Europe and North America
1	Rock Paintings of the Sierra de San Francisco	1993-01-01	Cultural	Mexico	Latin America and the Caribbean
2	Monastery of Horezu	1993-01-01	Cultural	Romania	Europe and North America
3	Mount Etna	2013-01-01	Natural	Italy	Europe and North America
4	Belfries of Belgium and France	1999-01-01	Cultural	Belgium,France	Europe and North America

Output of df[df["country"] == "China"].head(5):
	name	date	type	country	continent	year	site
5	Sichuan Giant Panda Sanctuaries - Wolong, Mt S...	2006-01-01	Natural	China	Asia and the Pacific	2006	Sichuan Giant Panda Sanctuaries - Wolong, Mt S...
32	Tusi Sites	2015-01-01	Cultural	China	Asia and the Pacific	2015	Tusi Sites
38	The Great Wall	1987-01-01	Cultural	China	Asia and the Pacific	1987	The Great Wall, Mausoleum of the First Qin Emp...
68	Mausoleum of the First Qin Emperor	1987-01-01	Cultural	China	Asia and the Pacific	1987	The Great Wall, Mausoleum of the First Qin Emp...
72	Chengjiang Fossil Site	2012-01-01	Natural	China	Asia and the Pacific	2012	Chengjiang Fossil Site, Site of Xanadu

Output of final.head():
	country	year	site	count
0	Mexico	1993	Rock Paintings of the Sierra de San Francisco,...	3
1	Mexico	1993	Rock Paintings of the Sierra de San Francisco,...	3
2	Mexico	1993	Rock Paintings of the Sierra de San Francisco,...	3
3	Italy	2013	Mount Etna, Medici Villas and Gardens in Tuscany	2
4	Italy	2013	Mount Etna, Medici Villas and Gardens in Tuscany	2