As mentioned in the instructions, all materials can be opened in Colab as Jupyter notebooks, so users can run the code in the cloud. It is highly recommended to follow the tutorials in order.


Plotly's Python graphing library makes interactive, publication-quality graphs. Compared to static graphics, Plotly can not only embed more information through hover tools, it also allows zooming and panning so users can easily inspect the details of a figure. Users can create Plotly graphics in Python using either plotly.graph_objects or plotly.express. The first option allows more customization of graphical elements, while the latter has a much simpler syntax and is therefore easier to learn.

This notebook aims to show users some basic functionality of the plotly.express library for making simple but interactive bubble charts. To follow the tutorial, users need some knowledge of matplotlib and should understand the basic grammar of making a plot. In this tutorial, we will use a simple example based on some basic statistics of Chinese dynasties, including the year, territory area, population and maximum longevity of emperors. The statistics were found online with no guarantee of accuracy and are used purely for demonstrating plotting.


Presumptions:

Set Up Environment

First, we set up the environment by importing the libraries.

import plotly.express as px
import numpy as np
import pandas as pd

Then, we will use a dictionary to create a Pandas data frame dyn_df.

dyn = {
  "dynasty": ["Qin","Han","Jin","Sui","Md Tang","S Song","Yuan","Ming","Qing"], # name of dynasty
  "year": np.array([-221,2,280,581,726,1223,1341,1570,1887]), # year of begin
  "area": 10000*np.array([360,609,543,467,1237,200,1372,997,1316]), # territory
  "pop": 10000*np.array([4500,6500,2200,4450,8050,8060,8500,6000,37700]), # population
  "max emperor longevity": np.array([50,70,55,64,82,81,79,71,89]) # maximum emperor longevity
}

dyn_df = pd.DataFrame(data=dyn) # create data frame

Then we look at the first few rows.

dyn_df.head()

   dynasty  year      area       pop  max emperor longevity
0      Qin  -221   3600000  45000000                     50
1      Han     2   6090000  65000000                     70
2      Jin   280   5430000  22000000                     55
3      Sui   581   4670000  44500000                     64
4  Md Tang   726  12370000  80500000                     82

Using plotly.express (px) lets us make a plot with shorter code. To create a scatter plot, we only need to pass the data frame dyn_df, the columns (variables) for the x and y axes, the size of the markers, the hover names (the information that appears when you point your mouse at a marker) and the color of the markers. The argument size_max defines the maximum size of the bubbles. Calling fig.show() displays the figure.

In the resulting figure, a legend is created automatically on the right side of the plot, and the population values on the y axis are abbreviated in millions (with an M suffix) for readability.

fig = px.scatter(dyn_df, x="year", y="pop", size="area", color="dynasty", hover_name="dynasty", size_max=60)
fig.show()

The plot looks fine, but there are still many parameters which can be adjusted. First, we want to try out a different theme (plotly_dark) for our figure. Then, we want to create a title, change some colors in layout, and color the bubbles by max emperor longevity.

It would also be nice to show the name of each dynasty directly on its bubble. Finally, to make the differences between dynasties along the y axis more distinct, we can use a log scale for the y axis.

To do all of this, we only need to set a few more parameters of the scatter function and call update_layout. The hierarchy of the parameters matters: for example, parameters that belong to xaxis must be grouped into a single dict.

fig = px.scatter(dyn_df, x="year", y="pop", 
                 size="area", color="max emperor longevity",opacity=0.85, # changing the color parameter
                 hover_name="dynasty", size_max=70,
                 template="plotly_dark", title="Chinese Dynasty Comparison", # use template option
                 width=1000, height=450, # set size of our plot
                 text=dyn_df.dynasty.values
                 )

fig.update_layout(
    title='<b>Chinese Dynasty Comparison</b>', # add a title
    xaxis=dict(
        title='Year (CE)', # x label
        gridcolor='rgba(255, 255, 255, 0.6)', # color for grid
        gridwidth=0.5, # width of grid line
    ),
    yaxis=dict(
        title='Population Size', # y label
        type="log",
        gridcolor='rgba(255, 255, 255, 0.6)',
        gridwidth=0.5
    ),
    font=dict(
        family="Courier New, monospace", # font style
        size=18, # font size
        color="white" # font color
    )
)

fig.show() # display plot

We can also reduce the plot to one dimension so that it looks more like an ordinary timeline.

dyn_df.head()

   dynasty  year      area       pop  max emperor longevity
0      Qin  -221   3600000  45000000                     50
1      Han     2   6090000  65000000                     70
2      Jin   280   5430000  22000000                     55
3      Sui   581   4670000  44500000                     64
4  Md Tang   726  12370000  80500000                     82

To do this, the main thing we need to change is the y axis. We can, for example, add a new column and set it to 0 for all rows.

dyn_df['y'] = 0

Now we make the plot again. This time, we want to show more hover information to readers. In plotly.express we can do this by passing columns through custom_data and referring to them in hovertemplate. For example, %{customdata[0]} refers to the first custom_data column, which is "dynasty" in this case (if there is only one custom_data column, you can write %{customdata} instead).

Pay attention to the difference between custom_data and customdata: custom_data is the plotly.express argument that takes column names, while customdata is the name used inside hovertemplate. Be careful, because this does not work the same way in plotly.graph_objects, where the values are passed directly through the customdata attribute of the trace.

fig = px.scatter(dyn_df, x="year", y="y",  # x and y marker location, use column name inside the dataframe
                 size="area",opacity=0.85, # marker transparency
                 hover_name="dynasty", size_max=40, # marker size
                 custom_data=['dynasty', 'area'], # extra hover info
                 template="simple_white", title="Chinese Territory over Time", # template and plot title
                 width=900, height=300, # plot size
                 text=dyn_df.dynasty.values # the dynasty will be displayed on the bubbles
                 )

fig.update_traces(
    hovertemplate='<i>Dynasty</i>: %{customdata[0]}'+ '<br>' + # what we want to show in labels: you can use HTML tag here. Eg. <br> means next line
                 '<i>Territory</i>: %{customdata[1]} km²' # <i> means in italic
                 )

fig.update_traces(
    marker=dict(
            color='#0073AE', # change marker color
            opacity=0.5,
            line=dict(
                color='gray',
                width=1)
            )
    )

fig.update_layout(showlegend=False) # we have our text so we can turn off the legend
fig.update_yaxes(visible=False) # no y axis needed
fig.update_layout(yaxis_range=[0,0]) # limit y axis range
fig.update_layout(hovermode="x") # how the plot reacts to hovering

# the style of hover labels can be customized too
fig.update_layout(
    hoverlabel=dict(
        bgcolor="white", # background color
        font_size=14, # font size
        font_family="Rockwell" # font
    )
)

fig.show()

Previous Lesson: Introduction to Data Story Telling

Next Lesson: Coming soon...




Additional information

This notebook is provided for educational purposes. Feel free to report any issues on GitHub.


Author: Ka Hei, Chow

License: The code in this notebook is licensed under the Creative Commons Attribution 4.0 license.

Last modified: December 2021




References:

Plotly