Simple Bubble Chart using plotly.express
Interactive data visualization
- Set Up Environment
- Previous Lesson: Introduction to Data Story Telling
- Next Lesson: Coming soon...
- Additional information
- References:
As mentioned in the instructions, all materials can be opened in Colab as Jupyter notebooks, so users can run the code in the cloud. We highly recommend following the tutorials in order.
Plotly's Python graphing library makes interactive, publication-quality graphs. Compared with static graphics, Plotly can embed more information through hover tools. It also supports zooming and panning, so users can easily inspect the details of a graphic. Users can create Plotly graphics in Python using either plotly.graph_objects or plotly.express. The former allows more customization of graphical elements, while the latter offers a much simpler syntax and is therefore easier to learn.
This notebook demonstrates some simple functionality of the plotly.express library for making simple but interactive bubble charts. To follow the tutorial, users need some knowledge of matplotlib and an understanding of the basic grammar of making a plot. We use a simple example of basic statistics on Chinese dynasties, including the year, territory area, population and maximum longevity of emperors. The statistics were collected online and their accuracy is not guaranteed; they are used purely for plotting demonstration purposes.
Prerequisites: first, import the required libraries.
import plotly.express as px
import numpy as np
import pandas as pd
Then, we use a dictionary to create a pandas data frame, dyn_df.
dyn = {
"dynasty": ["Qin","Han","Jin","Sui","Md Tang","S Song","Yuan","Ming","Qing"], # name of dynasty
"year": np.array([-221,2,280,581,726,1223,1341,1570,1887]), # year (negative values are BCE)
"area": 10000*np.array([360,609,543,467,1237,200,1372,997,1316]), # territory area (km²)
"pop": 10000*np.array([4500,6500,2200,4450,8050,8060,8500,6000,37700]), # population
"max emperor longevity": np.array([50,70,55,64,82,81,79,71,89]) # maximum emperor longevity
}
dyn_df = pd.DataFrame(data=dyn) # create data frame
Then we look at the first few rows.
dyn_df.head()
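Besides head(), it can help to check the shape, column types and value ranges before plotting. A minimal sketch; it repeats the dictionary above so it runs on its own, and it assumes the area values are in km², matching the hover label used later in this notebook.

```python
import numpy as np
import pandas as pd

# Rebuild the tutorial's data frame so this snippet is self-contained
dyn = {
    "dynasty": ["Qin", "Han", "Jin", "Sui", "Md Tang", "S Song", "Yuan", "Ming", "Qing"],
    "year": np.array([-221, 2, 280, 581, 726, 1223, 1341, 1570, 1887]),
    "area": 10000 * np.array([360, 609, 543, 467, 1237, 200, 1372, 997, 1316]),
    "pop": 10000 * np.array([4500, 6500, 2200, 4450, 8050, 8060, 8500, 6000, 37700]),
    "max emperor longevity": np.array([50, 70, 55, 64, 82, 81, 79, 71, 89]),
}
dyn_df = pd.DataFrame(data=dyn)

print(dyn_df.shape)   # (rows, columns)
print(dyn_df.dtypes)  # dynasty is text, the other four columns are integers
# Smallest and largest territory (km², assumed) and population
print(dyn_df[["area", "pop"]].agg(["min", "max"]))
```

The population spans roughly 22 million to 377 million, which is one reason a log-scaled y axis is tried later on.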
Using plotly.express (px), we can make a plot with shorter code. To create a scatter plot, we only need to pass the data frame dyn_df and the columns (variables) to use for the x and y axes, the marker size, the hover name (the information that appears when you point your mouse at a marker) and the marker color. The argument size_max sets the maximum bubble size. Calling fig.show() displays the figure.
As you can see, a legend is created automatically on the right side of the plot, and the population values on the y axis are abbreviated with an M (million) suffix for readability.
fig = px.scatter(dyn_df, x="year", y="pop", size="area", color="dynasty", hover_name="dynasty", size_max=60)
fig.show()
The plot looks fine, but there are still many parameters that can be adjusted. First, we try a different theme (plotly_dark) for our figure. Then, we add a title, change some colors in the layout, and color the bubbles by max emperor longevity.
It would also be nice to show the name of each dynasty on its bubble. Finally, to make the differences between dynasties on the y axis more distinct, we can use a log scale for the y axis.
To do all this, we only need to customize some parameters of the scatter function and call update_layout. Keep in mind that the hierarchy of the parameters matters: for example, all parameters that belong to xaxis must be put into a single dict.
fig = px.scatter(dyn_df, x="year", y="pop",
size="area", color="max emperor longevity",opacity=0.85, # changing the color parameter
hover_name="dynasty", size_max=70,
template="plotly_dark", title="Chinese Dynasty Comparison", # use template option
width=1000, height=450, # set size of our plot
text=dyn_df.dynasty.values # show the dynasty name on each bubble
)
fig.update_layout(
title='<b>Chinese Dynasty Comparison</b>', # add a title
xaxis=dict(
title='Year (CE)', # x label
gridcolor='rgba(255, 255, 255, 0.6)', # color for grid
gridwidth=0.5, # width of grid line
),
yaxis=dict(
title='Population Size', # y label
type="log", # logarithmic y axis
gridcolor='rgba(255, 255, 255, 0.6)',
gridwidth=0.5
),
font=dict(
family="Courier New, monospace", # font style
size=18, # font size
color="white" # font color
)
)
fig.show() # display plot
Alternatively, we can reduce the plot to one dimension so that it looks more like a traditional timeline.
dyn_df.head()
To do this, the main thing we need to change is the y axis. We can, for example, add a new column and set it to 0 for all rows.
dyn_df['y'] = 0
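Assigning a scalar to a new column broadcasts it to every row. If you would rather not modify dyn_df itself, one alternative is DataFrame.assign, which returns a copy with the extra column. A minimal sketch, using a small toy frame in place of dyn_df:

```python
import pandas as pd

toy = pd.DataFrame({"dynasty": ["Qin", "Han", "Jin"], "year": [-221, 2, 280]})

# Scalar assignment: 0 is broadcast to all rows and toy is modified in place
toy["y"] = 0

# Alternative: assign() returns a new frame and leaves the original untouched
plain = pd.DataFrame({"dynasty": ["Qin", "Han", "Jin"], "year": [-221, 2, 280]})
timeline = plain.assign(y=0)

print(toy["y"].tolist(), "y" in plain.columns, timeline["y"].tolist())
```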
Now, we make the new plot. This time, we want to show more hover information to the readers. In plotly.express, we can do this by passing the columns to custom_data and referencing them in hovertemplate. For example, %{customdata[0]} refers to the first custom_data column, which is "dynasty" in this case (if there is only one custom_data column, you can write %{customdata} instead).
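Pay attention to the difference between custom_data and customdata: custom_data is the plotly.express argument that takes column names, while customdata is the trace attribute that hovertemplate refers to. Be careful: plotly.graph_objects has no custom_data argument, so there you pass the values to customdata directly.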
fig = px.scatter(dyn_df, x="year", y="y", # x and y marker location, use column name inside the dataframe
size="area",opacity=0.85, # marker transparency
hover_name="dynasty", size_max=40, # maximum marker size
custom_data=['dynasty', 'area'], # extra hover info
template="simple_white", title="Chinese Territory over Time", # template and plot title
width=900, height=300, # plot size
text=dyn_df.dynasty.values # the dynasty will be displayed on the bubbles
)
fig.update_traces(
hovertemplate='<i>Dynasty</i>: %{customdata[0]}'+ '<br>' + # what we want to show in labels: you can use HTML tag here. Eg. <br> means next line
'<i>Territory</i>: %{customdata[1]} km²' # <i> means in italic
)
fig.update_traces(
marker=dict(
color='#0073AE', # change marker color
opacity=0.5,
line=dict(
color='gray',
width=1)
)
)
fig.update_layout(showlegend=False) # we have our text so we can turn off the legend
fig.update_yaxes(visible=False) # no y axis needed
fig.update_layout(yaxis_range=[0,0]) # limit y axis range
fig.update_layout(hovermode="x") # how the plot reacts to hovering (points are matched by x position)
# the style of hover labels can be customized too
fig.update_layout(
hoverlabel=dict(
bgcolor="white", # background color
font_size=14, # font size
font_family="Rockwell" # font
)
)
fig.show()
Previous Lesson: Introduction to Data Story Telling
Next Lesson: Coming soon...
Additional information
This notebook is provided for educational purposes. Feel free to report any issues on GitHub.
Author: Ka Hei, Chow
License: The code in this notebook is licensed under the Creative Commons Attribution 4.0 (CC BY 4.0) license.
Last modified: December 2021