As mentioned in the instructions, all materials can be opened in Colab as Jupyter notebooks, so users can run the code in the cloud. It is highly recommended to follow the tutorials in order.


Background

Although programming is probably not your primary profession, good coding practice helps you avoid unnecessary mistakes and misunderstandings in your projects. Some practices matter mostly to professional programmers; here we cover the ones most relevant to users who mainly code for fixed-term research projects.


Presumption: Not applicable



1) Documentation

RECORD WHAT YOU HAVE DONE 📃

Documentation is important. You may be entirely sure what you are doing while you code, but that is often no longer the case the next morning, after a busy stretch on other projects, or after a nice family trip, let alone when you look at your own code from last year. Always document your code, either by writing comments with # or """ """, or by using a Jupyter Notebook and adding explanatory text!
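As a minimal sketch of the two comment styles mentioned above: the function name count_sites and the sample list are made up for illustration and do not come from the tutorial data.

```python
def count_sites(sites):
    """Return how many historical sites are in the list."""
    return len(sites)


# Placeholder names, used only to show where a # comment goes
sites = ["Site A", "Site B", "Site C"]

site_count = count_sites(sites)
print(site_count)

# The text between the triple quotes is stored with the function
print(count_sites.__doc__)
```

The text inside """ """ is called a docstring, and Python shows it when you run help(count_sites), so it is a good place to say what a function is for.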

Sometimes you share a project with colleagues, and they also need to understand what you are doing. Even if you always work alone, one day your colleagues may want to get some insights from the code you wrote, and you will not want to walk them through every line.

So make use of comments! A good comment does not state what is done; it states the purpose. For example, if you create a new dataframe with two columns from an old dataframe, it is better to write "extract coordinates and name from historical sites" than "create new dataframe", because the latter comment does not give readers enough context or any new information.
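The difference between the two comments can be sketched as follows. To keep the example runnable without extra libraries it uses a list of dictionaries in place of a dataframe, and all site names and values are made-up placeholders.

```python
# Placeholder records, not real data
historical_sites = [
    {"name": "Site A", "lat": 10.0, "lon": 20.0, "visitors": 120},
    {"name": "Site B", "lat": 11.5, "lon": 21.5, "visitors": 300},
]

# Less helpful comment: "create new list"
# More helpful comment, stating the purpose:
# extract coordinates and name from historical sites
site_locations = [
    {"name": site["name"], "lat": site["lat"], "lon": site["lon"]}
    for site in historical_sites
]

print(site_locations)
```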

2) Consistency

KEEP THINGS EASY TO REMEMBER 🧠

You might not realize it, but consistency suits your brain, which always searches for patterns! Writing code in a consistent way can save you a lot of time otherwise lost to unnecessary mistakes.

For example, if you name your five dataframes as

  • dataframe1
  • dataframe2
  • DataFrame3
  • Dataframe4
  • Dataframe_5

You might later forget how you named each dataframe and write dataframe3 or Dataframe_4, both of which raise a NameError. Consistency matters particularly for capital letters because, as we have learnt, Python is case-sensitive.
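A small sketch of what happens when the capitalization does not match. The lists stand in for real dataframes and their values are placeholders.

```python
dataframe1 = [1, 2, 3]
DataFrame3 = [4, 5, 6]

try:
    # Lowercase "dataframe3" was never defined; only "DataFrame3" exists
    total = sum(dataframe3)
except NameError as error:
    message = str(error)
    print("NameError:", message)
```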

You might think you can look the names up every time, but that is not handy if you have tens of such variables. It is better to name them consistently from the start.

Like:

  • df1
  • df2
  • df3
  • df4
  • df5

This also applies to how you write code, not only to how you name variables. There are always multiple ways to get things done, so stick with the one that is simple and straightforward!

For example, you have data from three years:

  • df_2018 = [9.8, 12.6, 15.8]
  • df_2019 = [5.6, 17.6, 25.1]
  • df_2020 = np.array([6.5, 11.3, 13.5])


While df_2018 and df_2019 are lists, df_2020 is a NumPy array. Both types can hold the data, but they behave differently, and you might not remember the difference later and run into errors because you assumed an operation that works for df_2018 would also work for df_2020.
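One way the two types differ is multiplication, sketched here with the values from the example above. The names doubled_list and doubled_array are chosen only for this illustration.

```python
import numpy as np

df_2018 = [9.8, 12.6, 15.8]            # a plain Python list
df_2020 = np.array([6.5, 11.3, 13.5])  # a NumPy array

# For a list, * 2 repeats the whole list
doubled_list = df_2018 * 2

# For a NumPy array, * 2 multiplies every value by two
doubled_array = df_2020 * 2

print(len(doubled_list))   # 6 items
print(len(doubled_array))  # 3 items
```

The same line of code runs without an error in both cases but gives a different kind of result, which is exactly the sort of mistake that is hard to notice later.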

So stay consistent!

3) Naming

KEEP THINGS CLEAN

How you name your objects is also important. If you have one dataframe for China, one for Korea and one for Japan, you want to name them informatively, not

  • df1
  • df2
  • df3

but

  • df_china
  • df_japan
  • df_korea


There are also different conventions you can follow, particularly when it comes to combining words. Because Python does not allow spaces in variable names, you cannot name a dataframe "South Korea Data Frame"; the words need to be combined in some way.


🐫 Camel Case

Camel case combines words by capitalizing all words following the first word and removing the space, as follows:

Raw: user login count

Camel Case: userLoginCount


👨 Pascal Case

Pascal case combines words by capitalizing all words (even the first word) and removing the space, as follows:

Raw: user login count

Pascal Case: UserLoginCount


🐍 Snake Case

Snake case combines words by replacing each space with an underscore (_) and, in the all caps version, all letters are capitalized, as follows:

Raw: user login count

Snake Case: user_login_count

Snake Case (All Caps): USER_LOGIN_COUNT


🥙 Kebab Case

Kebab case combines words by replacing each space with a dash (-), as follows:

Raw: user login count

Kebab Case: user-login-count
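The four styles above can be reproduced from the raw text with a few string operations. This is only an illustrative sketch, and the variable names are chosen for this example.

```python
raw = "user login count"
words = raw.split()  # ['user', 'login', 'count']

# Camel case: first word stays lowercase, later words are capitalized
camel_case = words[0] + "".join(word.capitalize() for word in words[1:])

# Pascal case: every word is capitalized
pascal_case = "".join(word.capitalize() for word in words)

# Snake case: words joined with underscores, optionally in all caps
snake_case = "_".join(words)
snake_case_caps = snake_case.upper()

# Kebab case: words joined with dashes
kebab_case = "-".join(words)

print(camel_case, pascal_case, snake_case, snake_case_caps, kebab_case)
```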


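There is no single best way to name your variables; what matters more is that names are used consistently. In Python itself, however, kebab case is not an option, because the dash in "data-frame" is read as a minus sign; "dataFrame" and "data_frame" are both valid, and the PEP 8 style guide recommends snake case for variable names. Abbreviations are also recommended to keep names short, but not so short that others cannot understand them.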
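Whether a given style can serve as a Python variable name can be checked with the built-in string method isidentifier(). A short sketch using the example names from this section:

```python
candidate_names = [
    "userLoginCount",    # camel case
    "UserLoginCount",    # Pascal case
    "user_login_count",  # snake case
    "USER_LOGIN_COUNT",  # snake case, all caps
    "user-login-count",  # kebab case
    "user login count",  # raw text with spaces
]

# True means Python accepts the text as a variable name
validity = {name: name.isidentifier() for name in candidate_names}

for name, is_valid in validity.items():
    print(f"{name!r:22} valid variable name: {is_valid}")
```

Names with dashes or spaces are rejected, so kebab case is only useful outside Python code, for example in file names or URLs.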

4) Correct Broken Code

GET THINGS FIXED 🔧


Sometimes things do not work the way we want; these problems are called bugs. They might not be your most immediate concern, but note them down and fix them as soon as possible. If you do not fix them in time, you might even forget how the code was broken, and the problem can get worse.

5) Readability

KEEP THINGS CLEAN, CONCISE AND SIMPLE

This is simple. If code does not work, throw it away and do not keep the mess.

For example, instead of writing code like below:

urban_history = ["Hong Kong Government Reports Online 1841-1942","Policing the Shanghai International Settlement, 1894-1945","Virtual Cities Project"] 
social_history = ["China Families"]
# Manchukuo = ["Chinese Posters in Harvard-Yenching Manchukuo Collection"] # try without first



# urban_history.insert(social_history,2) (does not work)
#
# print(urban_history.append(social_history)) (does not work either)
urban_history + social_history # this is fine
['Hong Kong Government Reports Online 1841-1942',
 'Policing the Shanghai International Settlement, 1894-1945',
 'Virtual Cities Project',
 'China Families']

Keep your "Working Table" clean like this:

urban_history = ["Hong Kong Government Reports Online 1841-1942","Policing the Shanghai International Settlement, 1894-1945","Virtual Cities Project"] 
social_history = ["China Families"]

# combine both source lists into one
urban_history + social_history
['Hong Kong Government Reports Online 1841-1942',
 'Policing the Shanghai International Settlement, 1894-1945',
 'Virtual Cities Project',
 'China Families']
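For readers wondering why the commented-out attempts in the messy version failed, here is a hedged sketch of how the three list operations behave. It works on a copy so the original list stays untouched, and "Example Source" is a made-up item used only for illustration.

```python
urban_history = [
    "Hong Kong Government Reports Online 1841-1942",
    "Policing the Shanghai International Settlement, 1894-1945",
    "Virtual Cities Project",
]
social_history = ["China Families"]

# + builds a new list and leaves both original lists unchanged
combined = urban_history + social_history

# append() changes the list in place and returns None,
# so printing its return value shows None
urban_copy = list(urban_history)
returned = urban_copy.append("China Families")

# insert() expects the position first and the item second
urban_copy.insert(2, "Example Source")

print(combined)
print(returned)
print(urban_copy)
```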

Clean code also means that your code is readable. For example, try to avoid packing many operations into one line. Although it works, it makes the code hard to read, and small typos become hard to spot.

Like:

(NOT recommended)

import pandas as pd

(pd.DataFrame(data={'Name': ["Hong Kong Government Reports Online 1841-1942","Policing the Shanghai International Settlement, 1894-1945","Virtual Cities Project"] 
, 'History': ["urban","urban","urban"]})).head(1)
                                            Name History
0  Hong Kong Government Reports Online 1841-1942   urban

With one stray bracket added, the same line fails, and the typo is hard to find:

import pandas as pd

(pd.DataFrame(data={'Name': ["Hong Kong Government Reports Online 1841-1942","Policing the Shanghai International Settlement, 1894-1945","Virtual Cities Project"] 
, 'History': ["urban","urban","urban"]]})).head(1)
  File "<ipython-input-10-5139c64437bf>", line 5
    , 'History': ["urban","urban","urban"]]})).head(1)
                                          ^
SyntaxError: invalid syntax

(recommended)

import pandas as pd

# type of history
hist_type = ["urban","urban","urban"]

# set up dataframe
d = {'Name': urban_history, 'History': hist_type}
df = pd.DataFrame(data=d)
df.head(1)
                                            Name History
0  Hong Kong Government Reports Online 1841-1942   urban

Happy Coding!



(Quoted from The Zen of Python, by Tim Peters)

Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Readability counts.

If the implementation is hard to explain, it's a bad idea.



Previous Lesson: Python Introduction Basics

Next Lesson: Debugging and Understanding Errors Basics


Additional information

This notebook is provided for educational purposes. Feel free to report any issues on GitHub.


Author: Ka Hei, Chow

License: The code in this notebook is licensed under the Creative Commons Attribution 4.0 license.

Last modified: December 2021




References:

https://data-flair.training/blogs/python-best-practices/

https://betterprogramming.pub/string-case-styles-camel-pascal-snake-and-kebab-case-981407998841