Hi! This site focuses on Python and aims to help users conduct academic research in the humanities using open-source programming languages. It was created at the end of 2021 by Ka Hei, a Hongkonger passionate about machine learning and data science.

This site aims to make emerging digital Chinese history resources more accessible to programming and data science novices who wish to integrate digital tools into historical research. Nonetheless, most of the content and examples also apply to other kinds of qualitative humanities data.

It covers tasks from data acquisition and analysis to visualization, working with both text 📜 and geospatial data 🗺️. No programming knowledge is required to start the tutorials. All tutorials use (historical) Chinese text to demonstrate the workflows for the relevant tasks.

Before you start, please read the instructions.

Chapter 1: Python Programming Basics

Are you wondering how to start with Python? The following tutorials prepare you with the basics required for further data analysis and visualization.

  1. Python for Historical Research
  2. Introduction to Jupyter & Colab
  3. Introduction to Python Programming (Example from 孟浩然诗全集)
  4. Good Coding Practice
  5. Debugging and Understanding Errors
  6. Functions and Loops
  7. List Comprehension (Example from 中国妇女杂志)

Chapter 2: Data Organization

This chapter covers re and Pandas, two useful libraries for preprocessing and cleaning your data, as well as some simple analysis and plotting with matplotlib.

  1. Regular Expression (Example from 清代档案)
  2. Python Pandas Library
  3. Pandas Numerical Operation (Example from UNESCO)
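
As a taste of the kind of cleaning this chapter deals with, here is a minimal sketch using only the standard-library re module. The sample string is made up for illustration and is not taken from any of the lessons. It removes stray whitespace and pulls out the four-digit years written in parentheses.

```python
import re

# Hypothetical sample text (invented for this sketch): reign years followed by
# Gregorian years in parentheses, with inconsistent spacing.
raw = "康熙  二十三年(1684)  奏摺 ; 乾隆十年 (1745) 奏摺"

# Remove all whitespace; Chinese text does not rely on spaces between words.
cleaned = re.sub(r"\s+", "", raw)

# Capture every four-digit number enclosed in parentheses and convert it to int.
years = [int(y) for y in re.findall(r"\((\d{4})\)", cleaned)]

print(cleaned)
print(years)
```

The same pattern could later be applied to a whole column of a Pandas DataFrame, but that step is left to the lessons themselves.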

Chapter 3: Data Visualization

This chapter covers Bokeh and Plotly, which allow you to create interactive figures for presentations.

  1. Introduction to Data Visualization
  2. plotly.express: Creating Simple Bubble Chart
  3. plotly.express: continue with Bubble Timeline
  4. Colored Stripes
  5. plotly.express: Gantt Charts and Timelines
  6. Circular Packing Chart using circlify (Example from 宋朝姓氏分布)
  7. plotly.graph_objects: Interactive Polar Bar Chart
  8. From PDF to Word Cloud (Example from 杯酒释兵权考)
(Coming soon ...)

Chapter 4: Text Analysis

Are you wondering what NLP is and how you can apply it to digital Chinese texts? Here you will learn some simple concepts and their implementations in Python using spaCy, pytesseract, jieba and BeautifulSoup4.

  1. Web Scraping using BeautifulSoup4 (Example from 孟浩然诗全集)
  2. Continue with Web Scraping
  3. OCR with Chinese Text
  4. spaCy NLP Introduction
  5. Continue with NLP
  6. TF-IDF
  7. Textual Similarity
  8. Sentiment Analysis
(Coming soon ...)

Chapter 5: Network Analysis

This chapter covers some basic concepts of network analysis using NetworkX, PyVis and Plotly.

  1. NetworkX Introduction (Example from 中国历代进士资料库)
  2. Plotly with NetworkX
  3. PyVis
(Coming soon ...)

Chapter 6: Geospatial Map

In Chapter 6, we begin to work with spatial data, which is essential for creating maps. Geospatial analysis and visualization can be performed not only in GIS software but also in Python. The geopandas and folium libraries are covered.

  1. Geocoding Chinese Place Names
  2. Introduction to Vectors
  3. Introduction to Geopandas: Analysing Geometry Objects
  4. Introduction to Folium: Starting with Maps (Example from CHGIS)
  5. Choropleth Map (Example from CHGIS)
(Coming soon ...)

Chapter 7: Web-based Tools

A web application can be useful for sharing your interactive presentation. Here GitHub Pages and Dash are covered to show the workflow for creating a simple web map.

  1. GitHub Pages
  2. Hosting Webmaps
(Coming soon ...)

Chapter 8: Machine Learning

Machine learning has also become a popular topic in text analysis. Here some simple concepts are discussed, for example its applications to topic modelling and text document clustering.

(Coming soon ...)


If you like my content, feel free to follow and endorse me on Twitter and GitHub.


Posts