Paper Information
Title: HMM-Based Online Recognition of Handwritten Chemical Symbols
Authors: Yang Zhang, Guangshun Shi, Jufeng Yang
Publication: ICDAR 2009
What kind of paper is this?
This is a Method paper that proposes a specific algorithmic pipeline for the online recognition of handwritten chemical symbols. The core contribution is the engineering of an 11-dimensional feature vector combined with a Hidden Markov Model (HMM) architecture. The paper validates this method through quantitative experiments on a custom dataset, focusing on recognition accuracy as the primary metric.
What is the motivation?
Recognizing handwritten chemical symbols is difficult because chemical expressions have a complex structure and pen-based input often produces broken or conglutinated (merged) strokes. Variations in writing style and random noise add further difficulty. While online recognition of Western characters and CJK scripts is well developed, work specifically targeting online chemical symbol recognition is scarce; most prior research focuses on offline recognition or global optimization.
What is the novelty here?
The primary novelty is the application of continuous HMMs specifically to the domain of online chemical symbol recognition, utilizing a specialized set of 11-dimensional local features. While HMMs have been used for other scripts, this paper tailors the feature extraction (including curliness, linearity, and writing direction) to capture the specific geometric properties of chemical symbols.
What experiments were performed?
The authors constructed a specific dataset for this task involving 20 participants (college teachers and students).
- Dataset: 64 distinct symbols (digits, English letters, Greek letters, operators)
- Volume: 7,808 total samples (122 per symbol), split into 5,670 training samples and 2,016 testing samples
- Model Sweeps: They evaluated the HMM performance by varying the number of states (4, 6, 8) and the number of Gaussians per state (3, 4, 6, 9, 12)
What were the outcomes and conclusions drawn?
- Performance: The best configuration (6 states, 9 Gaussians) achieved a top-1 accuracy of 89.5% and a top-3 accuracy of 98.7%
- Scaling: Accuracy generally improved as the number of states and Gaussians per state increased, at the cost of computational efficiency
- Error Analysis: The primary sources of error were shape similarities between specific characters (e.g., ‘0’ vs ‘O’ vs ‘o’, and ‘C’ vs ‘c’ vs ‘(’)
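The top-1 and top-3 figures above follow the usual top-k definition: a sample counts as correct if its true label is among the k best-scoring classes. The sketch below illustrates that metric on toy data; the function name `top_k_accuracy`, the per-class score dictionaries, and the toy values are assumptions for illustration, not the authors' evaluation code.

```python
from typing import Dict, Sequence


def top_k_accuracy(scores: Sequence[Dict[str, float]],
                   labels: Sequence[str], k: int) -> float:
    """Fraction of samples whose true label is among the k best-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for per_class, truth in zip(scores, labels):
        # Rank class names by descending score (e.g. HMM log-likelihood).
        ranked = sorted(per_class, key=per_class.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for three samples over the confusable classes '0', 'O', 'o'.
toy_scores = [
    {"0": -10.0, "O": -11.0, "o": -15.0},
    {"0": -12.0, "O": -12.5, "o": -20.0},
    {"0": -30.0, "O": -9.0, "o": -8.0},
]
toy_labels = ["0", "O", "O"]
```

On this toy data the second and third samples are missed at k = 1 but recovered at k = 2, which mirrors how shape-similar symbols can depress top-1 accuracy while leaving top-3 accuracy high.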
Reproducibility Details
Data
The study utilized a custom dataset collected in a laboratory environment.
| Purpose | Dataset | Size | Notes |
|---|---|---|---|
| Training | Custom Chemical Symbol Set | 5,670 samples | 90 samples per symbol |
| Testing | Custom Chemical Symbol Set | 2,016 samples | 32 samples per symbol |
Dataset Composition: The set includes 64 symbols: digits (0-9), uppercase letters (A-Z, missing Q), selected lowercase letters (a-z), Greek letters ($\alpha$, $\beta$, $\gamma$, $\pi$), and operators ($+$, $=$, $\rightarrow$, $\uparrow$, $\downarrow$, $($, $)$).
Algorithms
1. Preprocessing
The raw tablet data undergoes a 6-step pipeline:
- Duplicate Point Elimination: Removing sequential points with identical coordinates
- Broken Stroke Connection: Using Bezier curves to interpolate missing points/connect broken strokes
- Hook Elimination: Removing artifacts at the start/end of strokes characterized by short length and sharp angle changes
- Smoothing: Reducing noise from erratic pen movement
- Re-sampling: Spacing points equidistantly to remove temporal variation
- Size Normalization: Removing variation in writing scale
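Three of the six steps above (duplicate point elimination, re-sampling, and size normalization) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the `step` and `target_height` parameters, and the choice to scale by bounding-box height are hypothetical, since the summary does not give the paper's exact procedures. Bezier stroke connection, hook elimination, and smoothing are not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def remove_duplicates(points: List[Point]) -> List[Point]:
    """Drop consecutive points with identical coordinates."""
    out: List[Point] = []
    for p in points:
        if not out or p != out[-1]:
            out.append(p)
    return out


def resample(points: List[Point], step: float) -> List[Point]:
    """Re-sample a stroke so consecutive points are `step` apart along the path."""
    if step <= 0:
        raise ValueError("step must be positive")
    if not points:
        return []
    out = [points[0]]
    needed = step  # arc length still to travel before emitting the next point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already consumed on this segment
        while seg - pos >= needed:
            pos += needed
            t = pos / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            needed = step
        needed -= seg - pos
    return out


def normalize_size(points: List[Point], target_height: float = 1.0) -> List[Point]:
    """Translate to the origin and scale uniformly (assumed: by bounding-box height)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    height = max(ys) - min_y
    width = max(xs) - min_x
    extent = height if height > 0 else width  # fall back to width for flat strokes
    scale = target_height / extent if extent > 0 else 1.0
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]
```

Re-sampling by arc length is what removes the dependence on writing speed: after it, the point sequence encodes only the shape of the trajectory.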
2. Feature Extraction (11 Dimensions)
Features are extracted from a 5-point window centered on $t$ ($t-2$ to $t+2$). The 11 dimensions are:
- Normalized Vertical Position: $y(t)$ mapped to $[0,1]$
- Normalized First Derivative ($x'$): Calculated via weighted sum of neighbors
- Normalized First Derivative ($y'$): Calculated via weighted sum of neighbors
- Normalized Second Derivative ($x''$): Computed using $x'$ values
- Normalized Second Derivative ($y''$): Computed using $y'$ values
- Curvature: $\frac{x'y'' - x''y'}{(x'^2 + y'^2)^{3/2}}$
- Writing Direction (Cos): $\cos \alpha(t)$ based on vector from $t-1$ to $t+1$
- Writing Direction (Sin): $\sin \alpha(t)$
- Aspect Ratio: Ratio of height to width in the 5-point window
- Curliness: Deviation from the straight line connecting the first and last point of the window
- Linearity: Average squared distance of points in the window to the straight line connecting start/end points
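A few of the features listed above can be sketched directly from their descriptions. The code below covers writing direction, aspect ratio, linearity, and curvature; it is an illustration, not the authors' implementation. The function names, the `eps` guard, and the fallbacks for degenerate windows are assumptions, and the derivatives passed to `curvature` are taken as given because the summary does not specify the neighbor weights.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def window(pts: List[Point], t: int) -> List[Point]:
    """Return the 5-point window from t-2 to t+2."""
    if t < 2 or t > len(pts) - 3:
        raise IndexError("t must leave two neighbors on each side")
    return pts[t - 2:t + 3]


def writing_direction(pts: List[Point], t: int) -> Tuple[float, float]:
    """(cos, sin) of the angle of the vector from point t-1 to point t+1."""
    dx = pts[t + 1][0] - pts[t - 1][0]
    dy = pts[t + 1][1] - pts[t - 1][1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return 1.0, 0.0  # assumed fallback for a zero-length vector
    return dx / norm, dy / norm


def aspect_ratio(win: List[Point], eps: float = 1e-9) -> float:
    """Height divided by width of the window's bounding box."""
    xs = [x for x, _ in win]
    ys = [y for _, y in win]
    return (max(ys) - min(ys)) / max(max(xs) - min(xs), eps)


def linearity(win: List[Point]) -> float:
    """Average squared distance of window points to the line through its end points."""
    (x0, y0), (x1, y1) = win[0], win[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    if chord == 0.0:
        # Assumed fallback: measure distance to the coincident end point instead.
        return sum((x - x0) ** 2 + (y - y0) ** 2 for x, y in win) / len(win)
    total = 0.0
    for x, y in win:
        # Signed perpendicular distance from (x, y) to the chord line.
        d = ((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) / chord
        total += d * d
    return total / len(win)


def curvature(dx: float, dy: float, ddx: float, ddy: float) -> float:
    """(x'y'' - x''y') / (x'^2 + y'^2)^(3/2), with 0.0 assumed when the speed is zero."""
    denom = (dx * dx + dy * dy) ** 1.5
    return (dx * ddy - ddx * dy) / denom if denom > 0 else 0.0
```

As a sanity check on the curvature formula, a circle of radius 2 traversed counter-clockwise has $x' = 0$, $y' = 2$, $x'' = -2$, $y'' = 0$ at its rightmost point, which gives a curvature of $1/2$, the reciprocal of the radius.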
3. Feature Normalization
Each feature vector $v_t$ in the final feature matrix $V$ is normalized to zero mean and unit standard deviation using the feature mean $\mu$ and covariance matrix $\Sigma$: $o_t = \Sigma^{-1/2}(v_t - \mu)$.
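One common way to build $\Sigma^{-1/2}$ is through an eigendecomposition, which also makes it easy to check what the transform does. The derivation below is a sketch that assumes $\Sigma$ is symmetric positive definite; the symbols $U$ and $\Lambda$ are introduced here for illustration and do not come from the paper.

$$
\Sigma = U \Lambda U^{\top}, \qquad \Sigma^{-1/2} = U \Lambda^{-1/2} U^{\top}
$$

$$
\mathbb{E}[o_t] = \Sigma^{-1/2}\left(\mathbb{E}[v_t] - \mu\right) = 0, \qquad
\operatorname{Cov}(o_t) = \Sigma^{-1/2}\, \Sigma\, \Sigma^{-1/2}
= U \Lambda^{-1/2} \Lambda \Lambda^{-1/2} U^{\top} = I
$$

Under these assumptions the normalized features are decorrelated as well as scaled: every component of $o_t$ has zero mean and unit variance.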
Models
- Architecture: Continuous Hidden Markov Models (HMM)
- Topology: Left-to-right (Bakis model)
- Initialization: Initial distribution $\pi = \{1, 0, \dots, 0\}$; uniform transition matrix $A$; segmental k-means for observation matrix $B$
- Training: Baum-Welch re-estimation
- Decision: Maximum likelihood classification ($\hat{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$)
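The initialization and decision rule above can be sketched as follows. This is a minimal illustration, not the authors' code: `bakis_init`, `log_likelihood`, `classify`, and the `max_jump` parameter are hypothetical names, "uniform" is interpreted as uniform over the transitions a left-to-right topology allows, and the per-state emission log-densities `log_b` are taken as given rather than computed from Gaussian mixtures.

```python
import math
from typing import Dict, List, Tuple


def bakis_init(n_states: int, max_jump: int = 2) -> Tuple[List[float], List[List[float]]]:
    """Initial distribution and transition matrix for a left-to-right HMM."""
    pi = [1.0] + [0.0] * (n_states - 1)  # always start in the first state
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        # Allowed targets: stay, or move forward by up to max_jump states.
        targets = range(i, min(i + max_jump, n_states - 1) + 1)
        for j in targets:
            A[i][j] = 1.0 / len(targets)
    return pi, A


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(vals: List[float]) -> float:
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))


def log_likelihood(pi: List[float], A: List[List[float]],
                   log_b: List[List[float]]) -> float:
    """log P(O | lambda) via the forward algorithm; log_b[t][j] = log b_j(o_t)."""
    n = len(pi)
    alpha = [_log(pi[j]) + log_b[0][j] for j in range(n)]
    for t in range(1, len(log_b)):
        alpha = [
            _logsumexp([alpha[i] + _log(A[i][j]) for i in range(n)]) + log_b[t][j]
            for j in range(n)
        ]
    return _logsumexp(alpha)  # sums over all end states (an assumption)


def classify(log_likelihoods: Dict[str, float]) -> str:
    """Pick the symbol whose model assigns the highest log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)
```

In use, one model would be trained per symbol, `log_likelihood` evaluated for each model on the observation sequence, and the resulting dictionary passed to `classify`. Working in the log domain avoids numerical underflow on long point sequences.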
Evaluation
| Metric | Best Value | Configuration | Notes |
|---|---|---|---|
| Top-1 Accuracy | 89.5% | 6 States, 9 Gaussians | Highest reported accuracy |
| Top-3 Accuracy | 98.7% | 6 States, 9 Gaussians | Top-3 candidate accuracy |
Citation
@inproceedings{zhang2009hmm,
title={HMM-Based Online Recognition of Handwritten Chemical Symbols},
author={Zhang, Yang and Shi, Guangshun and Yang, Jufeng},
booktitle={2009 10th International Conference on Document Analysis and Recognition},
pages={1255--1259},
year={2009},
organization={IEEE},
doi={10.1109/ICDAR.2009.99}
}