Paper Information

Citation: Zhang, Y., Shi, G., & Wang, K. (2010). A SVM-HMM Based Online Classifier for Handwritten Chemical Symbols. 2010 International Conference on Pattern Recognition, 1888–1891. https://doi.org/10.1109/ICPR.2010.465

Publication: ICPR 2010

What kind of paper is this?

Method. This paper is a methodological contribution that proposes a novel “double-stage classifier” architecture. It introduces a specific algorithmic pipeline (SVM rough classification followed by HMM fine classification) and a new pre-processing algorithm (Point Sequence Reordering) to address the difficulty of recognizing organic ring structures. The contribution is validated through component comparisons (SVM kernels, HMM state and Gaussian counts), a before/after study of the reordering step, and performance benchmarks.

What is the motivation?

The primary motivation is the complexity of recognizing handwritten chemical symbols, specifically the distinction between Organic Ring Structures (ORS) and Non-Ring Structures (NRS). Existing single-stage classifiers are unreliable for ORS because these symbols have arbitrary writing styles, variable stroke numbers, and inconsistent stroke orders due to their 2D hexagonal structure. A robust system is needed to handle this uncertainty and achieve high accuracy.

What is the novelty here?

The authors introduce two main novelties:

  1. Double-Stage Architecture: A hybrid system in which an SVM (using an RBF kernel) first roughly classifies each input as either ORS or NRS, and specialized HMMs then perform fine-grained recognition within the chosen category.
  2. Point Sequence Reordering (PSR) Algorithm: A stroke-order-independent algorithm designed specifically for ORS. It reorders the point sequence of a symbol according to a counter-clockwise scan around the centroid, which removes the uncertainty caused by variations in stroke number and writing order.
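
The control flow of the double-stage design can be sketched as follows. This is only an illustration under stated assumptions: the names `classify_symbol`, `is_ring`, `reorder`, `ors_models`, and `nrs_models` are hypothetical, the stage-1 SVM is abstracted as a boolean callable, and each stage-2 HMM is abstracted as a function returning a score such as a log-likelihood.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Scorer = Callable[[Sequence[Point]], float]  # stand-in for an HMM log-likelihood


def classify_symbol(
    points: Sequence[Point],
    is_ring: Callable[[Sequence[Point]], bool],                 # stage 1 (SVM stand-in)
    reorder: Callable[[Sequence[Point]], Sequence[Point]],      # PSR stand-in
    ors_models: Dict[str, Scorer],
    nrs_models: Dict[str, Scorer],
    top_k: int = 3,
) -> List[str]:
    """Stage 1 picks the symbol family; stage 2 ranks the symbols within it."""
    if is_ring(points):
        # Reordering is applied to ring structures only.
        sequence, models = reorder(points), ors_models
    else:
        sequence, models = points, nrs_models
    # Rank candidate labels by descending model score and keep the best top_k.
    ranked = sorted(models, key=lambda label: models[label](sequence), reverse=True)
    return list(ranked[:top_k])
```

Returning a ranked list rather than a single label matches the Top-k style of evaluation the paper reports.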

What experiments were performed?

The authors collected a custom dataset and performed sequential optimizations:

  • SVM Optimization: Compared Polynomial, RBF, and Sigmoid kernels to find the best rough classifier.
  • HMM Optimization: Tested multiple combinations of states (4, 6, 8) and Gaussians (3, 4, 6, 8, 9, 12) to maximize fine classification accuracy.
  • PSR Validation: Conducted an ablation study comparing HMM accuracy on ORS symbols “Before PSR” vs “After PSR” to quantify the algorithm’s impact.

What were the outcomes and conclusions drawn?

  • Architecture Performance: The RBF-based SVM achieved 99.88% accuracy in differentiating ORS from NRS.
  • HMM Configuration: The best-performing HMM configuration was 8 states and 12 Gaussians for both symbol types.
  • PSR Impact: The PSR algorithm drastically improved ORS recognition. Top-1 accuracy jumped from 49.84% (Before PSR) to 98.36% (After PSR).
  • Overall Accuracy: The final integrated system achieved a Top-1 accuracy of 93.10% and Top-3 accuracy of 98.08% on the test set.

Reproducibility Details

Data

The study defined 101 chemical symbols split into two categories.

| Category | Count | Content | Notes |
|---|---|---|---|
| NRS (Non-Ring) | 63 | Digits 0-9, 44 letters, 9 operators | Operators include +, -, =, $\rightarrow$, etc. |
| ORS (Organic Ring) | 38 | 2D hexagonal structures | Benzene rings, cyclohexane, etc. |

  • Collection: 12,322 total samples (122 per symbol) collected from 20 writers (teachers and students).
  • Split: 9,090 training samples and 3,232 test samples.
  • Constraints: Three specifications were used: normal, standard, and freestyle.
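
As a quick consistency check on the dataset figures (the even per-symbol split is an assumption; the summary only gives totals):

$$
101 \times 122 = 12{,}322, \qquad 9{,}090 + 3{,}232 = 12{,}322, \qquad \frac{9{,}090}{101} = 90, \qquad \frac{3{,}232}{101} = 32
$$

If the split was made evenly per symbol, this corresponds to 90 training and 32 test samples for each of the 101 symbols.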

Algorithms

1. SVM Feature Extraction (Rough Classification)

The input strokes are scaled, and a 58-dimensional feature vector is calculated:

  • Mesh ($4 \times 4$): Ratio of points in 16 grids (16 features).
  • Outline: Normalized scan distance from 4 edges with 5 scan lines each (20 features).
  • Projection: Point density in 5 bins per edge (20 features).
  • Aspect Ratio: Height/Width ratios (2 features).
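
A minimal sketch of two of the four feature groups (mesh and aspect ratio) is shown below. The function name `mesh_and_aspect_features` is hypothetical, and the sketch assumes the two aspect-ratio features are height/width and width/height, which the summary does not spell out. The outline and projection groups are omitted because their exact definitions are not given here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mesh_and_aspect_features(points: Sequence[Point], grid: int = 4) -> List[float]:
    """Return grid*grid mesh ratios followed by two aspect-ratio features."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Guard against zero extent (e.g. a perfectly vertical or horizontal stroke).
    width = max(max(xs) - min_x, 1e-9)
    height = max(max(ys) - min_y, 1e-9)

    counts = [0] * (grid * grid)
    for x, y in points:
        # Clamp so that points on the far edge fall into the last cell.
        col = min(int((x - min_x) / width * grid), grid - 1)
        row = min(int((y - min_y) / height * grid), grid - 1)
        counts[row * grid + col] += 1
    # Mesh feature: fraction of all points that land in each cell.
    mesh = [c / len(points) for c in counts]

    # Assumed pair of aspect features: height/width and width/height.
    return mesh + [height / width, width / height]
```

With the default 4 x 4 grid this yields 16 + 2 = 18 values; the remaining 40 dimensions of the 58-dimensional vector would come from the outline and projection groups.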

2. Point Sequence Reordering (PSR)

Used strictly for ORS preprocessing:

  1. Calculate the centroid of the symbol.
  2. Initialize a scan line at angle $\theta = 0$.
  3. Traverse the points; if a point lies within a threshold distance of the scan line, append it to the reordered list.
  4. Increment $\theta$ by $\Delta\theta$ and repeat until a full circle ($2\pi$) is completed.
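
The four steps above can be sketched as follows. This is an interpretation rather than the authors' implementation: the scan line is modelled as a ray starting at the centroid, each point is emitted at most once, and points hit in the same step are ordered from the centroid outward. The name `reorder_points` and the default values of `delta_theta` and `threshold` are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def reorder_points(
    points: Sequence[Point],
    delta_theta: float = math.radians(5),  # assumed angular step
    threshold: float = 0.1,                # assumed distance tolerance
) -> List[Point]:
    """Reorder points by sweeping a ray counter-clockwise around the centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    reordered: List[Point] = []
    taken = [False] * n
    steps = int(round(2 * math.pi / delta_theta))
    for k in range(steps):
        theta = k * delta_theta  # recomputed each step to avoid drift
        ux, uy = math.cos(theta), math.sin(theta)
        hits = []
        for i, (x, y) in enumerate(points):
            if taken[i]:
                continue
            dx, dy = x - cx, y - cy
            along = dx * ux + dy * uy        # signed distance along the ray
            across = abs(dy * ux - dx * uy)  # perpendicular distance to the ray
            if along >= 0 and across <= threshold:
                hits.append((along, i))
        # Assumption: points hit in the same step go from the centroid outward.
        for _, i in sorted(hits):
            taken[i] = True
            reordered.append(points[i])
    return reordered
```

In this sketch, points that never come within `threshold` of any scan position are dropped, so the step size and tolerance need to be chosen together. Because the output depends only on point positions, two inputs containing the same points in different stroke orders produce the same sequence.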

Models

  • SVM (Stage 1): An RBF kernel was selected as the best performer, with parameters $C=512$ and $\gamma=0.5$.
  • HMM (Stage 2): Left-right continuous HMMs trained with the Baum-Welch algorithm, one model per symbol, each using 8 states and 12 Gaussians.
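
To make the left-right constraint concrete, here is a sketch of an initial transition matrix for an 8-state model. It assumes the simplest left-right form (each state either stays or advances by one); whether the paper also allows skip transitions is not stated in this summary. The function name `left_right_transitions` and the initial value `stay_prob` are hypothetical.

```python
from typing import List


def left_right_transitions(num_states: int = 8, stay_prob: float = 0.5) -> List[List[float]]:
    """Build an initial left-right transition matrix (self-loop or advance by one)."""
    matrix = [[0.0] * num_states for _ in range(num_states)]
    for s in range(num_states - 1):
        matrix[s][s] = stay_prob            # remain in the current state
        matrix[s][s + 1] = 1.0 - stay_prob  # move to the next state
    matrix[-1][-1] = 1.0  # the final state can only loop on itself
    return matrix
```

Baum-Welch re-estimation keeps zero transition probabilities at zero, so initializing the matrix this way preserves the left-right structure throughout training.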

Evaluation

The paper reports Top-1, Top-2, and Top-3 accuracy on the held-out test set; only Top-1 and Top-3 are reproduced in this summary.

| Metric | NRS Accuracy | ORS Accuracy | Overall Test Accuracy |
|---|---|---|---|
| Top-1 | 91.91% | 97.53% | 93.10% |
| Top-3 | 99.12% | 99.34% | 98.08% |
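
For reference, Top-k accuracy counts a sample as correct when its true label appears among the k highest-ranked candidates. A minimal sketch follows, with a hypothetical function name `top_k_accuracy` and the assumption that predictions arrive as ranked label lists:

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of samples whose true label is among the first k candidates."""
    hits = sum(
        1
        for ranked, label in zip(ranked_predictions, labels)
        if label in ranked[:k]
    )
    return hits / len(labels)
```

By construction, Top-3 accuracy can never be lower than Top-1 accuracy on the same predictions.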

Hardware

  • Device: HP Pavilion tx1000 Tablet PC.
  • Processor: 2.00 GHz CPU.

Citation

@inproceedings{zhang2010svm,
  title={A SVM-HMM Based Online Classifier for Handwritten Chemical Symbols},
  author={Zhang, Yang and Shi, Guangshun and Wang, Kai},
  booktitle={2010 International Conference on Pattern Recognition},
  pages={1888--1891},
  year={2010},
  organization={IEEE},
  doi={10.1109/ICPR.2010.465}
}