Paper Information
Citation: Ramel, J.-Y., Boissier, G., & Emptoz, H. (1999). Automatic Reading of Handwritten Chemical Formulas from a Structural Representation of the Image. Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99), 83-86. https://doi.org/10.1109/ICDAR.1999.791730
Publication: ICDAR 1999
Contribution: Structural Approach to Document Analysis
Method. This paper proposes a system architecture for document analysis. It introduces a specific pipeline (Global Perception followed by Incremental Extraction) and validates this strategy with recognition rates on specific tasks. The core contribution is the shift from bitmap-based processing to a structural graph representation of graphical primitives.
Motivation: Overcoming Bitmap Limitations in Freehand Drawings
- Complexity of Freehand: Freehand drawings contain fluctuating lines and noise that make standard vectorization techniques difficult to apply directly.
- Limitation of Bitmap Analysis: Most existing systems at the time attempted to interpret the document by working directly on the static bitmap image throughout the process.
- Need for Context: Interpretation requires a dynamic resource that can evolve as knowledge is extracted (e.g., recognizing a polygon changes the context for its neighbors).
Novelty: Dynamic Structural Graphs and Recursive Specialists
The authors propose a Structural Representation as the single resource used for interpretation.
- Quadrilateral Primitives: The system builds Quadrilaterals (pairs of vectors) to represent thin shapes, which are robust to handwriting fluctuations.
- Structural Graph: These primitives are organized into a graph where arcs represent geometric relationships (T-junctions, L-junctions, parallels).
- Specialist Agents: Interpretation is driven by independent modules (specialists) that browse this graph recursively to identify high-level chemical entities like rings (polygons) or chains.
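As a minimal sketch of the structural graph described above, assuming quadrilaterals are identified by simple ids and arcs are undirected: the class and method names (`StructuralGraph`, `add_arc`, `neighbors`) are hypothetical and not taken from the paper. The arc labels follow the five relation types listed in the Algorithms section.

```python
from dataclasses import dataclass, field
from enum import Enum


class ArcType(Enum):
    """Relation types between quadrilaterals, as listed in this summary."""
    T_JUNCTION = "T"
    INTERSECTION = "X"
    PARALLEL = "//"
    L_JUNCTION = "L"
    SUCCESSIVE = "S"


@dataclass
class StructuralGraph:
    """Hypothetical container: nodes are quadrilateral ids, arcs are typed."""
    # quad id -> list of (neighbor id, arc type)
    adjacency: dict = field(default_factory=dict)

    def add_quad(self, quad_id):
        self.adjacency.setdefault(quad_id, [])

    def add_arc(self, a, b, arc_type):
        # Assumption: relations are symmetric, so store the arc both ways.
        self.add_quad(a)
        self.add_quad(b)
        self.adjacency[a].append((b, arc_type))
        self.adjacency[b].append((a, arc_type))

    def neighbors(self, quad_id, arc_type=None):
        """Neighbors of a quadrilateral, optionally filtered by arc type."""
        return [
            n for n, t in self.adjacency.get(quad_id, [])
            if arc_type is None or t is arc_type
        ]


graph = StructuralGraph()
graph.add_arc("q1", "q2", ArcType.L_JUNCTION)
graph.add_arc("q1", "q3", ArcType.PARALLEL)
print(graph.neighbors("q1", ArcType.PARALLEL))
```

A specialist would then query `neighbors` with the arc type it cares about, for example parallel arcs when looking for multiple bonds.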
Experimental Setup and Outcomes
- Validation Set: The system was tested on 20 handwritten documents containing chemical formulas. Critique: This is a very small sample size by modern standards, making the robustness of the 97% claim difficult to verify across diverse handwriting styles.
- Text Database: A separate base of 328 models was used for the text recognition component.
- High Graphical Accuracy: The system achieved a $\approx 97\%$ recognition rate for the graphical parts (graphical entities such as rings and bonds).
- Text Recognition: The text recognition module achieved a $\approx 93\%$ success rate.
- Robustness: The structural graph approach handled multiple bonds (liaisons), polygons, and chains, and allowed the progressive construction of a solution consistent with the context.
Reproducibility Details
Data
| Purpose | Dataset | Size | Notes |
|---|---|---|---|
| Evaluation | Handwritten Documents | 20 docs | Off-line documents at 300 dpi |
| Training | Character Models | 328 models | Used for the Pattern Matching text recognition base |
Algorithms
The interpretation process is divided into two distinct phases:
1. Global Perception (Graph Construction)
- Vectorization: Contour tracking produces a chain of vectors, which are simplified via iterative polygonal approximation until fusion stabilizes (2-5 iterations).
- Quadrilateral Formation: Vectors are paired to form quadrilaterals based on Euclidean distance and “empirical” alignment criteria.
- Graph Generation: Quadrilaterals become nodes. Arcs are created based on “zones of influence” and classified into 5 types: T-junction, Intersection (X), Parallel (//), L-junction, and Successive (S).
- Redraw Heuristic: A pre-processing step transforms T, X, and S junctions into L or // relations, as chemical drawings primarily consist of L-junctions and parallels.
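The "iterate until fusion stabilizes" idea in the vectorization step can be illustrated with a small fixed-point loop. This is only a sketch under stated assumptions: the summary does not give the paper's fusion criterion, so a turn-angle threshold (`max_turn_deg`, a hypothetical parameter) stands in for it, and the function names are illustrative.

```python
import math


def turn_angle(p, q, r):
    """Absolute change of direction in degrees when going p -> q -> r."""
    a1 = math.atan2(q[1] - p[1], q[0] - p[0])
    a2 = math.atan2(r[1] - q[1], r[0] - q[0])
    d = abs(a2 - a1)
    return math.degrees(min(d, 2 * math.pi - d))


def fuse_once(points, max_turn_deg):
    """One pass: drop interior vertices where the polyline barely turns."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for i in range(1, len(points) - 1):
        if turn_angle(kept[-1], points[i], points[i + 1]) >= max_turn_deg:
            kept.append(points[i])
    kept.append(points[-1])
    return kept


def approximate(points, max_turn_deg=15.0, max_iter=10):
    """Repeat fusion until the vertex count stops changing."""
    current = list(points)
    for iteration in range(1, max_iter + 1):
        fused = fuse_once(current, max_turn_deg)
        if len(fused) == len(current):
            return fused, iteration
        current = fused
    return current, max_iter


# A shaky horizontal stroke followed by a shaky vertical stroke.
contour = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (3, 1), (3.05, 2), (3, 3)]
vertices, iterations = approximate(contour)
print(vertices, iterations)
```

On this toy contour the jittery vertices are fused and only the two endpoints and the corner remain; the last iteration is the one that confirms nothing changed.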
2. Specialists (Interpretation)
- Liaison Specialist: Scans the graph for // arcs or quadrilaterals with free extremities to identify bonds.
- Polygon/Chain Specialist: Uses recursive `look-left` and `look-right` procedures. If a search returns to the start node after $n$ steps, a polygon is detected.
- Text Localization: Clusters “short” quadrilaterals by physical proximity into “focus zones”. Zones are classified as text/non-text based on connected components.
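The closed-loop test used by the Polygon/Chain Specialist can be sketched as a recursive walk over junction arcs. This is a simplification, not the paper's procedure: the separate look-left and look-right searches are replaced by a plain depth-first walk over all neighbors, and `l_arcs`, `find_polygon`, and `max_sides` are hypothetical names.

```python
def find_polygon(l_arcs, start, max_sides=8):
    """Return the quadrilateral ids of a closed loop through `start`, or None.

    `l_arcs` maps each quadrilateral id to the ids it is joined to.
    A loop needs at least three sides and at most `max_sides`.
    """
    def walk(node, path):
        for nxt in l_arcs.get(node, ()):
            # Back at the start after enough steps: a polygon is detected.
            if nxt == start and len(path) >= 3:
                return path
            if nxt not in path and len(path) < max_sides:
                found = walk(nxt, path + [nxt])
                if found:
                    return found
        return None

    return walk(start, [start])


# Six quadrilaterals joined end to end in a ring (a hexagon).
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
# Three quadrilaterals in an open chain.
chain = {0: [1], 1: [0, 2], 2: [1]}

print(find_polygon(hexagon, 0))
print(find_polygon(chain, 0))
```

When the walk fails to close, the visited quadrilaterals form an open chain, which matches the polygon-versus-chain distinction described above.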
Models
Text Recognition Hybrid:
- Normalization & Pattern Matching: A classic method using the database of 328 models.
- Structural Rule Base: Uses “significant” quadrilaterals (length $\ge 1/3$ of zone dimension) to verify characters. A rule base defines the expected count of horizontal, vertical, and diagonal lines for each character.
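The structural check can be illustrated with a small stroke-counting sketch. Several details are assumptions rather than facts from the paper: each quadrilateral is reduced to a segment, the "zone dimension" is taken as the larger of the zone's width and height, the orientation tolerance `tol_deg` is arbitrary, and the entries in `RULES` are made-up examples of what a rule base might contain.

```python
import math
from collections import Counter

# Hypothetical rules: expected (horizontal, vertical, diagonal) stroke counts.
RULES = {"H": (1, 2, 0), "N": (0, 2, 1), "T": (1, 1, 0)}


def orientation(seg, tol_deg=20.0):
    """Classify a segment as horizontal, vertical, or diagonal."""
    (x1, y1), (x2, y2) = seg
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
    if angle <= tol_deg or angle >= 180.0 - tol_deg:
        return "horizontal"
    if abs(angle - 90.0) <= tol_deg:
        return "vertical"
    return "diagonal"


def stroke_counts(segments, zone_w, zone_h):
    """Count significant strokes: length >= 1/3 of the zone dimension."""
    min_len = max(zone_w, zone_h) / 3.0  # assumption: larger side of the zone
    counts = Counter()
    for seg in segments:
        (x1, y1), (x2, y2) = seg
        if math.hypot(x2 - x1, y2 - y1) >= min_len:
            counts[orientation(seg)] += 1
    return (counts["horizontal"], counts["vertical"], counts["diagonal"])


def verify(candidate, segments, zone_w, zone_h):
    """True if the stroke counts match the rule for the candidate character."""
    return RULES.get(candidate) == stroke_counts(segments, zone_w, zone_h)


# Two vertical bars, one crossbar, and one short noise stroke in a 20x30 zone.
strokes = [((0, 0), (0, 30)), ((20, 0), (20, 30)),
           ((0, 15), (20, 15)), ((0, 0), (3, 1))]
print(verify("H", strokes, 20, 30), verify("N", strokes, 20, 30))
```

Counts alone cannot separate characters with the same stroke profile, which is consistent with the rule base being used to verify the pattern-matching result rather than to recognize characters on its own.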
Evaluation
| Metric | Value | Baseline | Notes |
|---|---|---|---|
| Graphical Element Recognition | ~97% | N/A | Evaluated on 20 documents (Fig. 7 examples) |
| Text Recognition | ~93% | N/A | Evaluated on 20 documents |
Citation
@inproceedings{ramelAutomaticReadingHandwritten1999,
title = {Automatic Reading of Handwritten Chemical Formulas from a Structural Representation of the Image},
booktitle = {Proceedings of the {{Fifth International Conference}} on {{Document Analysis}} and {{Recognition}}. {{ICDAR}} '99 ({{Cat}}. {{No}}.{{PR00318}})},
author = {Ramel, J.-Y. and Boissier, G. and Emptoz, H.},
year = 1999,
pages = {83--86},
publisher = {IEEE},
address = {Bangalore, India},
doi = {10.1109/ICDAR.1999.791730},
isbn = {978-0-7695-0318-9}
}