Paper Information
Citation: Ramel, J.-Y., Boissier, G., & Emptoz, H. (1999). Automatic Reading of Handwritten Chemical Formulas from a Structural Representation of the Image. Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99), 83-86. https://doi.org/10.1109/ICDAR.1999.791730
Publication: ICDAR 1999
What kind of paper is this?
Method. This paper proposes a novel system architecture for document analysis. It introduces a specific pipeline (Global Perception followed by Incremental Extraction) and validates this “new strategy” with recognition rates on specific tasks. The core contribution is the shift from bitmap-based processing to a structural graph representation of graphical primitives.
What is the motivation?
- Complexity of Freehand: Freehand drawings contain fluctuating lines and noise that make standard vectorization techniques difficult to apply directly.
- Limitation of Bitmap Analysis: Most existing systems at the time attempted to interpret the document by working directly on the static bitmap image throughout the process.
- Need for Context: Interpretation requires a dynamic resource that can evolve as “knowledge” is extracted (e.g., recognizing a polygon changes the context for its neighbors).
What is the novelty here?
The authors propose a Structural Representation as the “unique resource” for interpretation, rather than the original image.
- Quadrilateral Primitives: Instead of simple vectors, the system builds “Quadrilaterals” (pairs of vectors) to represent thin shapes, which are robust to handwriting fluctuations.
- Structural Graph: These primitives are organized into a graph where arcs represent geometric relationships (T-junctions, L-junctions, parallels).
- Specialist Agents: Interpretation is driven by independent modules (“specialists”) that browse this graph recursively to identify high-level chemical entities like rings (polygons) or chains.
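To make the idea of a graph of primitives concrete, here is a minimal sketch of such a data structure. The class names, the `(start, end, thickness)` simplification of a quadrilateral, and the method names are assumptions made for illustration; the paper does not publish its data layout. The relation labels follow the arc types the paper names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    """Arc labels between two quadrilaterals."""
    T = "T-junction"
    X = "intersection"
    PARALLEL = "parallel"
    L = "L-junction"
    S = "successive"


@dataclass(frozen=True)
class Quadrilateral:
    """Hypothetical simplification: a thin stroke as an axis plus a thickness."""
    ident: int
    start: tuple[float, float]
    end: tuple[float, float]
    thickness: float


@dataclass
class StructuralGraph:
    nodes: dict[int, Quadrilateral] = field(default_factory=dict)
    arcs: dict[int, list[tuple[int, Relation]]] = field(default_factory=dict)

    def add_node(self, quad: Quadrilateral) -> None:
        self.nodes[quad.ident] = quad
        self.arcs.setdefault(quad.ident, [])

    def add_arc(self, a: int, b: int, relation: Relation) -> None:
        # Relations are stored symmetrically so either end can be queried.
        self.arcs[a].append((b, relation))
        self.arcs[b].append((a, relation))

    def neighbors(self, ident: int, relation: Relation) -> list[int]:
        """Identifiers of nodes linked to `ident` by the given relation."""
        return [other for other, rel in self.arcs[ident] if rel is relation]
```

A specialist would then query the graph rather than the bitmap, for example `graph.neighbors(q, Relation.PARALLEL)` to look for a second stroke running alongside a bond.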
What experiments were performed?
- Validation Set: The system was tested on 20 handwritten documents containing chemical formulas.
- Text Database: A separate base of 328 models was used for the text recognition component.
- Metric: Recognition rates were calculated for both text components and graphical elements (chemical structures).
What outcomes/conclusions?
- High Graphical Accuracy: The system achieved a 97% recognition rate for graphical parts (structural elements such as rings and bonds).
- Text Recognition: The text recognition module achieved a 93% success rate.
- Robustness: The structural graph approach successfully handled “multiple liaisons, polygons, chains” and allowed for the progressive construction of a solution consistent with the context.
Reproducibility Details
Data
| Purpose | Dataset | Size | Notes |
|---|---|---|---|
| Evaluation | Handwritten Documents | 20 docs | Off-line documents at 300 dpi |
| Training | Character Models | 328 models | Used for the Pattern Matching text recognition base |
Algorithms
The interpretation process is divided into two distinct phases:
1. Global Perception (Graph Construction)
- Vectorization: Contour tracking produces a chain of vectors, which is simplified by iterative polygonal approximation until no further vectors are fused (2-5 iterations).
- Quadrilateral Formation: Vectors are paired to form quadrilaterals based on Euclidean distance and “empirical” alignment criteria.
- Graph Generation: Quadrilaterals become nodes. Arcs are created based on “zones of influence” and classified into 5 types: T-junction, Intersection (X), Parallel (//), L-junction, and Successive (S).
- Redraw Heuristic: A pre-processing step transforms T, X, and S junctions into L or // relations, as chemical drawings primarily consist of L-junctions and parallels.
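The "iterate until fusion stabilizes" idea in the vectorization step can be sketched as follows. The summary does not state the fusion criterion, so the turn-angle threshold `max_turn`, the iteration cap `max_iters`, and all function names are assumptions for illustration, not the authors' method.

```python
import math

Point = tuple[float, float]


def _turn_angle(a: Point, b: Point, c: Point) -> float:
    """Absolute change of direction (radians) between vectors a->b and b->c."""
    h1 = math.atan2(b[1] - a[1], b[0] - a[0])
    h2 = math.atan2(c[1] - b[1], c[0] - b[0])
    d = abs(h2 - h1)
    return min(d, 2 * math.pi - d)


def fuse_once(points: list[Point], max_turn: float) -> list[Point]:
    """One pass: drop interior points where the direction barely changes."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for i in range(1, len(points) - 1):
        if _turn_angle(kept[-1], points[i], points[i + 1]) > max_turn:
            kept.append(points[i])
    kept.append(points[-1])
    return kept


def approximate(points: list[Point],
                max_turn: float = math.radians(15),  # assumed threshold
                max_iters: int = 10) -> tuple[list[Point], int]:
    """Repeat fusion passes until a pass removes nothing (or the cap is hit)."""
    current = list(points)
    for iteration in range(1, max_iters + 1):
        fused = fuse_once(current, max_turn)
        if len(fused) == len(current):
            return current, iteration
        current = fused
    return current, max_iters


# A wobbly, hand-drawn "L" stroke (made-up coordinates).
stroke = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (3.05, 1), (2.95, 2), (3, 3)]
vertices, passes = approximate(stroke)
```

On this stroke the first pass removes the wobble and the second pass changes nothing, so the loop stops with the three corner points of the "L".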
2. Specialists (Interpretation)
- Liaison Specialist: Scans the graph for // arcs or quadrilaterals with free extremities to identify bonds.
- Polygon/Chain Specialist: Uses recursive `look-left` and `look-right` procedures. If a search returns to the start node after $n$ steps, a polygon is detected.
- Text Localization: Clusters “short” quadrilaterals by physical proximity into “focus zones”. Zones are classified as text/non-text based on connected components.
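The polygon test ("the search returns to the start node after $n$ steps") can be illustrated with a plain recursive walk over L-junction arcs. This is a generic depth-first sketch, not a reconstruction of the paper's `look-left` and `look-right` procedures; the adjacency-dictionary format, the `max_sides` bound, and the function name `find_polygon` are assumptions.

```python
from typing import Optional


def find_polygon(l_arcs: dict[str, list[str]], start: str,
                 max_sides: int = 8) -> Optional[list[str]]:
    """Return the nodes of a closed loop through `start`, or None.

    `l_arcs` maps each quadrilateral id to the ids it meets at an L-junction.
    """
    def walk(node: str, path: list[str]) -> Optional[list[str]]:
        for nxt in l_arcs.get(node, []):
            # Closing the loop needs at least three sides.
            if nxt == start and len(path) >= 3:
                return path
            if nxt not in path and len(path) < max_sides:
                found = walk(nxt, path + [nxt])
                if found:
                    return found
        return None

    return walk(start, [start])


# Hypothetical example: a six-sided ring with one substituent bond on q0.
ring = [f"q{i}" for i in range(6)]
arcs = {q: [ring[(i - 1) % 6], ring[(i + 1) % 6]] for i, q in enumerate(ring)}
arcs["q0"].append("b0")
arcs["b0"] = ["q0"]

hexagon = find_polygon(arcs, "q0")   # a closed loop of six quadrilaterals
dangling = find_polygon(arcs, "b0")  # the substituent is not on any loop
```

A search that runs out of neighbors without returning to its start, as for `b0` here, would correspond to a chain rather than a polygon.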
Models
Text Recognition Hybrid:
- Normalization & Pattern Matching: A classic method using the database of 328 models.
- Structural Rule Base: Uses “significant” quadrilaterals (length $\ge 1/3$ of zone dimension) to verify characters. A rule base defines the expected count of horizontal, vertical, and diagonal lines for each character.
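A rough sketch of how such a rule base could verify a pattern-matching candidate is given below. The rule entries in `RULES`, the angular tolerance `tol_deg`, and the choice of the smaller zone side as the reference dimension are all assumptions; the summary only states the 1/3 length criterion and the per-character line counts.

```python
import math

Segment = tuple[float, float, float, float]  # x1, y1, x2, y2

# Hypothetical rules: expected (horizontal, vertical, diagonal) line counts.
RULES = {"H": (1, 2, 0), "N": (0, 2, 1), "L": (1, 1, 0)}


def orientation(seg: Segment, tol_deg: float = 20.0) -> str:
    """Classify a segment by its angle, folded into [0, 180) degrees."""
    x1, y1, x2, y2 = seg
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
    if angle <= tol_deg or angle >= 180.0 - tol_deg:
        return "horizontal"
    if abs(angle - 90.0) <= tol_deg:
        return "vertical"
    return "diagonal"


def signature(segments: list[Segment], zone_w: float,
              zone_h: float) -> tuple[int, int, int]:
    """Count significant segments (length >= 1/3 of the zone) per orientation."""
    min_len = min(zone_w, zone_h) / 3.0  # assumed reference dimension
    counts = {"horizontal": 0, "vertical": 0, "diagonal": 0}
    for seg in segments:
        x1, y1, x2, y2 = seg
        if math.hypot(x2 - x1, y2 - y1) >= min_len:
            counts[orientation(seg)] += 1
    return counts["horizontal"], counts["vertical"], counts["diagonal"]


def verify(candidate: str, segments: list[Segment],
           zone_w: float, zone_h: float) -> bool:
    """Check a candidate character against the rule base."""
    return RULES.get(candidate) == signature(segments, zone_w, zone_h)


# Made-up strokes for an "H" in a 30x30 zone, plus one short noise stroke.
strokes = [(0, 0, 0, 30), (30, 0, 30, 30), (0, 15, 30, 15), (10, 10, 12, 11)]
```

With these strokes, `verify("H", strokes, 30, 30)` accepts the candidate while `verify("N", strokes, 30, 30)` rejects it, because the short noise stroke falls below the length criterion and is ignored.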
Evaluation
| Metric | Value | Baseline | Notes |
|---|---|---|---|
| Graphical Element Recognition | ~97% | N/A | Evaluated on 20 documents (Fig. 7 examples) |
| Text Recognition | ~93% | N/A | Evaluated on 20 documents |
Citation
@inproceedings{ramelAutomaticReadingHandwritten1999,
title = {Automatic Reading of Handwritten Chemical Formulas from a Structural Representation of the Image},
booktitle = {Proceedings of the {{Fifth International Conference}} on {{Document Analysis}} and {{Recognition}}. {{ICDAR}} '99 ({{Cat}}. {{No}}.{{PR00318}})},
author = {Ramel, J.-Y. and Boissier, G. and Emptoz, H.},
year = 1999,
pages = {83--86},
publisher = {IEEE},
address = {Bangalore, India},
doi = {10.1109/ICDAR.1999.791730},
isbn = {978-0-7695-0318-9}
}