Paper Information
Citation: Ouyang, T. Y., & Davis, R. (2011). ChemInk: A Natural Real-Time Recognition System for Chemical Drawings. Proceedings of the 16th International Conference on Intelligent User Interfaces, 267-276. https://doi.org/10.1145/1943403.1943444
Publication: IUI '11
What kind of paper is this?
This is a Method paper. It proposes a novel architectural framework for sketch recognition that integrates visual features at three distinct levels (inkpoints, segments, symbols) into a single probabilistic model. The rhetorical structure centers on the proposal of this new architecture, the introduction of a specific “trainable corner detector” algorithm, and the validation of these methods against existing benchmarks and alternative toolsets (ChemDraw).
What is the motivation?
The primary motivation is to bridge the gap between the natural, efficient process of drawing chemical diagrams by hand and the cumbersome “point-click-and-drag” interactions required by CAD tools like ChemDraw. While chemists prefer sketching for communication, existing digital tools do not offer the same speed or ease of use. The goal is to build an intelligent system that understands freehand sketches in real-time, converting them into structured data suitable for analysis or search.
What is the novelty here?
The core novelty lies in the hierarchical joint model. Unlike previous approaches that might treat stroke segmentation and symbol recognition as separate, isolated steps, ChemInk uses a Conditional Random Field (CRF) to jointly model dependencies across three levels:
- Inkpoints: Local visual appearance.
- Segments: Stroke fragments separated by corners.
- Candidates: Potential symbol groupings.
Additionally, the paper introduces a trainable corner detector that learns domain-specific corner definitions from data rather than relying on heuristic thresholds (like curvature or speed extremes).
What experiments were performed?
The authors conducted two primary evaluations:
- Off-line Accuracy Evaluation:
  - Dataset: 12 real-world organic compounds drawn by 10 participants.
  - Metric: Recognition accuracy (recall and precision).
  - Baseline: Comparison against the authors' previous work (Ouyang & Davis 2009) and ablations (with/without context).
- On-line User Study:
  - Task: 9 participants (chemistry students) drew 5 diagrams using both ChemInk (Tablet PC) and ChemDraw (mouse/keyboard).
  - Metric: Time to completion and subjective user ratings (speed/ease of use).
What were the outcomes and conclusions drawn?
- Accuracy: The system achieved 97.4% symbol recognition accuracy, slightly outperforming the best prior result (97.1%). The trainable corner detector achieved 99.91% recall.
- Speed: Users were roughly twice as fast with ChemInk (avg. 36 s) as with ChemDraw (avg. 79 s).
- Usability: Participants rated ChemInk significantly higher for speed (6.3 vs 4.5) and ease of use (6.3 vs 4.7) on a 7-point scale.
- Conclusion: Sketch recognition is a viable, superior alternative to standard CAD tools for authoring chemical diagrams.
Reproducibility Details
Data
- Training/Test Data: 12 real-world organic compounds (e.g., Aspirin, Penicillin) drawn by 10 participants familiar with organic chemistry.
- Evaluation Split: User-independent cross-validation (training on 9 users, testing on 1).
- Input: Raw digital ink (strokes) collected on a Tablet PC.
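The user-independent split can be sketched as a leave-one-user-out loop. This is a minimal illustration, not the authors' code: the function name `leave_one_user_out`, the user IDs, and the sketch identifiers are all hypothetical placeholders.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_user_out(
    sketches_by_user: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_user, train_sketches, test_sketches), one fold per user."""
    for held_out in sorted(sketches_by_user):
        # Train on every sketch drawn by the other users.
        train = [
            sketch
            for user, sketches in sorted(sketches_by_user.items())
            if user != held_out
            for sketch in sketches
        ]
        # Test only on sketches from the held-out user.
        test = list(sketches_by_user[held_out])
        yield held_out, train, test


# Hypothetical data layout: 10 users, each drawing the same 12 compounds.
data = {
    f"user{u:02d}": [f"user{u:02d}/compound{c:02d}" for c in range(12)]
    for u in range(10)
}
folds = list(leave_one_user_out(data))
```

Each fold trains on the 9 remaining users, so no test sketch comes from a writer seen during training.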
Algorithms
1. Corner Detection (Trainable)
- Method: Iterative vertex elimination.
- Cost Function: $cost(p_{i}) = \sqrt{mse(s_{i}; p_{i-1}, p_{i+1})} \cdot dist(p_{i}; p_{i-1}, p_{i+1})$
- Procedure: Repeatedly remove the vertex with the lowest cost until the classifier (trained on features like cost, diagonal length, ink density) predicts the remaining vertices are corners.
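The elimination loop above can be sketched as follows, under stated assumptions. The stroke is a list of 2D points, every point starts as a candidate vertex, and endpoints are always kept. The paper's trained classifier is replaced here by a hypothetical callable `is_corner` that sees only the cost, whereas the real classifier also uses features such as diagonal length and ink density.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length


def vertex_cost(stroke: Sequence[Point], prev: int, cur: int, nxt: int) -> float:
    """sqrt(mse of the ink between prev and nxt) times distance of cur to the chord."""
    a, b = stroke[prev], stroke[nxt]
    span = stroke[prev : nxt + 1]
    mse = sum(point_line_distance(p, a, b) ** 2 for p in span) / len(span)
    return math.sqrt(mse) * point_line_distance(stroke[cur], a, b)


def detect_corners(
    stroke: Sequence[Point], is_corner: Callable[[float], bool]
) -> List[int]:
    """Return indices of the vertices kept after iterative elimination."""
    vertices = list(range(len(stroke)))
    while len(vertices) > 2:
        # Cost of every interior vertex given its current neighbours.
        cost, k = min(
            (vertex_cost(stroke, vertices[k - 1], vertices[k], vertices[k + 1]), k)
            for k in range(1, len(vertices) - 1)
        )
        # Stop as soon as the cheapest remaining vertex looks like a corner.
        if is_corner(cost):
            break
        del vertices[k]
    return vertices


# An L-shaped stroke with a single true corner at index 3.
l_stroke = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
corners = detect_corners(l_stroke, is_corner=lambda cost: cost > 0.5)
```

The threshold `0.5` is an arbitrary stand-in for the learned decision; in this example the collinear points have zero cost and are removed first, leaving the two endpoints and the corner.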
2. Feature Extraction
- Inkpoints: Sampled at regular intervals. Features = $10 \times 10$ pixel orientation filters (0, 45, 90, 135 degrees) at two scales ($L/2$, $L$), smoothed and downsampled to $5 \times 5$. Total 400 features.
- Segments: Similar image features centered at segment midpoint, plus geometric features (length, ink density).
- Candidates: 5 feature images ($20 \times 20$) including an “endpoint” image, stretched to normalize aspect ratio.
- Dimensionality Reduction: PCA used to compress feature images to 256 components.
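To make the orientation feature images concrete, here is a rough single-scale sketch. Several details are assumptions rather than the paper's specification: the linear falloff of the orientation response, writing each ink segment into the grid cell of its midpoint, and using 2x2 max pooling in place of the smoothing and downsampling step. The names `orientation_response`, `orientation_images`, and `downsample` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
REFERENCE_ANGLES = (0.0, 45.0, 90.0, 135.0)
GRID = 10


def orientation_response(angle_deg: float, reference_deg: float) -> float:
    """Assumed linear falloff: 1 when aligned, 0 at 45 degrees or more apart."""
    diff = abs(angle_deg - reference_deg) % 180.0
    diff = min(diff, 180.0 - diff)  # stroke direction is treated as unsigned
    return max(0.0, 1.0 - diff / 45.0)


def orientation_images(
    points: Sequence[Point], size: float
) -> List[List[List[float]]]:
    """Four GRID x GRID images for ink inside a size x size window at the origin."""
    images = [[[0.0] * GRID for _ in range(GRID)] for _ in REFERENCE_ANGLES]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
        # Place the response in the cell containing the segment midpoint.
        col = int((x0 + x1) / 2.0 * GRID / size)
        row = int((y0 + y1) / 2.0 * GRID / size)
        if 0 <= row < GRID and 0 <= col < GRID:
            for k, ref in enumerate(REFERENCE_ANGLES):
                response = orientation_response(angle, ref)
                images[k][row][col] = max(images[k][row][col], response)
    return images


def downsample(image: List[List[float]]) -> List[List[float]]:
    """2x2 max pooling (stand-in for smoothing + downsampling), 10x10 -> 5x5."""
    half = len(image) // 2
    return [
        [
            max(image[2 * r][2 * c], image[2 * r][2 * c + 1],
                image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1])
            for c in range(half)
        ]
        for r in range(half)
    ]


# A horizontal stroke crossing a 10 x 10 window.
horizontal = [(x + 0.5, 5.5) for x in range(10)]
pooled = [downsample(img) for img in orientation_images(horizontal, size=10.0)]
```

For the horizontal stroke, only the 0-degree image responds; flattening the pooled images would give the feature vector for this one scale.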
3. Structure Generation
- Clustering: Agglomerative clustering with a complete-link metric to connect symbols.
- Threshold: Stop clustering at distance $0.4L$.
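A minimal sketch of complete-link agglomerative clustering with the $0.4L$ stopping rule is shown below. It assumes the items being clustered are 2D points (for example, hypothetical connection points on recognized symbols) and that `scale` plays the role of $L$; the function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def complete_link_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Complete-link distance: the largest pairwise distance between two clusters."""
    return max(math.dist(p, q) for p in a for q in b)


def cluster_points(
    points: Sequence[Point], scale: float, factor: float = 0.4
) -> List[List[Point]]:
    """Merge the closest pair of clusters until the best merge exceeds factor * scale."""
    clusters = [[p] for p in points]
    threshold = factor * scale
    while len(clusters) > 1:
        distance, i, j = min(
            (complete_link_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if distance > threshold:
            break  # no remaining pair is close enough to connect
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Two tight pairs and one isolated point, with L = 1.0.
example_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
groups = cluster_points(example_points, scale=1.0)
```

Complete linkage is conservative: a cluster only grows if every member stays within the threshold of every other member, which keeps distant symbols from being chained together.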
Models
Conditional Random Field (CRF)
- Structure: 3-level hierarchy (Inkpoints $V_p$, Segments $V_s$, Candidates $V_c$).
- Nodes:
  - $V_p, V_s$ labels: “bond”, “hash”, “wedge”, “text”.
  - $V_c$ labels: specific candidate interpretations.
- Edges/Potentials:
  - Entity-Feature: $\phi(y, x)$ (Linear classifier).
  - Consistency: $\psi(y_i, y_j)$ (Hard constraint: child must match parent label).
  - Spatial Context: $\psi_{ss}(y_i, y_j)$ (Pairwise geometric relationships between segments: angle, distance).
  - Overlap: Prevents conflicting candidates from sharing segments.
- Inference: Loopy Belief Propagation (up to 100 iterations).
- Training: Maximum likelihood, optimized with L-BFGS (a quasi-Newton method).
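The effect of the hard consistency potential can be illustrated on a toy model with one segment and two child inkpoints. This sketch works in log space and finds the best joint labeling by exhaustive enumeration, which stands in for loopy belief propagation and is only feasible at toy size. The unary scores are made-up numbers, not learned weights.

```python
import itertools
import math
from typing import Dict, List, Optional, Tuple

LABELS = ("bond", "hash", "wedge", "text")


def consistency(child: str, parent: str) -> float:
    """Hard constraint in log space: 0 if the labels agree, -inf otherwise."""
    return 0.0 if child == parent else -math.inf


def map_labeling(
    inkpoint_scores: List[Dict[str, float]], segment_scores: Dict[str, float]
) -> Tuple[Optional[Tuple[str, Tuple[str, ...]]], float]:
    """Brute-force the highest-scoring (segment, inkpoints) labeling."""
    best_score, best = -math.inf, None
    for segment_label in LABELS:
        for inkpoint_labels in itertools.product(LABELS, repeat=len(inkpoint_scores)):
            score = segment_scores[segment_label]
            for scores, label in zip(inkpoint_scores, inkpoint_labels):
                score += scores[label] + consistency(label, segment_label)
            if score > best_score:
                best_score, best = score, (segment_label, inkpoint_labels)
    return best, best_score


# Hypothetical unary log-scores; the second inkpoint alone would prefer "text".
inkpoint_scores = [
    {"bond": 2.0, "hash": 0.0, "wedge": 0.0, "text": 0.1},
    {"bond": 0.8, "hash": 0.0, "wedge": 0.0, "text": 1.0},
]
segment_scores = {"bond": 1.5, "hash": 0.0, "wedge": 0.0, "text": 0.2}
best_labeling, best_score = map_labeling(inkpoint_scores, segment_scores)
```

Although the second inkpoint's local evidence slightly favors "text", the joint optimum labels everything "bond", because any labeling in which a child disagrees with its parent segment scores negative infinity.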
Evaluation
- Primary Metric: Accuracy (Recall/Precision) of symbol detection.
- Comparison: Compared against Ouyang & Davis 2009 (previous SOTA).
- Speed Metric: Wall-clock time for diagram creation (ChemInk vs. ChemDraw).
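As a small illustration of the primary metric, precision and recall over detected symbols can be computed as below. The matching criterion is an assumption made for this sketch: a detection counts as correct only if both its label and its exact set of stroke IDs match a ground-truth symbol.

```python
from typing import FrozenSet, Set, Tuple

Symbol = Tuple[str, FrozenSet[int]]  # (label, IDs of the strokes it covers)


def precision_recall(predicted: Set[Symbol], truth: Set[Symbol]) -> Tuple[float, float]:
    """Precision = correct / predicted, recall = correct / ground truth."""
    correct = len(predicted & truth)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical sketch: 4 detections, 5 true symbols, 3 exact matches.
predicted = {
    ("C", frozenset({1})),
    ("O", frozenset({2, 3})),
    ("bond", frozenset({4})),
    ("N", frozenset({5})),
}
truth = {
    ("C", frozenset({1})),
    ("O", frozenset({2, 3})),
    ("bond", frozenset({4})),
    ("H", frozenset({5})),
    ("bond", frozenset({6})),
}
precision, recall = precision_recall(predicted, truth)
```

Here the mislabeled "N" lowers precision, while the missed bond on stroke 6 lowers recall.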
Hardware
- Processor: 3.7 GHz processor (single thread) for base benchmarking (approx. 1 sec/sketch).
- Deployment: Validated on 1.8 GHz Tablet PCs using multi-core parallelization for real-time feedback.
Citation
@inproceedings{ouyangChemInkNaturalRealtime2011,
title = {ChemInk: A Natural Real-Time Recognition System for Chemical Drawings},
shorttitle = {ChemInk},
booktitle = {Proceedings of the 16th International Conference on Intelligent User Interfaces},
author = {Ouyang, Tom Y. and Davis, Randall},
year = {2011},
month = feb,
pages = {267--276},
publisher = {ACM},
address = {Palo Alto, CA, USA},
doi = {10.1145/1943403.1943444},
isbn = {978-1-4503-0419-1}
}