<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Hand-Drawn Structure Recognition on Hunter Heidenreich | ML Research Scientist</title><link>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/</link><description>Recent content in Hand-Drawn Structure Recognition on Hunter Heidenreich | ML Research Scientist</description><image><title>Hunter Heidenreich | ML Research Scientist</title><url>https://hunterheidenreich.com/img/avatar.webp</url><link>https://hunterheidenreich.com/img/avatar.webp</link></image><generator>Hugo -- 0.147.7</generator><language>en-US</language><copyright>2026 Hunter Heidenreich</copyright><lastBuildDate>Tue, 07 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/index.xml" rel="self" type="application/rss+xml"/><item><title>OCSAug: Diffusion-Based Augmentation for Hand-Drawn OCSR</title><link>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/ocsaug/</link><pubDate>Sat, 20 Dec 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/ocsaug/</guid><description>A diffusion-based data augmentation pipeline (OCSAug) using DDPM and RePaint to improve optical chemical structure recognition on hand-drawn images.</description><content:encoded><![CDATA[<h2 id="document-taxonomy-ocsaug-as-a-novel-method">Document Taxonomy: OCSAug as a Novel Method</h2>
<p>This is a <strong>Method</strong> paper according to the <a href="/notes/interdisciplinary/research-methods/ai-physical-sciences-paper-taxonomy/">taxonomy</a>. It proposes a novel data augmentation pipeline (<strong>OCSAug</strong>) that integrates Denoising Diffusion Probabilistic Models (DDPM) and the RePaint algorithm to address the data scarcity problem in hand-drawn optical chemical structure recognition (OCSR). The contribution is validated through systematic benchmarking against existing augmentation techniques (RDKit, RanDepict) and ablation studies on mask design.</p>
<h2 id="expanding-hand-drawn-training-data-for-ocsr">Expanding Hand-Drawn Training Data for OCSR</h2>
<p>A vast amount of molecular structure data exists in analog formats, such as hand-drawn diagrams in research notes or older literature. While OCSR models perform well on digitally rendered images, they struggle with hand-drawn images due to noise, varying handwriting styles, and distortions. Current datasets for hand-drawn images (e.g., DECIMER) are too small to train effective models, and existing augmentation tools (RDKit, RanDepict) fail to generate sufficiently realistic hand-drawn variations.</p>
<h2 id="ocsaug-pipeline-masked-repaint-via-generative-ai">OCSAug Pipeline: Masked RePaint via Generative AI</h2>
<p>The core novelty is <strong>OCSAug</strong>, a three-phase pipeline that uses generative AI to synthesize training data:</p>
<ol>
<li><strong>DDPM + RePaint</strong>: It utilizes a DDPM to learn the distribution of hand-drawn images and the RePaint algorithm for inpainting.</li>
<li><strong>Structural Masking</strong>: It introduces <strong>vertical and horizontal stripe pattern masks</strong>. These masks selectively obscure parts of atoms or bonds, forcing the diffusion model to reconstruct them with irregular &ldquo;hand-drawn&rdquo; styles while preserving the underlying chemical topology.</li>
<li><strong>Label Transfer</strong>: Because the chemical structure is preserved during inpainting, the SMILES label from the original image is directly transferred to the augmented image, bypassing the need for re-annotation.</li>
</ol>
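<p>The RePaint idea behind phase 1 can be sketched as a single composition step. This is a pure-Python schematic, not the paper's guided-diffusion implementation; <code>repaint_compose</code> and the flattened-list image representation are illustrative assumptions:</p>

```python
def repaint_compose(x_known_t, x_sampled_t, mask):
    """One RePaint composition step on flattened images (lists of floats).

    mask[i] == 1 marks a known pixel, kept from the forward-noised
    original; mask[i] == 0 marks a masked pixel, taken from the model's
    reverse-diffusion sample. OCSAug's stripe masks select the regions
    to re-draw, i.e. the mask == 0 pixels here.
    """
    return [k if m == 1 else s
            for k, s, m in zip(x_known_t, x_sampled_t, mask)]
```

Because only masked pixels are resampled, the unmasked chemical structure survives every step, which is what makes the label transfer in phase 3 sound.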
<h2 id="benchmarking-diffusion-augmentations-on-decimer">Benchmarking Diffusion Augmentations on DECIMER</h2>
<p>The authors evaluated OCSAug using the <strong>DECIMER dataset</strong>, specifically a &ldquo;drug-likeness&rdquo; subset filtered by Lipinski&rsquo;s and Veber&rsquo;s rules.</p>
<ul>
<li><strong>Baselines</strong>: The method was compared against <strong>RDKit</strong> (digital generation) and <strong>RanDepict</strong> (rule-based augmentation).</li>
<li><strong>Models</strong>: Four recent OCSR models were fine-tuned: <strong>MolScribe</strong>, <strong>DECIMER 1.0 (I2S)</strong>, <strong>MolNexTR</strong>, and <strong>MPOCSR</strong>.</li>
<li><strong>Metrics</strong>:
<ul>
<li><strong>Tanimoto Similarity</strong>: To measure prediction accuracy against ground truth.</li>
<li><strong>Fréchet Inception Distance (FID)</strong>: To measure the distributional similarity between generated and real hand-drawn images.</li>
<li><strong>RMSE</strong>: To quantify pixel-level structural preservation across different mask thicknesses.</li>
</ul>
</li>
</ul>
<h2 id="improved-generalization-capabilities-and-fid-scores">Improved Generalization Capabilities and FID Scores</h2>
<ul>
<li><strong>Performance Boost</strong>: OCSAug improved recognition accuracy (Tanimoto similarity) by a factor of <strong>1.918 to 3.820</strong> (Improvement Ratio) over non-fine-tuned baselines, outperforming traditional augmentation techniques such as RDKit and RanDepict (1.570-3.523x).</li>
<li><strong>Data Quality</strong>: OCSAug achieved the lowest FID score (0.471) compared to RanDepict (4.054) and RDKit (10.581), indicating its generated images are much closer to the real hand-drawn distribution.</li>
<li><strong>Resolution Mixing</strong>: Training MolScribe and MolNexTR with a mix of $128 \times 128$, $256 \times 256$, and $512 \times 512$ resolution images improved Tanimoto similarity (e.g., MolScribe from 0.585 to 0.640), though this strategy did not help I2S or MPOCSR.</li>
<li><strong>Real-World Evaluation</strong>: On a newly collected dataset of 463 hand-drawn images from 6 volunteers (88 drug compounds), the MPOCSR model fine-tuned with OCSAug achieved 0.367 exact-match accuracy (Tanimoto = 1.0), compared to 0.365 for non-augmented fine-tuning and 0.037 for no fine-tuning. The area under the accuracy curve showed a more pronounced improvement, indicating a clearer reduction in misrecognitions than the exact-match figures alone suggest.</li>
<li><strong>Limitations</strong>: The generation process is slow (3 weeks for 10k images on a single GPU). The fixed stripe masks may struggle with highly complex, non-drug-like geometries: when evaluated on the full DECIMER dataset (without drug-likeness filtering), OCSAug did not yield uniform improvements across all models.</li>
</ul>
<hr>
<h2 id="reproducibility">Reproducibility</h2>
<h3 id="artifacts">Artifacts</h3>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>Type</th>
          <th>License</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="https://github.com/jjjabcd/OCSAug">OCSAug</a></td>
          <td>Code</td>
          <td>MIT</td>
          <td>Official implementation using guided-diffusion and RePaint</td>
      </tr>
      <tr>
          <td><a href="https://zenodo.org/records/6456306">DECIMER Hand-Drawn Dataset</a></td>
          <td>Dataset</td>
          <td>CC-BY 4.0</td>
          <td>5,088 hand-drawn molecular structure images from 24 individuals</td>
      </tr>
  </tbody>
</table>
<h3 id="data">Data</h3>
<ul>
<li><strong>Source</strong>: DECIMER dataset (hand-drawn images).</li>
<li><strong>Filtering</strong>: A &ldquo;drug-likeness&rdquo; filter was applied (Lipinski&rsquo;s rule of 5 + Veber&rsquo;s rules) along with an atom filter (C, H, O, S, F, Cl, Br, N, P only).</li>
<li><strong>Final Size</strong>: 3,194 samples, split into:
<ul>
<li><strong>Training</strong>: 2,604 samples.</li>
<li><strong>Validation</strong>: 290 samples.</li>
<li><strong>Test</strong>: 300 samples.</li>
</ul>
</li>
<li><strong>Resolution</strong>: All images resized to $256 \times 256$ pixels.</li>
</ul>
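<p>As a sketch, the drug-likeness filter can be expressed over precomputed descriptors. The helper below is hypothetical; in practice the descriptor values and atom lists would come from RDKit:</p>

```python
ALLOWED_ATOMS = {"C", "H", "O", "S", "F", "Cl", "Br", "N", "P"}

def passes_drug_likeness(desc, atoms):
    """Lipinski's rule of 5 + Veber's rules + the paper's atom whitelist.

    desc: dict of precomputed molecular descriptors (key names assumed).
    atoms: iterable of element symbols present in the molecule.
    """
    lipinski = (desc["mol_wt"] <= 500
                and desc["logp"] <= 5
                and desc["h_donors"] <= 5
                and desc["h_acceptors"] <= 10)
    veber = desc["rot_bonds"] <= 10 and desc["tpsa"] <= 140
    return lipinski and veber and set(atoms) <= ALLOWED_ATOMS
```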
<h3 id="algorithms">Algorithms</h3>
<ul>
<li><strong>Framework</strong>: DDPM implemented using <code>guided-diffusion</code>.</li>
<li><strong>RePaint Settings</strong>:
<ul>
<li>Total time steps: 250.</li>
<li>Jump length: 10.</li>
<li>Resampling counts: 10.</li>
</ul>
</li>
<li><strong>Masking Strategy</strong>:
<ul>
<li><strong>Vertical Stripes</strong>: Obscure atom symbols to vary handwriting style.</li>
<li><strong>Horizontal Stripes</strong>: Obscure bonds to vary length/thickness/alignment.</li>
<li><strong>Optimal Thickness</strong>: A stripe thickness of <strong>4 pixels</strong> was found to be optimal for balancing diversity and structural preservation.</li>
</ul>
</li>
</ul>
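<p>The stripe masks can be sketched in a few lines of pure Python; the <code>spacing</code> between stripes is an assumed free parameter, as only the optimal thickness is reported:</p>

```python
def stripe_mask(height, width, thickness=4, spacing=8, vertical=True):
    """Binary stripe mask: 1 = pixel to inpaint, 0 = pixel to keep.

    Vertical stripes obscure atom symbols; horizontal stripes obscure
    bonds. A 4-pixel thickness was found optimal in the paper.
    """
    def in_stripe(i):
        return i % (thickness + spacing) < thickness

    if vertical:
        return [[int(in_stripe(x)) for x in range(width)]
                for _ in range(height)]
    return [[int(in_stripe(y)) for _ in range(width)]
            for y in range(height)]
```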
<h3 id="models">Models</h3>
<p>The OCSR models were pretrained on PubChem (digital images) and then fine-tuned on the OCSAug dataset.</p>
<ul>
<li><strong>MolScribe</strong>: Swin Transformer encoder, Transformer decoder. Fine-tuned (all layers) for 30 epochs, batch size 16-128, LR 2e-5.</li>
<li><strong>I2S (DECIMER 1.0)</strong>: Inception V3 encoder (frozen), FC/Decoder fine-tuned. 25 epochs, batch size 64, LR 1e-5.</li>
<li><strong>MolNexTR</strong>: Dual-stream encoder (Swin + CNN). Fine-tuned (all layers) for 30 epochs, batch size 16-64, LR 2e-5.</li>
<li><strong>MPOCSR</strong>: MPViT backbone. Fine-tuned (all layers) for 25 epochs, batch size 16-32, LR 4e-5.</li>
</ul>
<h3 id="evaluation">Evaluation</h3>
<ul>
<li>
<p><strong>Metric</strong>: Improvement Ratio (IR) of Tanimoto Similarity (TS), defined as:</p>
<p>$$
\text{IR} = \frac{\text{TS}_{\text{finetuned}}}{\text{TS}_{\text{non-finetuned}}}
$$</p>
</li>
<li>
<p><strong>Validation</strong>: Cross-validation on the split DECIMER dataset.</p>
</li>
</ul>
<h3 id="hardware">Hardware</h3>
<ul>
<li><strong>GPU</strong>: NVIDIA GeForce RTX 4090.</li>
<li><strong>Training Time</strong>: DDPM training took ~6 days.</li>
<li><strong>Generation Time</strong>: Generating 2,600 augmented images took ~70 hours.</li>
</ul>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Kim, J. H., &amp; Choi, J. (2025). OCSAug: diffusion-based optical chemical structure data augmentation for improved hand-drawn chemical structure image recognition. <em>The Journal of Supercomputing</em>, 81, 926.</p>
<p><strong>Publication</strong>: The Journal of Supercomputing 2025</p>
<p><strong>Additional Resources</strong>:</p>
<ul>
<li><a href="https://github.com/jjjabcd/OCSAug">Official Repository</a></li>
<li><a href="https://zenodo.org/records/6456306">DECIMER Dataset</a></li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@article</span>{kimOCSAugDiffusionbasedOptical2025,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span> = <span style="color:#e6db74">{OCSAug: Diffusion-Based Optical Chemical Structure Data Augmentation for Improved Hand-Drawn Chemical Structure Image Recognition}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">shorttitle</span> = <span style="color:#e6db74">{OCSAug}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span> = <span style="color:#e6db74">{Kim, Jin Hyuk and Choi, Jonghwan}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span> = <span style="color:#ae81ff">2025</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">month</span> = may,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">journal</span> = <span style="color:#e6db74">{The Journal of Supercomputing}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">volume</span> = <span style="color:#e6db74">{81}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">number</span> = <span style="color:#e6db74">{8}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">pages</span> = <span style="color:#e6db74">{926}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">doi</span> = <span style="color:#e6db74">{10.1007/s11227-025-07406-4}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item><item><title>Enhanced DECIMER for Hand-Drawn Structure Recognition</title><link>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/decimer-hand-drawn/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/decimer-hand-drawn/</guid><description>An improved encoder-decoder model (EfficientNetV2 + Transformer) converts hand-drawn chemical structures into SMILES strings using synthetic training data.</description><content:encoded><![CDATA[<h2 id="method-contribution-architectural-optimization">Method Contribution: Architectural Optimization</h2>
<p>This is a <strong>Method</strong> paper. It proposes an enhanced neural network architecture (EfficientNetV2 + Transformer) specifically designed to solve the problem of recognizing hand-drawn chemical structures. The primary contribution is architectural optimization and a data-driven training strategy, validated through ablation studies (comparing encoders) and benchmarked against existing rule-based and deep learning tools.</p>
<h2 id="motivation-digitizing-dark-chemical-data">Motivation: Digitizing &ldquo;Dark&rdquo; Chemical Data</h2>
<p>Chemical information in legacy laboratory notebooks and modern tablet-based inputs often exists as hand-drawn sketches.</p>
<ul>
<li><strong>Gap:</strong> Existing Optical Chemical Structure Recognition (OCSR) tools (particularly rule-based ones) lack robustness and fail when images have variability in style, line thickness, or noise.</li>
<li><strong>Need:</strong> There is a critical need for automated tools to digitize this &ldquo;dark data&rdquo; effectively to preserve it and make it machine-readable and searchable.</li>
</ul>
<h2 id="core-innovation-decoder-only-design-and-synthetic-scaling">Core Innovation: Decoder-Only Design and Synthetic Scaling</h2>
<p>The core novelty is the <strong>architectural enhancement</strong> and <strong>synthetic training strategy</strong>:</p>
<ol>
<li><strong>Decoder-Only Transformer:</strong> Using only the decoder part of the Transformer (instead of a full encoder-decoder Transformer) improved average accuracy across OCSR benchmarks from 61.28% to 69.27% (Table 3 in the paper).</li>
<li><strong>EfficientNetV2 Integration:</strong> Replacing standard CNNs or EfficientNetV1 with <strong>EfficientNetV2-M</strong> provided better feature extraction and 2x faster training speeds.</li>
<li><strong>Scale of Synthetic Data:</strong> The authors demonstrate that scaling synthetic training data (up to 152 million images generated by RanDepict) directly correlates with improved generalization to real-world hand-drawn images, without ever training on real hand-drawn data.</li>
</ol>
<h2 id="experimental-setup-ablation-and-real-world-baselines">Experimental Setup: Ablation and Real-World Baselines</h2>
<ul>
<li><strong>Model Selection (Ablation):</strong> Tested three architectures (EfficientNetV2-M + Full Transformer, EfficientNetV1-B7 + Decoder-only, EfficientNetV2-M + Decoder-only) on standard benchmarks (JPO, CLEF, USPTO, UOB).</li>
<li><strong>Data Scaling:</strong> Trained the best model on four progressively larger datasets (from 4M to 152M images) to measure performance gains.</li>
<li><strong>Real-World Benchmarking:</strong> Validated the final model on the <strong>DECIMER Hand-drawn dataset</strong> (5088 real images drawn by volunteers) and compared against 9 other tools (OSRA, MolVec, Img2Mol, MolScribe, etc.).</li>
</ul>
<h2 id="results-and-conclusions-strong-accuracy-on-hand-drawn-scans">Results and Conclusions: Strong Accuracy on Hand-Drawn Scans</h2>
<ul>
<li><strong>Strong Performance:</strong> The final DECIMER model achieved <strong>99.72% valid predictions</strong> and <strong>73.25% exact accuracy</strong> on the hand-drawn benchmark. The next best non-DECIMER tool was MolGrapher at 10.81% accuracy, followed by MolScribe at 7.65%.</li>
<li><strong>Robustness:</strong> Deep learning methods outperform rule-based methods (which scored 3% or less accuracy) on hand-drawn data.</li>
<li><strong>Data Saturation:</strong> Quadrupling the dataset from 38M to 152M images yielded only marginal gains (about 3 percentage points in accuracy), suggesting current synthetic data strategies may be hitting a plateau.</li>
</ul>
<hr>
<h2 id="reproducibility">Reproducibility</h2>
<h3 id="artifacts">Artifacts</h3>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>Type</th>
          <th>License</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="https://github.com/Kohulan/DECIMER-Image_Transformer">DECIMER Image Transformer (GitHub)</a></td>
          <td>Code</td>
          <td>MIT</td>
          <td>Official TensorFlow implementation</td>
      </tr>
      <tr>
          <td><a href="https://doi.org/10.5281/zenodo.10781330">Model Weights (Zenodo)</a></td>
          <td>Model</td>
          <td>Unknown</td>
          <td>Pre-trained hand-drawn model weights</td>
      </tr>
      <tr>
          <td><a href="https://pypi.org/project/decimer/">DECIMER PyPi Package</a></td>
          <td>Code</td>
          <td>MIT</td>
          <td>Installable Python package</td>
      </tr>
      <tr>
          <td><a href="https://github.com/OBrink/RanDepict">RanDepict (GitHub)</a></td>
          <td>Code</td>
          <td>MIT</td>
          <td>Synthetic hand-drawn image generation toolkit</td>
      </tr>
  </tbody>
</table>
<h3 id="data">Data</h3>
<p>The model was trained entirely on <strong>synthetic data</strong> generated using the <a href="https://github.com/OBrink/RanDepict">RanDepict</a> toolkit. No real hand-drawn images were used for training.</p>
<table>
  <thead>
      <tr>
          <th>Dataset</th>
          <th>Source</th>
          <th>Molecules</th>
          <th>Total Images</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1</td>
          <td>ChEMBL</td>
          <td>2,187,669</td>
          <td>4,375,338</td>
          <td>1 augmented + 1 clean per molecule</td>
      </tr>
      <tr>
          <td>2</td>
          <td>ChEMBL</td>
          <td>2,187,669</td>
          <td>13,126,014</td>
          <td>2 augmented + 4 clean per molecule</td>
      </tr>
      <tr>
          <td>3</td>
          <td>PubChem</td>
          <td>9,510,000</td>
          <td>38,040,000</td>
          <td>1 augmented + 3 clean per molecule</td>
      </tr>
      <tr>
          <td>4</td>
          <td>PubChem</td>
          <td>38,040,000</td>
          <td><strong>152,160,000</strong></td>
          <td>1 augmented + 3 clean per molecule</td>
      </tr>
  </tbody>
</table>
<p>A separate <strong>model selection</strong> experiment used a 1,024,000-molecule subset of ChEMBL to compare the three architectures (Table 1 in the paper). The <strong>DECIMER Hand-Drawn</strong> evaluation dataset consists of 5,088 real hand-drawn images from 23 volunteers.</p>
<p><strong>Preprocessing:</strong></p>
<ul>
<li><a href="/notes/chemistry/molecular-representations/notations/smiles/">SMILES</a> strings limited to fewer than 300 characters.</li>
<li>Images resized to $512 \times 512$.</li>
<li>Images generated with and without &ldquo;hand-drawn style&rdquo; augmentations.</li>
</ul>
<h3 id="algorithms">Algorithms</h3>
<ul>
<li><strong>Tokenization:</strong> SMILES split by heavy atoms, brackets, bond symbols, and special characters. Start <code>&lt;start&gt;</code> and end <code>&lt;end&gt;</code> tokens added; padded with <code>&lt;pad&gt;</code>.</li>
<li><strong>Optimization:</strong> Adam optimizer with a custom learning rate schedule (as specified in the original Transformer paper). A dropout rate of 0.1 was used.</li>
<li><strong>Loss Function:</strong> Trained using focal loss to address class imbalance for rare tokens. The focal loss formulation reduces the relative loss for well-classified examples:
$$
\text{FL}(p_{\text{t}}) = -\alpha_{\text{t}} (1 - p_{\text{t}})^\gamma \log(p_{\text{t}})
$$</li>
<li><strong>Augmentations:</strong> RanDepict applied synthetic distortions to mimic handwriting (wobbly lines, variable thickness, etc.).</li>
</ul>
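<p>The tokenization step can be sketched with a regular expression. The pattern below is illustrative; the paper's exact token inventory is not reproduced here:</p>

```python
import re

# Assumed token pattern: bracket atoms, two-letter halogens, organic
# subset atoms (including aromatic lowercase forms), bonds, ring
# digits, branches, and stereo markers.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|@@|[BCNOPSFIbcnops]|[=#$/\\%()+\-.:0-9@])"
)

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens and add sequence markers."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot fully tokenize: {smiles!r}")
    return ["<start>"] + tokens + ["<end>"]
```

The round-trip check (<code>join(tokens) == smiles</code>) guards against silently dropping characters the pattern does not cover.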
<h3 id="models">Models</h3>
<p>The final architecture (Model 3) is an Encoder-Decoder structure:</p>
<ul>
<li><strong>Encoder:</strong> <strong>EfficientNetV2-M</strong> (pretrained ImageNet backbone).
<ul>
<li>Input: $512 \times 512 \times 3$ image.</li>
<li>Output Features: $16 \times 16 \times 512$ (reshaped to sequence length 256, dimension 512).</li>
<li><em>Note:</em> The final fully connected layer of the CNN is removed.</li>
</ul>
</li>
<li><strong>Decoder:</strong> <strong>Transformer (Decoder-only)</strong>.
<ul>
<li>Layers: 6</li>
<li>Attention Heads: 8</li>
<li>Embedding Dimension: 512</li>
</ul>
</li>
<li><strong>Output:</strong> Predicted SMILES string token by token.</li>
</ul>
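<p>The encoder-to-decoder handoff is a simple reshape. A pure-Python sketch, with nested lists standing in for tensors:</p>

```python
def grid_to_sequence(features):
    """Flatten an H x W x D feature grid into an (H*W) x D sequence.

    In the paper: the 16 x 16 x 512 EfficientNetV2-M output becomes a
    256-token sequence of dimension 512 for the Transformer decoder.
    """
    return [cell for row in features for cell in row]
```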
<h3 id="evaluation">Evaluation</h3>
<p>Metrics used for evaluation:</p>
<ol>
<li><strong>Valid Predictions (%):</strong> Percentage of outputs that are syntactically valid SMILES.</li>
<li><strong>Exact Match Accuracy (%):</strong> Canonical SMILES string identity.</li>
<li><strong>Tanimoto Similarity:</strong> Fingerprint similarity (PubChem fingerprints) between ground truth and prediction.</li>
</ol>
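<p>Metrics 2 and 3 can be sketched as below. These are hypothetical helpers; in practice SMILES canonicalization and PubChem fingerprints come from a cheminformatics toolkit such as RDKit or the CDK:</p>

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between fingerprints given as sets of
    on-bit indices: |A & B| / |A | B|."""
    if not fp_a and not fp_b:
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def exact_match(pred_smiles, true_smiles, canonicalize=lambda s: s):
    """Exact match compares *canonical* SMILES; the identity default
    assumes inputs were already canonicalized."""
    return canonicalize(pred_smiles) == canonicalize(true_smiles)
```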
<p><strong>Data Scaling Results (Hand-Drawn Dataset, Table 4 in the paper):</strong></p>
<table>
  <thead>
      <tr>
          <th>Dataset</th>
          <th>Training Images</th>
          <th>Valid Predictions</th>
          <th>Exact Accuracy</th>
          <th>Tanimoto</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1 (ChEMBL)</td>
          <td>4,375,338</td>
          <td>96.21%</td>
          <td>5.09%</td>
          <td>0.490</td>
      </tr>
      <tr>
          <td>2 (ChEMBL)</td>
          <td>13,126,014</td>
          <td>97.41%</td>
          <td>26.08%</td>
          <td>0.690</td>
      </tr>
      <tr>
          <td>3 (PubChem)</td>
          <td>38,040,000</td>
          <td>99.67%</td>
          <td>70.34%</td>
          <td>0.939</td>
      </tr>
      <tr>
          <td>4 (PubChem)</td>
          <td>152,160,000</td>
          <td>99.72%</td>
          <td>73.25%</td>
          <td>0.942</td>
      </tr>
  </tbody>
</table>
<p><strong>Comparison with Other Tools (Hand-Drawn Dataset, Table 5 in the paper):</strong></p>
<table>
  <thead>
      <tr>
          <th>OCSR Tool</th>
          <th>Method</th>
          <th>Valid Predictions</th>
          <th>Exact Accuracy</th>
          <th>Tanimoto</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>DECIMER (Ours)</strong></td>
          <td>Deep Learning</td>
          <td><strong>99.72%</strong></td>
          <td><strong>73.25%</strong></td>
          <td><strong>0.94</strong></td>
      </tr>
      <tr>
          <td>DECIMER.ai</td>
          <td>Deep Learning</td>
          <td>96.07%</td>
          <td>26.98%</td>
          <td>0.69</td>
      </tr>
      <tr>
          <td>MolGrapher</td>
          <td>Deep Learning</td>
          <td>99.94%</td>
          <td>10.81%</td>
          <td>0.51</td>
      </tr>
      <tr>
          <td>MolScribe</td>
          <td>Deep Learning</td>
          <td>95.66%</td>
          <td>7.65%</td>
          <td>0.59</td>
      </tr>
      <tr>
          <td>Img2Mol</td>
          <td>Deep Learning</td>
          <td>98.96%</td>
          <td>5.25%</td>
          <td>0.52</td>
      </tr>
      <tr>
          <td>SwinOCSR</td>
          <td>Deep Learning</td>
          <td>97.37%</td>
          <td>5.11%</td>
          <td>0.64</td>
      </tr>
      <tr>
          <td>ChemGrapher</td>
          <td>Deep Learning</td>
          <td>69.56%</td>
          <td>N/A</td>
          <td>0.09</td>
      </tr>
      <tr>
          <td>Imago</td>
          <td>Rule-based</td>
          <td>43.14%</td>
          <td>2.99%</td>
          <td>0.22</td>
      </tr>
      <tr>
          <td>MolVec</td>
          <td>Rule-based</td>
          <td>71.86%</td>
          <td>1.30%</td>
          <td>0.23</td>
      </tr>
      <tr>
          <td>OSRA</td>
          <td>Rule-based</td>
          <td>54.66%</td>
          <td>0.57%</td>
          <td>0.17</td>
      </tr>
  </tbody>
</table>
<h3 id="hardware">Hardware</h3>
<ul>
<li><strong>Compute:</strong> Google Cloud TPU v4-128 pod slice.</li>
<li><strong>Training Time:</strong>
<ul>
<li>EfficientNetV2-M model trained ~2x faster than EfficientNetV1-B7.</li>
<li>Average training time per epoch: 34 minutes (for Model 3 on 1M dataset subset).</li>
</ul>
</li>
<li><strong>Epochs:</strong> Models trained for 25 epochs.</li>
</ul>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Rajan, K., Brinkhaus, H.O., Zielesny, A. et al. (2024). Advancements in hand-drawn chemical structure recognition through an enhanced DECIMER architecture. <em>Journal of Cheminformatics</em>, 16(78). <a href="https://doi.org/10.1186/s13321-024-00872-7">https://doi.org/10.1186/s13321-024-00872-7</a></p>
<p><strong>Publication</strong>: Journal of Cheminformatics 2024</p>
<p><strong>Additional Resources</strong>:</p>
<ul>
<li><a href="https://pypi.org/project/decimer/">PyPi Package</a></li>
<li><a href="https://doi.org/10.5281/zenodo.10781330">Model Weights (Zenodo)</a></li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@article</span>{rajanAdvancementsHanddrawnChemical2024,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span> = <span style="color:#e6db74">{Advancements in Hand-Drawn Chemical Structure Recognition through an Enhanced {{DECIMER}} Architecture}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span> = <span style="color:#e6db74">{Rajan, Kohulan and Brinkhaus, Henning Otto and Zielesny, Achim and Steinbeck, Christoph}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span> = <span style="color:#ae81ff">2024</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">month</span> = jul,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">journal</span> = <span style="color:#e6db74">{Journal of Cheminformatics}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">volume</span> = <span style="color:#e6db74">{16}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">number</span> = <span style="color:#e6db74">{1}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">pages</span> = <span style="color:#e6db74">{78}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">issn</span> = <span style="color:#e6db74">{1758-2946}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">doi</span> = <span style="color:#e6db74">{10.1186/s13321-024-00872-7}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item><item><title>ChemReco: Hand-Drawn Chemical Structure Recognition</title><link>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/chemreco/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/chemreco/</guid><description>A deep learning method using EfficientNet and Transformer to convert hand-drawn chemical structures into SMILES codes, achieving 96.9% accuracy.</description><content:encoded><![CDATA[<h2 id="research-contribution--classification">Research Contribution &amp; Classification</h2>
<p>This is a <strong>Methodological Paper ($\Psi_{\text{Method}}$)</strong> with a significant <strong>Resource ($\Psi_{\text{Resource}}$)</strong> component.</p>
<ul>
<li><strong>Method</strong>: The primary contribution is &ldquo;ChemReco,&rdquo; a specific deep learning pipeline (EfficientNet + Transformer) designed to solve the Optical Chemical Structure Recognition (OCSR) task for hand-drawn images. The authors conduct extensive ablation studies on architecture and data mixing ratios to validate performance.</li>
<li><strong>Resource</strong>: The authors explicitly state that &ldquo;the primary focus of this paper is constructing datasets&rdquo; due to the scarcity of hand-drawn molecular data. They introduce a comprehensive synthetic data generation pipeline involving RDKit modifications and image degradation to create training data.</li>
</ul>
<h2 id="motivation-digitizing-hand-drawn-chemical-sketches">Motivation: Digitizing Hand-Drawn Chemical Sketches</h2>
<p>Hand-drawing is the most intuitive method for chemists and students to record molecular structures. However, digitizing these drawings into machine-readable formats (like <a href="/notes/chemistry/molecular-representations/notations/smiles/">SMILES</a>) usually requires time-consuming manual entry or specialized software.</p>
<ul>
<li><strong>Gap</strong>: Existing OCSR tools and rule-based methods often fail on hand-drawn sketches due to diverse writing styles, poor image quality, and the absence of labeled data.</li>
<li><strong>Application</strong>: Automated recognition enables efficient chemical research and allows for automatic grading in educational settings.</li>
</ul>
<h2 id="core-innovation-synthetic-pipeline-and-hybrid-architecture">Core Innovation: Synthetic Pipeline and Hybrid Architecture</h2>
<p>The paper introduces <strong>ChemReco</strong>, an end-to-end system for recognizing C-H-O structures. Key novelties include:</p>
<ol>
<li><strong>Synthetic Data Pipeline</strong>: A multi-stage generation method that modifies RDKit source code to randomize bond/angle parameters, followed by OpenCV-based augmentation, degradation, and background addition to simulate realistic hand-drawn artifacts.</li>
<li><strong>Architectural Choice</strong>: The specific application of <strong>EfficientNet</strong> (encoder) combined with a <strong>Transformer</strong> (decoder) for this domain, which the authors demonstrate outperforms the more common ResNet+LSTM baselines.</li>
<li><strong>Hybrid Training Strategy</strong>: The finding that a mix of 90% synthetic and 10% real data yields optimal performance, superior to using either dataset alone.</li>
</ol>
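<p>The degradation stage of the synthetic pipeline can be sketched with a pure-Python stand-in. ChemReco's actual pipeline uses OpenCV-based augmentation, degradation, and background compositing; <code>flip_prob</code> and <code>seed</code> are assumed knobs:</p>

```python
import random

def degrade(image, flip_prob=0.05, seed=0):
    """Salt-and-pepper-style degradation of a binary image (2D list of
    0/1 pixels): each pixel flips with probability flip_prob. A crude
    stand-in for ChemReco's OpenCV degradation step.
    """
    rng = random.Random(seed)
    return [[1 - px if rng.random() < flip_prob else px for px in row]
            for row in image]
```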
<h2 id="methodology--ablation-studies">Methodology &amp; Ablation Studies</h2>
<p>The authors performed a series of ablation studies and comparisons:</p>
<ul>
<li><strong>Synthesis Ablation</strong>: Evaluated the impact of each step in the generation pipeline (RDKit only $\rightarrow$ Augmentation $\rightarrow$ Degradation $\rightarrow$ Background) on validation loss and accuracy.</li>
<li><strong>Dataset Size Ablation</strong>: Tested model performance when trained on synthetic datasets ranging from 100,000 to 1,000,000 images.</li>
<li><strong>Real/Synthetic Ratio</strong>: Investigated the optimal mixing ratio of synthetic to real hand-drawn images (100:0, 90:10, 50:50, 10:90, 0:100), finding that the 90:10 ratio achieved 93.81% exact match, compared to 63.33% for synthetic-only and 65.83% for real-only.</li>
<li><strong>Architecture Comparison</strong>: Benchmarked four encoder-decoder combinations: ResNet vs. EfficientNet encoders paired with LSTM vs. Transformer decoders.</li>
<li><strong>Baseline Comparison</strong>: Compared results against a related study utilizing a CNN+LSTM framework.</li>
</ul>
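<p>The real/synthetic ratio ablation above amounts to sampling a fixed-size training set at a chosen mix. A minimal sketch, assuming hypothetical <code>synthetic_paths</code>/<code>real_paths</code> lists (not the paper&rsquo;s actual data loader); real images are sampled with replacement since only 2,598 exist:</p>

```python
import random

def build_mixed_training_set(synthetic_paths, real_paths, total, synth_frac=0.9, seed=0):
    """Sample a training set with a fixed synthetic-to-real ratio.

    Real images are drawn with replacement when fewer are available
    than requested (the paper has only 2,598 real images).
    """
    rng = random.Random(seed)
    n_synth = int(total * synth_frac)
    n_real = total - n_synth
    synth = rng.sample(synthetic_paths, n_synth)
    real = rng.choices(real_paths, k=n_real)  # with replacement
    mixed = synth + real
    rng.shuffle(mixed)
    return mixed

# Toy example: 10 items at the paper's 90:10 ratio -> 9 synthetic, 1 real.
mix = build_mixed_training_set([f"s{i}" for i in range(100)], ["r0", "r1"], total=10)
```

<p>At the paper&rsquo;s scale this would be 900k synthetic and 100k (oversampled) real images.</p>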
<h2 id="results--interpretations">Results &amp; Interpretations</h2>
<ul>
<li><strong>Best Performance</strong>: The EfficientNet + Transformer model trained on a 90:10 synthetic-to-real ratio achieved a <strong>96.90% Exact Match</strong> rate on the test set.</li>
<li><strong>Background Robustness</strong>: When training on synthetic data alone (no real images), the best accuracy on background-free test images was approximately 46% (using RDKit-aug-deg), while background test images reached approximately 53% (using RDKit-aug-bkg-deg). Adding random backgrounds during training helped prevent the model from overfitting to clean white backgrounds.</li>
<li><strong>Data Volume</strong>: Increasing the synthetic dataset size from 100k to 1M consistently improved accuracy (average exact match: 49.40% at 100k, 54.29% at 200k, 61.31% at 500k, 63.33% at 1M, all without real images in training).</li>
<li><strong>Encoder-Decoder Comparison</strong> (at 90:10 mix with 1M images):</li>
</ul>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Encoder</th>
          <th style="text-align: left">Decoder</th>
          <th style="text-align: left">Avg. Exact Match (%)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left">ResNet</td>
          <td style="text-align: left">LSTM</td>
          <td style="text-align: left">93.81</td>
      </tr>
      <tr>
          <td style="text-align: left">ResNet</td>
          <td style="text-align: left">Transformer</td>
          <td style="text-align: left">94.76</td>
      </tr>
      <tr>
          <td style="text-align: left">EfficientNet</td>
          <td style="text-align: left">LSTM</td>
          <td style="text-align: left">96.31</td>
      </tr>
      <tr>
          <td style="text-align: left">EfficientNet</td>
          <td style="text-align: left">Transformer</td>
          <td style="text-align: left"><strong>96.90</strong></td>
      </tr>
  </tbody>
</table>
<ul>
<li><strong>Superiority over Baselines</strong>: The model outperformed the cited CNN+LSTM baseline from ChemPix (93% vs 76% on the ChemPix test set).</li>
</ul>
<h2 id="limitations">Limitations</h2>
<ul>
<li><strong>Restricted atom types</strong>: The system only handles molecules composed of carbon, hydrogen, and oxygen (C-H-O), excluding nitrogen, sulfur, halogens, and other heteroatoms commonly found in organic chemistry.</li>
<li><strong>Structural complexity</strong>: Only structures with at most one ring are supported. Complex multi-ring systems and fused ring structures are not covered.</li>
<li><strong>Dataset availability</strong>: The real hand-drawn dataset (2,598 images) is not publicly released and is only available upon request from the corresponding author.</li>
<li><strong>Future directions</strong>: The authors suggest expanding to more heteroatoms, complex ring structures, and applications in automated grading of chemistry exams.</li>
</ul>
<hr>
<h2 id="reproducibility">Reproducibility</h2>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Artifact</th>
          <th style="text-align: left">Type</th>
          <th style="text-align: left">License</th>
          <th style="text-align: left">Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><a href="https://github.com/a-die/hdr-DeepLearning">hdr-DeepLearning</a></td>
          <td style="text-align: left">Code</td>
          <td style="text-align: left">Unknown</td>
          <td style="text-align: left">Official implementation in PyTorch</td>
      </tr>
      <tr>
          <td style="text-align: left">Paper</td>
          <td style="text-align: left">Publication</td>
          <td style="text-align: left">CC-BY-4.0</td>
          <td style="text-align: left">Open access via Nature</td>
      </tr>
  </tbody>
</table>
<p>The real hand-drawn dataset (2,598 images) is available upon request from the corresponding author, not publicly downloadable. The synthetic data generation pipeline is described in detail but relies on modified RDKit source code, which is included in the repository.</p>
<h3 id="data">Data</h3>
<p>The study utilizes a combination of collected SMILES data, real hand-drawn images, and generated synthetic images.</p>
<ul>
<li><strong>Source Data</strong>: SMILES codes collected from PubChem, ZINC, <a href="/notes/chemistry/datasets/gdb-11/">GDB-11</a>, and <a href="/notes/chemistry/datasets/gdb-13/">GDB-13</a>. Filtered for C, H, O atoms and max 1 ring.</li>
<li><strong>Real Dataset</strong>: 670 selected SMILES codes drawn by multiple volunteers, totaling <strong>2,598 images</strong>.</li>
<li><strong>Synthetic Dataset</strong>: Generated up to <strong>1,000,000 images</strong> using the pipeline below.</li>
<li><strong>Training Mix</strong>: The optimal training set used 1 million images with a <strong>90:10 ratio</strong> of synthetic to real images.</li>
</ul>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Dataset Type</th>
          <th style="text-align: left">Source</th>
          <th style="text-align: left">Size</th>
          <th style="text-align: left">Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><strong>Real</strong></td>
          <td style="text-align: left">Volunteer Drawings</td>
          <td style="text-align: left">2,598 images</td>
          <td style="text-align: left">Used for mixed training and testing</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Synthetic</strong></td>
          <td style="text-align: left">Generated</td>
          <td style="text-align: left">100k - 1M</td>
          <td style="text-align: left">Generated via modified RDKit + OpenCV augmentation/degradation; optionally enhanced with Stable Diffusion</td>
      </tr>
  </tbody>
</table>
<h3 id="algorithms">Algorithms</h3>
<p>The <strong>Synthetic Image Generation Pipeline</strong> is critical for reproduction:</p>
<ol>
<li><strong>RDKit Modification</strong>: Modify the RDKit source code to randomize drawing parameters (bonds, character width, length, and bond angles).</li>
<li><strong>Augmentation (OpenCV)</strong>: Apply sequence: Resize ($p=0.5$), Blur ($p=0.4$), Erode/Dilate ($p=0.2$), Distort ($p=0.8$), Flip ($p=0.5$), Affine ($p=0.7$).</li>
<li><strong>Degradation</strong>: Apply sequence: Salt+pepper noise ($p=0.1$), Contrast ($p=0.7$), Sharpness ($p=0.5$), Invert ($p=0.3$).</li>
<li><strong>Background Addition</strong>: Random backgrounds are augmented (Crop, Distort, Flip) and added to the molecular image to prevent background overfitting.</li>
<li><strong>Diffusion Enhancement</strong>: Stable Diffusion (v1-4) is used for image-to-image enhancement to better simulate hand-drawn styles (prompt: &ldquo;A pencil sketch of [Formula]&hellip; without charge distribution&rdquo;).</li>
</ol>
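<p>The augmentation and degradation steps share one pattern: each transform fires independently with its own probability. A minimal sketch of that chain, with placeholder transforms standing in for the paper&rsquo;s OpenCV operations:</p>

```python
import random

def apply_pipeline(image, steps, rng=None):
    """Apply each (transform, probability) pair in order.

    Mirrors the paper's design: every transform fires independently
    with its own probability p.
    """
    rng = rng or random.Random()
    for transform, p in steps:
        if rng.random() < p:
            image = transform(image)
    return image

# Placeholder transforms standing in for the OpenCV operations;
# here they just record which steps fired.
blur = lambda img: img + ["blur"]
distort = lambda img: img + ["distort"]
flip = lambda img: img + ["flip"]

# Probabilities taken from the augmentation step above.
steps = [(blur, 0.4), (distort, 0.8), (flip, 0.5)]
out = apply_pipeline([], steps, rng=random.Random(42))
```

<p>In the real pipeline each transform would take and return an image array; the control flow is unchanged.</p>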
<h3 id="models">Models</h3>
<p>The system uses an encoder-decoder architecture:</p>
<ul>
<li><strong>Encoder</strong>: <strong>EfficientNet</strong> (pre-trained on ImageNet). The last layer is removed, and features are extracted into a NumPy array.</li>
<li><strong>Decoder</strong>: <strong>Transformer</strong>. Utilizes self-attention to generate the SMILES sequence. Chosen over LSTM for better handling of long-range dependencies.</li>
<li><strong>Output</strong>: Canonical SMILES string.</li>
</ul>
<h3 id="evaluation">Evaluation</h3>
<ul>
<li><strong>Primary Metric</strong>: <strong>Exact Match (EM)</strong>. A strict binary evaluation checking whether the complete generated SMILES perfectly replicates the target string.</li>
<li><strong>Other Metrics</strong>: <strong>Levenshtein Distance</strong> measures edit-level character proximity, while the <strong>Tanimoto coefficient</strong> evaluates structural similarity based on chemical fingerprints. Both were monitored during validation ablation runs.</li>
</ul>
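<p>The two string-level metrics are straightforward to compute without any chemistry toolkit (the Tanimoto coefficient additionally requires RDKit fingerprints, omitted here). A sketch:</p>

```python
def exact_match(pred: str, target: str) -> bool:
    """Strict binary check: the generated SMILES must equal the target."""
    return pred == target

def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# One substitution separates ethanol from ethylamine as strings.
d = levenshtein("CCO", "CCN")
```

<p>Note that a small Levenshtein distance does not imply chemical similarity, which is why the Tanimoto coefficient over fingerprints is tracked alongside it.</p>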
<table>
  <thead>
      <tr>
          <th style="text-align: left">Metric</th>
          <th style="text-align: left">Value</th>
          <th style="text-align: left">Baseline (CNN+LSTM)</th>
          <th style="text-align: left">Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><strong>Exact Match</strong></td>
          <td style="text-align: left"><strong>96.90%</strong></td>
          <td style="text-align: left">76%</td>
          <td style="text-align: left">Tested on the provided test set</td>
      </tr>
  </tbody>
</table>
<h3 id="hardware">Hardware</h3>
<ul>
<li><strong>CPU</strong>: Intel(R) Xeon(R) Gold 6130 (40 GB RAM).</li>
<li><strong>GPU</strong>: NVIDIA Tesla V100 (32 GB video memory).</li>
<li><strong>Framework</strong>: PyTorch 1.9.1.</li>
<li><strong>Training Configuration</strong>:
<ul>
<li>Optimizer: Adam (learning rate 1e-4).</li>
<li>Batch size: 32.</li>
<li>Epochs: 100.</li>
</ul>
</li>
</ul>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Ouyang, H., Liu, W., Tao, J., et al. (2024). ChemReco: automated recognition of hand-drawn carbon-hydrogen-oxygen structures using deep learning. <em>Scientific Reports</em>, 14, 17126. <a href="https://doi.org/10.1038/s41598-024-67496-7">https://doi.org/10.1038/s41598-024-67496-7</a></p>
<p><strong>Publication</strong>: Scientific Reports 2024</p>
<p><strong>Additional Resources</strong>:</p>
<ul>
<li><a href="https://github.com/a-die/hdr-DeepLearning">Official Code Repository</a></li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@article</span>{ouyangChemRecoAutomatedRecognition2024,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span> = <span style="color:#e6db74">{ChemReco: Automated Recognition of Hand-Drawn Carbon--Hydrogen--Oxygen Structures Using Deep Learning}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span> = <span style="color:#e6db74">{Ouyang, Hengjie and Liu, Wei and Tao, Jiajun and Luo, Yanghong and Zhang, Wanjia and Zhou, Jiayu and Geng, Shuqi and Zhang, Chengpeng}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">journal</span> = <span style="color:#e6db74">{Scientific Reports}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">volume</span> = <span style="color:#e6db74">{14}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">number</span> = <span style="color:#e6db74">{1}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">pages</span> = <span style="color:#e6db74">{17126}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span> = <span style="color:#e6db74">{2024}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">publisher</span> = <span style="color:#e6db74">{Nature Publishing Group}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">doi</span> = <span style="color:#e6db74">{10.1038/s41598-024-67496-7}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item><item><title>AtomLenz: Atom-Level OCSR with Limited Supervision</title><link>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/atomlenz/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/atomlenz/</guid><description>Weakly supervised OCSR framework combining object detection and graph construction to recognize chemical structures from hand-drawn images using SMILES.</description><content:encoded><![CDATA[<h2 id="dual-contribution-method-and-data-resource">Dual Contribution: Method and Data Resource</h2>
<p>The paper proposes an architecture (AtomLenz) and training framework (ProbKT* + Edit-Correction) to solve the problem of Optical Chemical Structure Recognition (OCSR) in data-sparse domains. It also releases a curated, relabeled dataset of hand-drawn molecules with atom-level bounding box annotations.</p>
<h2 id="overcoming-annotation-bottlenecks-in-ocsr">Overcoming Annotation Bottlenecks in OCSR</h2>
<p>Optical Chemical Structure Recognition (OCSR) is critical for digitizing chemical literature and lab notes. However, existing methods face three main limitations:</p>
<ol>
<li><strong>Generalization Limits:</strong> They struggle with sparse or stylistically unique domains, such as hand-drawn images, where massive datasets for pretraining are unavailable.</li>
<li><strong>Annotation Cost:</strong> &ldquo;Atom-level&rdquo; methods (which detect individual atoms and bonds) require expensive bounding box annotations, which are rarely available for real-world sketch data.</li>
<li><strong>Lack of Interpretability/Localization:</strong> Pure &ldquo;Image-to-SMILES&rdquo; models (like DECIMER) work well but fail to localize the atoms or bonds in the original image, limiting human-in-the-loop review and mechanistic interpretability.</li>
</ol>
<h2 id="atomlenz-probkt-and-graph-edit-correction">AtomLenz, ProbKT*, and Graph Edit-Correction</h2>
<p>The core contribution is <strong>AtomLenz</strong>, an OCSR framework that achieves atom-level entity detection using <strong>only SMILES supervision</strong> on target domains. The authors construct an explicit object detection pipeline using Faster R-CNN trained via a composite multi-task loss that combines a multi-class log loss $L_{cls}$ for the predicted class $\hat{c}$ and a regression loss $L_{reg}$ for the predicted bounding box coordinates $\hat{b}$:</p>
<p>$$ \mathcal{L} = L_{cls}(c, \hat{c}) + L_{reg}(b, \hat{b}) $$</p>
<p>To bridge the gap between image inputs and the weakly supervised SMILES labels, the system leverages:</p>
<ul>
<li><strong>ProbKT* (Probabilistic Knowledge Transfer):</strong> Uses probabilistic logic and Hungarian matching to align predicted objects with the &ldquo;ground truth&rdquo; derived from the SMILES strings, enabling backpropagation without explicit bounding boxes.</li>
<li><strong>Graph Edit-Correction:</strong> Generates pseudo-labels by solving an optimization problem that finds the smallest edit on the predicted graph such that the corrected graph and the ground-truth SMILES graph become isomorphic, which forces fine-tuning on less frequent atom types. The combination of ProbKT* and Edit-Correction is abbreviated as <strong>EditKT</strong>*.</li>
<li><strong>ChemExpert:</strong> A chemically sound ensemble strategy that cascades predictions from multiple models (e.g., passing through DECIMER, then AtomLenz), halting at the first output that clears basic RDKit chemical validity checks.</li>
</ul>
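<p>The alignment step in ProbKT* is a minimum-cost one-to-one assignment between predicted objects and the atoms implied by the SMILES string. The paper uses the Hungarian algorithm; the brute-force search below is an equivalent illustration for tiny inputs, and the cost matrix values are invented for the example:</p>

```python
from itertools import permutations

def best_assignment(cost):
    """Minimum-cost one-to-one assignment by exhaustive search.

    ProbKT* uses the Hungarian algorithm for this step; enumerating
    permutations gives the same optimum for the small example here.
    """
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return best, best_perm

# Rows: predicted atom detections; columns: atoms implied by the SMILES.
# Entries could be, e.g., per-pair classification losses (made up here).
cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.6, 0.3],
]
total, perm = best_assignment(cost)  # prediction i is matched to atom perm[i]
```

<p>With the matching fixed, each detection inherits a label from its matched SMILES atom, which is what lets gradients flow without bounding-box annotations.</p>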
<h2 id="data-efficiency-and-domain-adaptation-experiments">Data Efficiency and Domain Adaptation Experiments</h2>
<p>The authors evaluated the model specifically on domain adaptation and sample efficiency, treating hand-drawn molecules as the primary low-data target distribution:</p>
<ul>
<li><strong>Pretraining:</strong> Initially trained on ~214k synthetic images from ChEMBL explicitly labeled with bounding boxes (generated via RDKit).</li>
<li><strong>Target Domain Adaptation:</strong> Fine-tuned on the Brinkhaus hand-drawn dataset (4,070 images) using purely SMILES supervision.</li>
<li><strong>Evaluation Sets:</strong>
<ul>
<li><strong>Hand-drawn test set</strong>: 1,018 images.</li>
<li><strong>ChemPix</strong>: 614 out-of-domain hand-drawn images.</li>
<li><strong>Atom Localization set</strong>: 1,000 synthetic images to evaluate precise bounding box capabilities.</li>
</ul>
</li>
<li><strong>Baselines:</strong> Compared against leading OCSR methods, including DECIMER (v2.2.0), Img2Mol, MolScribe, ChemGrapher, and OSRA.</li>
</ul>
<h2 id="state-of-the-art-ensembles-vs-standalone-limitations">State-of-the-Art Ensembles vs. Standalone Limitations</h2>
<ul>
<li><strong>SOTA Ensemble Performance:</strong> The <strong>ChemExpert</strong> module (combining AtomLenz and DECIMER) achieved state-of-the-art accuracy on both hand-drawn (63.5%) and ChemPix (51.8%) test sets.</li>
<li><strong>Data Efficiency under Bottleneck Regimes:</strong> AtomLenz effectively bypassed the massive data constraints of competing models. When all methods were retrained from scratch on the same 4,070-sample hand-drawn training set (enriched with atom-level annotations from EditKT*), AtomLenz achieved 33.8% exact accuracy, outperforming baselines like Img2Mol (0.0%), MolScribe (1.3%), and DECIMER (0.1%), illustrating its sample efficiency.</li>
<li><strong>Localization Success:</strong> The base framework achieved strong localization (mAP 0.801), a capability not provided by end-to-end transformers like DECIMER.</li>
<li><strong>Methodological Tradeoffs:</strong> While AtomLenz is highly sample efficient, its standalone performance when fine-tuned on the target domain (33.8% accuracy) underperforms fine-tuned models trained on larger datasets like DECIMER (62.2% accuracy). AtomLenz achieves state-of-the-art results primarily when deployed as part of the ChemExpert ensemble alongside DECIMER, since errors from the two approaches tend to occur on different samples, allowing them to complement each other.</li>
</ul>
<hr>
<h2 id="reproducibility-details">Reproducibility Details</h2>
<h3 id="artifacts">Artifacts</h3>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Artifact</th>
          <th style="text-align: left">Type</th>
          <th style="text-align: left">License</th>
          <th style="text-align: left">Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><a href="https://github.com/molden/atomlenz">Official Repository (AtomLenz)</a></td>
          <td style="text-align: left">Code</td>
          <td style="text-align: left">MIT</td>
          <td style="text-align: left">Complete pipeline for AtomLenz, ProbKT*, and Graph Edit-Correction.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="https://github.com/molden/atomlenz/tree/main/models">Pre-trained Models</a></td>
          <td style="text-align: left">Model</td>
          <td style="text-align: left">MIT</td>
          <td style="text-align: left">Downloadable weights for Faster R-CNN detection backbones.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="https://dx.doi.org/10.6084/m9.figshare.24599412">Hand-drawn Dataset (Brinkhaus)</a></td>
          <td style="text-align: left">Dataset</td>
          <td style="text-align: left">Unknown</td>
          <td style="text-align: left">Images and SMILES used for target domain fine-tuning and evaluation.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="https://dx.doi.org/10.6084/m9.figshare.24599172">Relabeled Hand-drawn Dataset</a></td>
          <td style="text-align: left">Dataset</td>
          <td style="text-align: left">Unknown</td>
          <td style="text-align: left">1,417 images with bounding box annotations generated via EditKT*.</td>
      </tr>
      <tr>
          <td style="text-align: left"><a href="https://huggingface.co/spaces/moldenhof/atomlenz">AtomLenz Web Demo</a></td>
          <td style="text-align: left">Other</td>
          <td style="text-align: left">Unknown</td>
          <td style="text-align: left">Interactive Hugging Face space for testing model inference.</td>
      </tr>
  </tbody>
</table>
<h3 id="data">Data</h3>
<p>The study utilizes a mix of large synthetic datasets and smaller curated hand-drawn datasets.</p>
<table>
  <thead>
      <tr>
          <th>Purpose</th>
          <th>Dataset</th>
          <th>Size</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Pretraining</strong></td>
          <td>Synthetic ChEMBL</td>
          <td>~214,000</td>
          <td>Generated via RDKit/Indigo. Annotated with atoms, bonds, charges, stereocenters.</td>
      </tr>
      <tr>
          <td><strong>Fine-tuning</strong></td>
          <td>Hand-drawn (Brinkhaus)</td>
          <td>4,070</td>
          <td>Used for weakly supervised adaptation (SMILES only).</td>
      </tr>
      <tr>
          <td><strong>Evaluation</strong></td>
          <td>Hand-drawn Test</td>
          <td>1,018</td>
          <td></td>
      </tr>
      <tr>
          <td><strong>Evaluation</strong></td>
          <td>ChemPix</td>
          <td>614</td>
          <td>Out-of-distribution hand-drawn images.</td>
      </tr>
      <tr>
          <td><strong>Evaluation</strong></td>
          <td>Atom Localization</td>
          <td>1,000</td>
          <td>Synthetic images with ground truth bounding boxes.</td>
      </tr>
  </tbody>
</table>
<h3 id="algorithms">Algorithms</h3>
<ul>
<li><strong>Molecular Graph Constructor (Algorithm 1):</strong> A rule-based system to assemble the graph from detected objects:
<ol>
<li><strong>Filtering:</strong> Removes overlapping atom boxes (IoU threshold).</li>
<li><strong>Node Creation:</strong> Merges overlapping charge and stereocenter objects with their corresponding atom objects.</li>
<li><strong>Edge Creation:</strong> Iterates over bond objects; if a bond overlaps with exactly two atoms, an edge is added. If &gt;2, it selects the most probable pair.</li>
<li><strong>Validation:</strong> Checks valency constraints; removes bonds iteratively if constraints are violated.</li>
</ol>
</li>
<li><strong>Weakly Supervised Training:</strong>
<ul>
<li><strong>ProbKT*:</strong> Uses Hungarian matching to align predicted objects with the &ldquo;ground truth&rdquo; implied by the SMILES string, allowing backpropagation without explicit boxes.</li>
<li><strong>Graph Edit-Correction:</strong> Finds the smallest edit on the predicted graph such that the corrected and true SMILES graphs become isomorphic, then uses the correction to generate pseudo-labels for retraining.</li>
</ul>
</li>
</ul>
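<p>The edge-creation step of Algorithm 1 can be sketched with plain IoU tests. This simplified version only handles the exactly-two-overlaps case, omitting the paper&rsquo;s most-probable-pair tie-breaking and the valency validation pass; box coordinates are invented for the example:</p>

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def connect_bonds(atom_boxes, bond_boxes, thresh=0.0):
    """Add an edge whenever a bond box overlaps exactly two atom boxes."""
    edges = []
    for bond in bond_boxes:
        hits = [i for i, atom in enumerate(atom_boxes) if iou(bond, atom) > thresh]
        if len(hits) == 2:
            edges.append(tuple(hits))
    return edges

# Two atom boxes joined by one bond box that overlaps both.
atoms = [(0, 0, 10, 10), (20, 0, 30, 10)]
bonds = [(8, 2, 22, 8)]
edges = connect_bonds(atoms, bonds)
```

<p>The full algorithm would then run the valency check and iteratively drop edges that violate it.</p>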
<h3 id="models">Models</h3>
<ul>
<li><strong>Object Detection Backbone:</strong> <strong>Faster R-CNN</strong>.
<ul>
<li>Four distinct models are trained for different entity types: Atoms ($O^a$), Bonds ($O^b$), Charges ($O^c$), and Stereocenters ($O^s$).</li>
<li><strong>Loss Function:</strong> Multi-task loss combining Multi-class Log Loss ($L_{cls}$) and Regression Loss ($L_{reg}$).</li>
</ul>
</li>
<li><strong>ChemExpert:</strong> An ensemble wrapper that prioritizes models based on user preference (e.g., DECIMER first, then AtomLenz). It accepts the first prediction that passes RDKit chemical validity checks.</li>
</ul>
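<p>The ChemExpert cascade reduces to a priority-ordered loop over models with a validity gate. A minimal sketch, where the model stubs and the <code>is_valid</code> predicate are placeholders for the real DECIMER/AtomLenz calls and RDKit&rsquo;s SMILES-parsing check:</p>

```python
def chem_expert(image, models, is_valid):
    """Query models in priority order; return the first prediction
    that passes the validity check (RDKit parsing in the paper).

    `models` is a list of callables image -> SMILES (or None on failure);
    `is_valid` stands in for an RDKit MolFromSmiles check.
    """
    for model in models:
        smiles = model(image)
        if smiles is not None and is_valid(smiles):
            return smiles
    return None

# Stub models: the first returns an unparsable string, the second a valid one.
decimer_stub = lambda img: "C1CC"    # pretend-invalid (unclosed ring)
atomlenz_stub = lambda img: "CCO"    # pretend-valid ethanol
valid_set = {"CCO", "C", "CC"}       # stand-in for an RDKit validity check
result = chem_expert(None, [decimer_stub, atomlenz_stub], lambda s: s in valid_set)
```

<p>Because DECIMER and AtomLenz tend to fail on different samples, this simple first-valid-wins cascade is enough to lift the ensemble above either model alone.</p>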
<h3 id="evaluation">Evaluation</h3>
<p>Primary metrics focused on structural correctness and localization accuracy.</p>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>Value (Hand-drawn)</th>
          <th>Baseline (DECIMER FT)</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Accuracy (T=1)</strong></td>
          <td>33.8% (AtomLenz+EditKT*)</td>
          <td>62.2%</td>
          <td>Exact ECFP6 fingerprint match.</td>
      </tr>
      <tr>
          <td><strong>Tanimoto Sim.</strong></td>
          <td>0.484</td>
          <td>0.727</td>
          <td>Average similarity.</td>
      </tr>
      <tr>
          <td><strong>mAP</strong></td>
          <td>0.801</td>
          <td>N/A</td>
          <td>Localization accuracy (IoU 0.05-0.35).</td>
      </tr>
      <tr>
          <td><strong>Ensemble Acc.</strong></td>
          <td><strong>63.5%</strong></td>
          <td>62.2%</td>
          <td>ChemExpert (DECIMER + AtomLenz).</td>
      </tr>
  </tbody>
</table>
<h3 id="hardware">Hardware</h3>
<ul>
<li><strong>Compute:</strong> Experiments utilized the Flemish Supercomputer Center (VSC) resources.</li>
<li><strong>Note:</strong> Specific GPU models (e.g., A100/V100) are not explicitly detailed in the text, but Faster R-CNN training is standard on consumer or enterprise GPUs.</li>
</ul>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Oldenhof, M., De Brouwer, E., Arany, Á., &amp; Moreau, Y. (2024). Atom-Level Optical Chemical Structure Recognition with Limited Supervision. In <em>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</em>, 2024.</p>
<p><strong>Publication venue/year</strong>: CVPR 2024</p>
<p><strong>Additional Resources</strong>:</p>
<ul>
<li><a href="https://github.com/molden/atomlenz">Official Repository</a></li>
<li><a href="https://dx.doi.org/10.6084/m9.figshare.24599412">Hand-drawn Dataset on Figshare</a></li>
</ul>
<p><strong>BibTeX</strong>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@inproceedings</span>{oldenhofAtomLevelOpticalChemical2024,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span> = <span style="color:#e6db74">{Atom-Level Optical Chemical Structure Recognition with Limited Supervision}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span> = <span style="color:#e6db74">{Oldenhof, Martijn and De Brouwer, Edward and Arany, {\&#39;A}d{\&#39;a}m and Moreau, Yves}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">booktitle</span> = <span style="color:#e6db74">{Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span> = <span style="color:#e6db74">{2024}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">eprint</span> = <span style="color:#e6db74">{2404.01743}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">archiveprefix</span> = <span style="color:#e6db74">{arXiv}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">primaryclass</span> = <span style="color:#e6db74">{cs.CV}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item><item><title>Handwritten Chemical Structure Recognition with RCGD</title><link>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/hu-handwritten-rcgd-2023/</link><pubDate>Thu, 18 Dec 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/hu-handwritten-rcgd-2023/</guid><description>An end-to-end framework (RCGD) and unambiguous markup language (SSML) for recognizing complex handwritten chemical structures with guided graph traversal.</description><content:encoded><![CDATA[<h2 id="contribution-and-methodological-framework">Contribution and Methodological Framework</h2>
<p>This is primarily a <strong>Method</strong> paper with a significant <strong>Resource</strong> component.</p>
<ul>
<li><strong>Method</strong>: It proposes a novel architectural framework (<strong>RCGD</strong>) and a new representation syntax (<strong>SSML</strong>) to solve the specific problem of handwritten chemical structure recognition.</li>
<li><strong>Resource</strong>: It introduces a new benchmark dataset, <strong>EDU-CHEMC</strong>, containing 50,000 handwritten images to address the lack of public data in this domain.</li>
</ul>
<h2 id="the-ambiguity-of-handwritten-chemical-structures">The Ambiguity of Handwritten Chemical Structures</h2>
<p>Recognizing handwritten chemical structures is significantly harder than printed ones due to:</p>
<ol>
<li><strong>Inherent Ambiguity</strong>: Handwritten atoms and bonds vary greatly in appearance.</li>
<li><strong>Projection Complexity</strong>: Converting 2D projected layouts (like Natta or Fischer projections) into linear strings is difficult.</li>
<li><strong>Limitations of Existing Formats</strong>: Standard formats like SMILES require domain knowledge (valence rules) and have a high semantic gap with the visual image. They often fail to represent &ldquo;invalid&rdquo; structures commonly found in educational/student work.</li>
</ol>
<h2 id="bridging-the-semantic-gap-with-ssml-and-rcgd">Bridging the Semantic Gap with SSML and RCGD</h2>
<p>The paper introduces two core contributions to bridge the semantic gap between image and markup:</p>
<ol>
<li>
<p><strong>Structure-Specific Markup Language (SSML)</strong>: An extension of Chemfig that provides an unambiguous, visual-based graph representation. Unlike SMILES, it describes <em>how to draw</em> the molecule step-by-step, making it easier for models to learn visual alignments. It supports &ldquo;reconnection marks&rdquo; to handle cyclic structures explicitly.</p>
</li>
<li>
<p><strong>Random Conditional Guided Decoder (RCGD)</strong>: A decoder that treats recognition as a graph traversal problem. It introduces three novel mechanisms:</p>
<ul>
<li><strong>Conditional Attention Guidance</strong>: Uses branch angle directions to guide the attention mechanism, preventing the model from getting lost in complex structures.</li>
<li><strong>Memory Classification</strong>: A module that explicitly stores and classifies &ldquo;unexplored&rdquo; branch points to handle ring closures (reconnections).</li>
<li><strong>Path Selection</strong>: A training strategy that randomly samples traversal paths to prevent overfitting to a specific serialization order.</li>
</ul>
</li>
</ol>
<h2 id="experimental-setup-and-baselines">Experimental Setup and Baselines</h2>
<p><strong>Datasets</strong>:</p>
<ul>
<li><strong>Mini-CASIA-CSDB</strong> (Printed): A subset of 97,309 printed molecular structure images, upscaled to $500 \times 500$ resolution.</li>
<li><strong>EDU-CHEMC</strong> (Handwritten): A new dataset of 52,987 images collected from educational settings (cameras, scanners, screens), including erroneous/non-existent structures.</li>
</ul>
<p><strong>Baselines</strong>:</p>
<ul>
<li>Compared against standard <strong>String Decoders (SD)</strong> (based on DenseWAP), tested with both SMILES and SSML on Mini-CASIA-CSDB and exclusively with SSML on EDU-CHEMC.</li>
<li>Compared against <strong>BTTR</strong> and <strong>ABM</strong> (recent mathematical expression recognition models) adapted for the chemical structure task, both using SSML on EDU-CHEMC.</li>
<li>On Mini-CASIA-CSDB, also compared against <strong>WYGIWYS</strong> (a SMILES-based string decoder at 300x300 resolution).</li>
</ul>
<p><strong>Ablation Studies</strong>:</p>
<ul>
<li>Evaluated the impact of removing Path Selection (PS) and Memory Classification (MC) mechanisms on EDU-CHEMC.</li>
<li>Tested robustness to image rotation ($180^{\circ}$) on Mini-CASIA-CSDB.</li>
</ul>
<h2 id="recognition-performance-and-robustness">Recognition Performance and Robustness</h2>
<ul>
<li><strong>Superiority of SSML</strong>: Models trained with SSML significantly outperformed those trained with SMILES (92.09% vs 81.89% EM on printed data) due to reduced semantic gap.</li>
<li><strong>Best Performance</strong>: RCGD achieved the highest Exact Match (EM) scores on both datasets:
<ul>
<li><strong>Mini-CASIA-CSDB</strong>: 95.01% EM.</li>
<li><strong>EDU-CHEMC</strong>: 62.86% EM.</li>
</ul>
</li>
<li><strong>EDU-CHEMC Baselines</strong>: On the handwritten dataset, SD (DenseWAP) achieved 61.35% EM, outperforming both BTTR (58.21% EM) and ABM (58.78% EM). The authors note that BTTR and ABM&rsquo;s reverse training mode, which helps in regular formula recognition, does not transfer well to graph-structured molecular data.</li>
<li><strong>Ablation Results</strong> (Table 5, EDU-CHEMC): Removing Path Selection alone dropped EM from 62.86% to 62.15%. Removing both Path Selection and Memory Classification dropped EM further to 60.31%, showing that memory classification has a larger impact.</li>
<li><strong>Robustness</strong>: RCGD showed minimal performance drop (0.85%) on rotated images compared to SMILES-based methods (10.36% drop). The SD with SSML dropped by 2.19%, confirming that SSML itself improves rotation invariance.</li>
<li><strong>Educational Utility</strong>: The method can recognize and reconstruct chemically invalid structures (e.g., a Carbon atom with 5 bonds), making it applicable for correcting and revising handwritten answers in chemistry education.</li>
</ul>
<hr>
<h2 id="reproducibility-details">Reproducibility Details</h2>
<h3 id="data">Data</h3>
<p><strong>1. EDU-CHEMC (Handwritten)</strong></p>
<ul>
<li><strong>Total Size</strong>: 52,987 images.</li>
<li><strong>Splits</strong>: Training (48,998), Validation (999), Test (2,992).</li>
<li><strong>Characteristics</strong>: Real-world educational data, mixture of isolated molecules and reaction equations, includes invalid chemical structures.</li>
</ul>
<p><strong>2. Mini-CASIA-CSDB (Printed)</strong></p>
<ul>
<li><strong>Total Size</strong>: 97,309 images.</li>
<li><strong>Splits</strong>: Training (80,781), Validation (8,242), Test (8,286).</li>
<li><strong>Preprocessing</strong>: Original $300 \times 300$ images were upscaled to $500 \times 500$ RGB to resolve blurring issues.</li>
</ul>
<h3 id="algorithms">Algorithms</h3>
<p><strong>1. SSML Generation</strong></p>
<p>To convert a molecular graph to SSML:</p>
<ol>
<li><strong>Traverse</strong>: Start from the left-most atom.</li>
<li><strong>Bonds/Atoms</strong>: Output atom text and bond format <code>&lt;bond&gt;[:&lt;angle&gt;]</code>.</li>
<li><strong>Branches</strong>: At branch points, use phantom symbols <code>(</code> and <code>)</code> to enclose branches, ordered by ascending bond angle.</li>
<li><strong>Reconnections</strong>: Use <code>?[tag]</code> and <code>?[tag, bond]</code> to mark start/end of ring closures.</li>
</ol>
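<p>The traversal above can be sketched as a depth-first walk over a toy molecular graph. This is an illustrative reconstruction, not the authors' implementation; the adjacency format, the angle values, and the exact token spellings are assumptions, and ring reconnections (<code>?[tag]</code>) are omitted for brevity.</p>

```python
def to_ssml(graph, start):
    """Emit SSML-like tokens via DFS; ring closures (?[tag]) omitted."""
    tokens, visited = [], set()

    def walk(atom):
        visited.add(atom)
        symbol, bonds = graph[atom]
        tokens.append(symbol)
        # Branches are ordered by ascending bond angle, per the paper.
        out = [b for b in bonds if b[0] not in visited]
        out.sort(key=lambda b: b[2])
        for i, (nbr, bond, angle) in enumerate(out):
            # Enclose all but the last branch in phantom parentheses.
            parenthesize = len(out) > 1 and i < len(out) - 1
            if parenthesize:
                tokens.append("(")
            tokens.append(f"{bond}:{angle}")
            walk(nbr)
            if parenthesize:
                tokens.append(")")

    walk(start)
    return " ".join(tokens)

# Toy chain with one branch: {atom_id: (symbol, [(neighbor, bond, angle), ...])}
graph = {
    0: ("C", [(1, "-", 0), (2, "-", 60)]),
    1: ("C", [(0, "-", 180)]),
    2: ("C", [(0, "-", 240), (3, "-", 0)]),
    3: ("C", [(2, "-", 180)]),
}
print(to_ssml(graph, 0))  # C ( -:0 C ) -:60 C -:0 C
```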
<p><strong>2. RCGD Specifics</strong></p>
<ul>
<li><strong>RCGD-SSML</strong>: Modified version of SSML for the decoder. Removes <code>(</code> <code>)</code> delimiters; adds <code>\eob</code> (end of branch). Maintains a dynamic <strong>Branch Angle Set ($M$)</strong>.</li>
<li><strong>Path Selection</strong>: During training, when multiple branches exist in $M$, the model randomly selects one to traverse next. During inference, it uses beam search to score candidate paths.</li>
<li><strong>Loss Function</strong>:
$$
\begin{aligned}
L_{\text{total}} = L_{\text{ce}} + L_{\text{bc}}
\end{aligned}
$$
<ul>
<li>$L_{\text{ce}}$: Cross-entropy loss for character sequence generation.</li>
<li>$L_{\text{bc}}$: Multi-label classification loss for the memory module (predicting reconnection bond types for stored branch states).</li>
</ul>
</li>
</ul>
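<p>A minimal numeric sketch of the combined loss, assuming toy probability values: the sequence term is a standard cross-entropy over gold tokens, and the memory term is a multi-label binary cross-entropy over candidate reconnection bond types. The per-step probabilities below are invented for illustration.</p>

```python
import math

def cross_entropy(gold_probs):
    """Sequence CE: -sum of log p(gold token) over timesteps."""
    return -sum(math.log(p) for p in gold_probs)

def multilabel_bce(pred, target):
    """Multi-label BCE for the memory module's bond-type predictions."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(pred, target))

# Toy values: probs of the gold character at each step, and predicted
# probabilities for two candidate reconnection bond types (labels 1, 0).
l_ce = cross_entropy([0.9, 0.8, 0.95])
l_bc = multilabel_bce([0.9, 0.1], [1, 0])
l_total = l_ce + l_bc
print(round(l_total, 4))  # 0.5905
```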
<h3 id="models">Models</h3>
<p><strong>Encoder</strong>: DenseNet</p>
<ul>
<li><strong>Structure</strong>: 3 dense blocks.</li>
<li><strong>Growth Rate</strong>: 24.</li>
<li><strong>Depth</strong>: 32 per block.</li>
<li><strong>Output</strong>: High-dimensional feature map $x \in \mathbb{R}^{d_x \times h \times w}$.</li>
</ul>
<p><strong>Decoder</strong>: GRU with Attention</p>
<ul>
<li><strong>Hidden State Dimension</strong>: 256.</li>
<li><strong>Embedding Dimension</strong>: 256.</li>
<li><strong>Attention Projection</strong>: 128.</li>
<li><strong>Memory Classification Projection</strong>: 256.</li>
</ul>
<p><strong>Training Config</strong>:</p>
<ul>
<li><strong>Optimizer</strong>: Adam.</li>
<li><strong>Learning Rate</strong>: 2e-4 with multi-step decay (gamma 0.5).</li>
<li><strong>Dropout</strong>: 15%.</li>
<li><strong>Strategy</strong>: Teacher forcing is used when computing validation metrics for checkpoint selection.</li>
</ul>
<h3 id="evaluation">Evaluation</h3>
<p><strong>Metrics</strong>:</p>
<ul>
<li><strong>Exact Match (EM)</strong>: Percentage of samples where the predicted graph structure perfectly matches the label. For SMILES, string comparison; for SSML, converted to graph for isomorphism check.</li>
<li><strong>Structure EM</strong>: Auxiliary metric for samples with mixed content (text + molecules), counting samples where <em>all</em> molecular structures are correct.</li>
</ul>
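<p>The SSML variant of Exact Match requires a graph-isomorphism check rather than string equality. A minimal sketch, assuming a simple (labels, edge-set) graph encoding and a brute-force matcher (fine for small molecules; the paper's actual checker is not released):</p>

```python
from itertools import permutations

def isomorphic(g1, g2):
    """Brute-force labeled-graph isomorphism for small molecular graphs.
    A graph is (labels, edges): atom symbols plus frozenset node pairs."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    if sorted(labels1) != sorted(labels2) or len(edges1) != len(edges2):
        return False
    n = len(labels1)
    for perm in permutations(range(n)):
        if any(labels1[i] != labels2[perm[i]] for i in range(n)):
            continue  # label-preserving mappings only
        mapped = {frozenset((perm[a], perm[b]))
                  for a, b in (tuple(e) for e in edges1)}
        if mapped == edges2:
            return True
    return False

def exact_match(preds, golds):
    """EM: fraction of predictions whose graph matches its label."""
    hits = sum(isomorphic(p, g) for p, g in zip(preds, golds))
    return hits / len(golds)

# The same C-C-O triangle written with two different node orderings.
g_a = (["C", "C", "O"], {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))})
g_b = (["O", "C", "C"], {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))})
print(exact_match([g_a], [g_b]))  # 1.0
```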
<p><strong>Artifacts</strong>:</p>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>Type</th>
          <th>License</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="https://github.com/iFLYTEK-CV/EDU-CHEMC">EDU-CHEMC</a></td>
          <td>Dataset</td>
          <td>Unknown</td>
          <td>Dataset annotations and download links (actual data hosted on Google Drive)</td>
      </tr>
  </tbody>
</table>
<p><strong>Missing Components</strong>:</p>
<ul>
<li>No training or inference code is publicly released; only the dataset is available.</li>
<li>Pre-trained model weights are not provided.</li>
</ul>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Hu, J., Wu, H., Chen, M., Liu, C., Wu, J., Yin, S., Yin, B., Yin, B., Liu, C., Du, J., &amp; Dai, L. (2023). Handwritten Chemical Structure Image to Structure-Specific Markup Using Random Conditional Guided Decoder. <em>Proceedings of the 31st ACM International Conference on Multimedia</em> (pp. 8114-8124). <a href="https://doi.org/10.1145/3581783.3612573">https://doi.org/10.1145/3581783.3612573</a></p>
<p><strong>Publication</strong>: ACM Multimedia 2023</p>
<p><strong>Additional Resources</strong>:</p>
<ul>
<li><a href="https://github.com/iFLYTEK-CV/EDU-CHEMC">GitHub Repository / EDU-CHEMC Dataset</a></li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@inproceedings</span>{huHandwrittenChemicalStructure2023,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span> = <span style="color:#e6db74">{Handwritten Chemical Structure Image to Structure-Specific Markup Using Random Conditional Guided Decoder}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">booktitle</span> = <span style="color:#e6db74">{Proceedings of the 31st ACM International Conference on Multimedia}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span> = <span style="color:#e6db74">{Hu, Jinshui and Wu, Hao and Chen, Mingjun and Liu, Chenyu and Wu, Jiajia and Yin, Shi and Yin, Baocai and Yin, Bing and Liu, Cong and Du, Jun and Dai, Lirong}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span> = <span style="color:#e6db74">{2023}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">month</span> = oct,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">pages</span> = <span style="color:#e6db74">{8114--8124}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">publisher</span> = <span style="color:#e6db74">{ACM}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">address</span> = <span style="color:#e6db74">{Ottawa ON Canada}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">doi</span> = <span style="color:#e6db74">{10.1145/3581783.3612573}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">isbn</span> = <span style="color:#e6db74">{979-8-4007-0108-5}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item><item><title>ChemPix: Hand-Drawn Hydrocarbon Structure Recognition</title><link>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/chempix/</link><pubDate>Thu, 18 Dec 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/chempix/</guid><description>Deep learning framework using CNN-LSTM image captioning to convert hand-drawn hydrocarbon structures into SMILES strings with 76% accuracy.</description><content:encoded><![CDATA[<h2 id="paper-classification-and-core-contribution">Paper Classification and Core Contribution</h2>
<p>This is primarily a <strong>Method</strong> paper, with a secondary contribution as a <strong>Resource</strong> paper.</p>
<p>The paper&rsquo;s core contribution is the <strong>ChemPix architecture and training strategy</strong> using neural image captioning (CNN-LSTM) to convert hand-drawn chemical structures to SMILES. The extensive ablation studies on synthetic data generation (augmentation, degradation, backgrounds) and ensemble learning strategies confirm the methodological focus. The secondary resource contribution includes releasing a curated dataset of hand-drawn hydrocarbons and code for generating synthetic training data.</p>
<h2 id="the-structural-input-bottleneck-in-computational-chemistry">The Structural Input Bottleneck in Computational Chemistry</h2>
<p>Inputting molecular structures into computational chemistry software for quantum calculations is often a bottleneck, requiring domain expertise and cumbersome manual entry in drawing software. While optical chemical structure recognition (OCSR) tools exist, they typically struggle with the noise and variability of hand-drawn sketches. There is a practical need for a tool that allows chemists to simply photograph a hand-drawn sketch and immediately convert it into a machine-readable format (<a href="/notes/chemistry/molecular-representations/notations/smiles/">SMILES</a>), making computational workflows more accessible.</p>
<h2 id="cnn-lstm-image-captioning-and-synthetic-generalization">CNN-LSTM Image Captioning and Synthetic Generalization</h2>
<ol>
<li><strong>Image Captioning Paradigm</strong>: The authors treat the problem as <strong>neural image captioning</strong>, using an encoder-decoder (CNN-LSTM) framework to &ldquo;translate&rdquo; an image directly to a SMILES string. This avoids the complexity of explicit atom/bond detection and graph assembly.</li>
<li><strong>Synthetic Data Engineering</strong>: The paper introduces a rigorous synthetic data generation pipeline that transforms clean RDKit-generated images into &ldquo;pseudo-hand-drawn&rdquo; images via randomized backgrounds, degradation, and heavy augmentation. This allows the model to achieve &gt;50% accuracy on real hand-drawn data without ever seeing it during training.</li>
<li><strong>Ensemble Uncertainty Estimation</strong>: The method utilizes a &ldquo;committee&rdquo; (ensemble) of networks to improve accuracy and estimate confidence based on vote agreement, providing users with reliability indicators for predictions.</li>
</ol>
<h2 id="extensive-ablation-and-real-world-evaluation">Extensive Ablation and Real-World Evaluation</h2>
<ol>
<li><strong>Ablation Studies on Data Pipeline</strong>: The authors trained models on datasets generated at different stages of the pipeline (Clean RDKit $\rightarrow$ Augmented $\rightarrow$ Backgrounds $\rightarrow$ Degraded) to quantify the value of each transformation in bridging the synthetic-to-real domain gap.</li>
<li><strong>Sample Size Scaling</strong>: They analyzed performance scaling by training on synthetic dataset sizes ranging from 10,000 to 500,000 images to understand data requirements.</li>
<li><strong>Real-world Validation</strong>: The model was evaluated on a held-out test set of hand-drawn images collected via a custom web app, providing genuine out-of-distribution testing.</li>
<li><strong>Fine-tuning Experiments</strong>: Comparisons of synthetic-only training versus fine-tuning with a small fraction of real hand-drawn data to assess the value of limited real-world supervision.</li>
</ol>
<h2 id="state-of-the-art-hand-drawn-ocsr-performance">State-of-the-Art Hand-Drawn OCSR Performance</h2>
<ol>
<li>
<p><strong>Pipeline Efficacy</strong>: Augmentation and image degradation were the most critical factors for generalization, achieving over 50% accuracy on hand-drawn data when training with 500,000 synthetic images. Adding backgrounds had a negligible effect on accuracy compared to degradation.</p>
</li>
<li>
<p><strong>State-of-the-Art Performance</strong>: The final ensemble model (5 out of 17 trained NNs, selected for achieving &gt;50% individual accuracy) achieved <strong>76% accuracy</strong> (top-1) and <strong>85.5% accuracy</strong> (top-3) on the hand-drawn test set, a significant improvement over the best single model&rsquo;s 67.5%.</p>
</li>
<li>
<p><strong>Synthetic Generalization</strong>: A model trained on 500,000 synthetic images achieved &gt;50% accuracy on real hand-drawn data without any fine-tuning, validating the synthetic data generation strategy as a viable alternative to expensive manual labeling.</p>
</li>
<li>
<p><strong>Ensemble Benefits</strong>: The voting committee approach improved accuracy and provided interpretable uncertainty estimates through vote distributions. When all five committee members agree ($V=5$), the confidence value reaches 98%.</p>
</li>
</ol>
<h2 id="limitations">Limitations</h2>
<p>The authors acknowledge several limitations of the current system:</p>
<ul>
<li><strong>Hydrocarbons only</strong>: The model is restricted to hydrocarbon structures and does not handle heteroatoms or functional groups.</li>
<li><strong>No conjoined rings</strong>: Molecules with multiple conjoined rings are excluded due to limitations of RDKit&rsquo;s image generation, which depicts bridges differently from standard chemistry drawing conventions.</li>
<li><strong>Resonance hybrid notation</strong>: The network struggles with benzene rings drawn in the resonance hybrid style (with a circle) compared to the Kekulé structure, since the RDKit training images use exclusively Kekulé representations.</li>
<li><strong>Challenging backgrounds</strong>: Lined and squared paper increase recognition difficulty, and structures bleeding through from the opposite side of the page can confuse the network.</li>
</ul>
<hr>
<h2 id="reproducibility-details">Reproducibility Details</h2>
<h3 id="data">Data</h3>
<p>The study relies on two primary data sources: a massive synthetic dataset generated procedurally and a smaller collected dataset of real drawings.</p>
<table>
  <thead>
      <tr>
          <th>Purpose</th>
          <th>Dataset</th>
          <th>Size</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Training</strong></td>
          <td>Synthetic (RDKit)</td>
          <td>500,000 images</td>
          <td>Generated via RDKit with &ldquo;heavy&rdquo; augmentation: rotation ($0-360°$), blur, salt+pepper noise, and background texture addition.</td>
      </tr>
      <tr>
          <td><strong>Fine-tuning</strong></td>
          <td>Hand-Drawn (Real)</td>
          <td>613 images</td>
          <td>Crowdsourced via a web app from over 100 unique users; split into 200-image test set and 413 training/validation images.</td>
      </tr>
      <tr>
          <td><strong>Backgrounds</strong></td>
          <td>Texture Images</td>
          <td>1,052 images</td>
          <td>A pool of unlabeled texture photos (paper, desks, shadows) used to generate synthetic backgrounds.</td>
      </tr>
  </tbody>
</table>
<p><strong>Data Generation Parameters</strong>:</p>
<ul>
<li><strong>Augmentations</strong>: Rotation, Resize ($200-300px$), Blur, Dilate, Erode, Aspect Ratio, Affine transform ($\pm 20px$), Contrast, Quantize, Sharpness</li>
<li><strong>Backgrounds</strong>: Randomly translated $\pm 100$ pixels and reflected</li>
</ul>
<h3 id="algorithms">Algorithms</h3>
<p><strong>Ensemble Voting</strong><br>
A committee of networks casts votes for the predicted SMILES string. The final prediction is the one with the highest vote count. Validity of SMILES is checked using RDKit.</p>
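<p>The voting scheme can be sketched as a majority vote with an agreement-based confidence score. The validity predicate below stands in for an RDKit parse check (the injectable <code>is_valid</code> callable is an assumption for self-containment, not the paper's interface):</p>

```python
from collections import Counter

def committee_predict(predictions, is_valid=lambda s: True):
    """Majority vote over committee SMILES predictions.
    Returns (winner, votes, agreement fraction). `is_valid` stands in
    for an RDKit validity check, e.g. Chem.MolFromSmiles(s) is not None."""
    valid = [p for p in predictions if is_valid(p)]
    if not valid:
        return None, 0, 0.0
    (winner, votes), = Counter(valid).most_common(1)
    return winner, votes, votes / len(predictions)

preds = ["CCO", "CCO", "CC", "CCO", "CCC"]
winner, votes, agreement = committee_predict(preds)
print(winner, votes, agreement)  # CCO 3 0.6
```

<p>Higher agreement maps to higher empirical confidence; per the paper, unanimous agreement among the five members corresponds to 98% confidence.</p>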
<p><strong>Beam Search</strong><br>
Used in the decoding layer with a beam width of $k=5$ to explore multiple potential SMILES strings. It approximates the sequence $\mathbf{\hat{y}}$ that maximizes the joint probability:</p>
<p>$$ \mathbf{\hat{y}} = \arg\max_{\mathbf{y}} \sum_{t=1}^{T} \log P(y_t \mid y_{&lt;t}, \mathbf{x}) $$</p>
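<p>A minimal beam-search sketch over a toy next-token model. The real decoder conditions on image features $\mathbf{x}$; here <code>step_probs</code> is a hypothetical stand-in that ignores the image, and the token distribution is invented for illustration:</p>

```python
import math

def beam_search(step_probs, k=5, eos="<eos>", max_len=10):
    """Keep the k highest log-probability prefixes at each step.
    `step_probs(prefix)` returns {token: prob} for the next token."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, p in step_probs(seq).items():
                cand = (seq + (tok,), score + math.log(p))
                (finished if tok == eos else candidates).append(cand)
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    best = max(finished + beams, key=lambda c: c[1])
    return "".join(t for t in best[0] if t != eos)

# Toy model that favours emitting "CC" and then ending the sequence.
def step_probs(prefix):
    if len(prefix) < 2:
        return {"C": 0.7, "O": 0.3}
    return {"<eos>": 0.9, "C": 0.1}

print(beam_search(step_probs, k=5))  # CC
```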
<p><strong>Optimization</strong>:</p>
<ul>
<li>
<p><strong>Optimizer</strong>: Adam</p>
</li>
<li>
<p><strong>Learning Rate</strong>: $1 \times 10^{-4}$</p>
</li>
<li>
<p><strong>Batch Size</strong>: 20</p>
</li>
<li>
<p><strong>Loss Function</strong>: Cross-entropy loss across the sequence of $T$ tokens, computed as:</p>
<p>$$ \mathcal{L} = -\sum_{t=1}^{T} \log P(y_t \mid y_{&lt;t}, \mathbf{x}) $$</p>
<p>where $\mathbf{x}$ is the image representation and $y_t$ is the target SMILES character at step $t$. For validation, this loss is reported as perplexity.</p>
</li>
</ul>
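<p>To make the loss-to-perplexity relationship concrete: perplexity is the exponentiated per-token cross-entropy. A small sketch with invented per-step probabilities of the ground-truth characters:</p>

```python
import math

def sequence_loss_and_perplexity(gold_probs):
    """Cross-entropy over a token sequence and its per-token perplexity."""
    loss = -sum(math.log(p) for p in gold_probs)
    perplexity = math.exp(loss / len(gold_probs))
    return loss, perplexity

# Toy per-step probabilities assigned to the ground-truth SMILES characters.
loss, ppl = sequence_loss_and_perplexity([0.9, 0.5, 0.8])
print(round(ppl, 3))  # 1.406
```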
<h3 id="models">Models</h3>
<p>The architecture is a standard image captioning model (Show, Attend and Tell style) adapted for chemical structures.</p>
<p><strong>Encoder (CNN)</strong>:</p>
<ul>
<li><strong>Input</strong>: 256x256 pixel PNG images</li>
<li><strong>Structure</strong>: 4 blocks of Conv2D + MaxPool
<ul>
<li>Block 1: 64 filters, (3,3) kernel</li>
<li>Block 2: 128 filters, (3,3) kernel</li>
<li>Block 3: 256 filters, (3,3) kernel</li>
<li>Block 4: 512 filters, (3,3) kernel</li>
</ul>
</li>
<li><strong>Activation</strong>: ReLU throughout</li>
</ul>
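<p>A quick shape check for the encoder, assuming &lsquo;same&rsquo; convolution padding and $2 \times 2$ max-pooling (both assumptions; the summary does not state them): each block halves the spatial resolution, so a $256 \times 256$ input yields a $16 \times 16 \times 512$ feature map after four blocks.</p>

```python
def encoder_output_shape(size=256, blocks=4, filters=(64, 128, 256, 512)):
    """Spatial size through Conv2D (3x3, 'same' padding assumed) + 2x2 pool."""
    for _ in range(blocks):
        size //= 2  # each max-pool halves the spatial resolution
    return size, size, filters[-1]

print(encoder_output_shape())  # (16, 16, 512)
```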
<p><strong>Decoder (LSTM)</strong>:</p>
<ul>
<li><strong>Hidden Units</strong>: 512</li>
<li><strong>Embedding Dimension</strong>: 80</li>
<li><strong>Attention</strong>: Mechanism with intermediary vector dimension of 512</li>
</ul>
<h3 id="evaluation">Evaluation</h3>
<ul>
<li><strong>Primary Metric</strong>: Exact SMILES match accuracy (character-by-character identity between predicted and ground truth SMILES)</li>
<li><strong>Perplexity</strong>: Used for saving model checkpoints (minimizing uncertainty)</li>
<li><strong>Top-k Accuracy</strong>: Reported for $k=1$ (76%) and $k=3$ (85.5%)</li>
</ul>
<h3 id="artifacts">Artifacts</h3>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>Type</th>
          <th>License</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="https://github.com/mtzgroup/ChemPixCH">ChemPixCH</a></td>
          <td>Code + Dataset</td>
          <td>Apache-2.0</td>
          <td>Official implementation with synthetic data generation pipeline and collected hand-drawn dataset</td>
      </tr>
  </tbody>
</table>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Weir, H., Thompson, K., Woodward, A., Choi, B., Braun, A., &amp; Martínez, T. J. (2021). ChemPix: Automated Recognition of Hand-Drawn Hydrocarbon Structures Using Deep Learning. <em>Chemical Science</em>, 12(31), 10622-10633. <a href="https://doi.org/10.1039/D1SC02957F">https://doi.org/10.1039/D1SC02957F</a></p>
<p><strong>Publication</strong>: Chemical Science 2021</p>
<p><strong>Additional Resources</strong>:</p>
<ul>
<li><a href="https://github.com/mtzgroup/ChemPixCH">GitHub Repository</a></li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@article</span>{weir2021chempix,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span>=<span style="color:#e6db74">{ChemPix: Automated Recognition of Hand-Drawn Hydrocarbon Structures Using Deep Learning}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span>=<span style="color:#e6db74">{Weir, Hayley and Thompson, Keiran and Woodward, Amelia and Choi, Benjamin and Braun, Augustin and Mart{\&#39;i}nez, Todd J.}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">journal</span>=<span style="color:#e6db74">{Chemical Science}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">volume</span>=<span style="color:#e6db74">{12}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">number</span>=<span style="color:#e6db74">{31}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">pages</span>=<span style="color:#e6db74">{10622--10633}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span>=<span style="color:#e6db74">{2021}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">publisher</span>=<span style="color:#e6db74">{Royal Society of Chemistry}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">doi</span>=<span style="color:#e6db74">{10.1039/D1SC02957F}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item><item><title>Handwritten Chemical Ring Recognition with Neural Networks</title><link>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/hewahi-ring-recognition-2008/</link><pubDate>Wed, 17 Dec 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/hewahi-ring-recognition-2008/</guid><description>A two-phase Classifier-Recognizer neural network pipeline for recognizing 23 types of handwritten heterocyclic chemical rings, achieving ~94% accuracy.</description><content:encoded><![CDATA[<h2 id="contribution-recognition-architecture-for-heterocyclic-rings">Contribution: Recognition Architecture for Heterocyclic Rings</h2>
<p>This is a <strong>Method</strong> paper ($\Psi_{\text{Method}}$).</p>
<p>It proposes a specific algorithmic architecture (the &ldquo;Classifier-Recognizer Approach&rdquo;) to solve a pattern recognition problem. The rhetorical structure centers on defining three variations of a method, performing ablation-like comparisons between them (Whole Image vs. Lower Part), and demonstrating superior performance metrics (~94% accuracy) for the proposed technique.</p>
<h2 id="motivation-enabling-sketch-based-chemical-search">Motivation: Enabling Sketch-Based Chemical Search</h2>
<p>The authors identify a gap in existing OCR and handwriting recognition research, which typically focuses on alphanumeric characters or whole words.</p>
<ul>
<li><strong>Missing Capability</strong>: Recognition of specific <em>heterocyclic chemical rings</em> (23 types) had not been performed previously.</li>
<li><strong>Practical Utility</strong>: Existing chemical search engines require text-based queries (names); this work enables &ldquo;backward&rdquo; search where a user can draw a ring to find its information.</li>
<li><strong>Educational/Professional Aid</strong>: Useful for chemistry departments and mobile applications where chemists can sketch formulas on screens.</li>
</ul>
<h2 id="innovation-the-classifier-recognizer-pipeline">Innovation: The Classifier-Recognizer Pipeline</h2>
<p>The core novelty is the <strong>two-phase &ldquo;Classifier-Recognizer&rdquo; architecture</strong> designed to handle the visual similarity of heterocyclic rings:</p>
<ol>
<li><strong>Phase 1 (Classifier)</strong>: A neural network classifies the ring into one of four broad categories (S, N, O, Others) based solely on the <em>upper part</em> of the image (40x15 pixels).</li>
<li><strong>Phase 2 (Recognizer)</strong>: A class-specific neural network identifies the exact ring.</li>
<li><strong>Optimization</strong>: The most successful variation (&ldquo;Lower Part Image Recognizer with Half Size Grid&rdquo;) uses only the <em>lower part</em> of the image and <em>odd rows</em> (half-grid) to reduce input dimensionality and computation time while improving accuracy. This effectively subsamples the input grid matrix $M \in \mathbb{R}^{H \times W}$ to a reduced matrix $M_{\text{sub}}$:
$$ M_{\text{sub}} = \{\, m_{i,j} \in M \mid i \text{ is odd} \,\} $$</li>
</ol>
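<p>The odd-row subsampling amounts to simple row slicing of the binarized grid (using 1-indexed rows, as the formula does; the list-of-lists encoding is an assumption):</p>

```python
def half_grid(image):
    """Keep only the odd rows (1-indexed) of a pixel grid, halving inputs.
    A 40x40 grid becomes 20x40, i.e. 1600 -> 800 network inputs."""
    return [row for i, row in enumerate(image, start=1) if i % 2 == 1]

grid = [[0] * 40 for _ in range(40)]
sub = half_grid(grid)
print(len(sub), len(sub[0]))  # 20 40
```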
<h2 id="failed-preliminary-approaches">Failed Preliminary Approaches</h2>
<p>Before arriving at the Classifier-Recognizer architecture, the authors tried three simpler methods that all failed:</p>
<ol>
<li><strong>Ordinary NN</strong>: A single neural network with 1600 inputs (40x40 grid), 1600 hidden units, and 23 outputs. This standard approach achieved only 7% accuracy.</li>
<li><strong>Row/Column pixel counts</strong>: Using the number of black pixels per row and per column as features ($N_c + N_r$ inputs), which dramatically reduced dimensionality. This performed even worse, below 1% accuracy.</li>
<li><strong>Midline crossing count</strong>: Drawing a horizontal midline and counting the number of line crossings. This failed because the crossing count varies between writers for the same ring.</li>
</ol>
<p>These failures motivated the two-phase Classifier-Recognizer design.</p>
<h2 id="experimental-setup-and-network-variations">Experimental Setup and Network Variations</h2>
<p>The authors conducted a comparative study of three methodological variations:</p>
<ol>
<li><strong>Whole Image Recognizer</strong>: Uses the full image.</li>
<li><strong>Whole Image (Half Size Grid)</strong>: Uses only odd rows ($20 \times 40$ pixels).</li>
<li><strong>Lower Part (Half Size Grid)</strong>: Uses the lower part of the image with odd rows (the proposed method).</li>
</ol>
<p><strong>Setup</strong>:</p>
<ul>
<li><strong>Dataset</strong>: 23 types of heterocyclic rings.</li>
<li><strong>Training</strong>: 1500 samples (distributed across S, N, O, and Others classes).</li>
<li><strong>Testing</strong>: 1150 samples.</li>
<li><strong>Metric</strong>: Recognition accuracy (Performance %) and Error %.</li>
</ul>
<h2 id="results-high-accuracy-via-dimension-reduction">Results: High Accuracy via Dimension Reduction</h2>
<ul>
<li><strong>Superior Method</strong>: The &ldquo;Lower Part Image Recognizer with Half Size Grid&rdquo; achieved the best performance (~94% overall).</li>
<li><strong>High Classifier Accuracy</strong>: The first phase (classification into S/N/O/Other) achieves 100% accuracy for class S, 98.67% for O, 97.75% for N, and 97.67% for Others (Table 3).</li>
<li><strong>Class &lsquo;Others&rsquo; Difficulty</strong>: The &lsquo;Others&rsquo; class showed lower performance (~90-93%) compared to S/N/O due to the higher complexity and similarity of rings in that category.</li>
<li><strong>Efficiency</strong>: The half-grid approach reduced training time from ~53 hours (Whole Image) to ~35 hours (Lower Part Half Size Grid) while improving accuracy from 87% to 94%.</li>
</ul>
<p><strong>Training/Testing comparison across the three Classifier-Recognizer variations (Table 2)</strong>:</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Method</th>
          <th style="text-align: left">Hidden Nodes</th>
          <th style="text-align: left">Iterations</th>
          <th style="text-align: left">Training Time (hrs)</th>
          <th style="text-align: left">Error</th>
          <th style="text-align: left">Performance</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left">Whole Image</td>
          <td style="text-align: left">50</td>
          <td style="text-align: left">1000</td>
          <td style="text-align: left">~53</td>
          <td style="text-align: left">13.0%</td>
          <td style="text-align: left">87.0%</td>
      </tr>
      <tr>
          <td style="text-align: left">Whole Image (Half Grid)</td>
          <td style="text-align: left">50</td>
          <td style="text-align: left">1000</td>
          <td style="text-align: left">~41</td>
          <td style="text-align: left">9.0%</td>
          <td style="text-align: left">91.0%</td>
      </tr>
      <tr>
          <td style="text-align: left">Lower Part (Half Grid)</td>
          <td style="text-align: left">50</td>
          <td style="text-align: left">1000</td>
          <td style="text-align: left">~35</td>
          <td style="text-align: left">6.0%</td>
          <td style="text-align: left">94.0%</td>
      </tr>
  </tbody>
</table>
<hr>
<h2 id="reproducibility-details">Reproducibility Details</h2>
<h3 id="data">Data</h3>
<p>The dataset consists of handwritten samples of 23 specific heterocyclic rings.</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Purpose</th>
          <th style="text-align: left">Dataset</th>
          <th style="text-align: left">Size</th>
          <th style="text-align: left">Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><strong>Training</strong></td>
          <td style="text-align: left">Heterocyclic Rings</td>
          <td style="text-align: left">1500 samples</td>
          <td style="text-align: left">Split: 300 (S), 400 (N), 400 (O), 400 (Others)</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Testing</strong></td>
          <td style="text-align: left">Heterocyclic Rings</td>
          <td style="text-align: left">1150 samples</td>
          <td style="text-align: left">Split: 150 (S), 300 (O), 400 (N), 300 (Others)</td>
      </tr>
  </tbody>
</table>
<p><strong>Preprocessing Steps</strong>:</p>
<ol>
<li><strong>Monochrome Conversion</strong>: Convert image to monochrome bitmap.</li>
<li><strong>Grid Scaling</strong>: Convert drawing area (regardless of original size) to a fixed <strong>40x40</strong> grid.</li>
<li><strong>Bounding</strong>: Scale the ring shape itself to fit the 40x40 grid.</li>
</ol>
<h3 id="algorithms">Algorithms</h3>
<p><strong>The &ldquo;Lower Part with Half Size&rdquo; Pipeline</strong>:</p>
<ol>
<li><strong>Cut Point</strong>: A horizontal midline is defined; the algorithm separates the &ldquo;Upper Part&rdquo; and &ldquo;Lower Part&rdquo;.</li>
<li><strong>Phase 1 Input</strong>: The <strong>Upper Part</strong> (rows 0-15 approx, scaled) is fed to the Classifier NN to determine the class (S, N, O, or Others).</li>
<li><strong>Phase 2 Input</strong>:
<ul>
<li>For classes <strong>S, N, O</strong>: The <strong>Lower Part</strong> of the image is used.</li>
<li>For class <strong>Others</strong>: The <strong>Whole Ring</strong> is used.</li>
</ul>
</li>
<li><strong>Dimensionality Reduction</strong>: For the recognizer networks, only <strong>odd rows</strong> are used (effectively a 20x40 input grid) to reduce inputs from 1600 to 800.</li>
</ol>
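<p>The dispatch logic of the two-phase pipeline can be sketched as follows. The networks are stand-ins passed as callables, and the midline row and ring names are illustrative assumptions:</p>

```python
def recognize(image, classifier, recognizers, midline=20):
    """Two-phase Classifier-Recognizer dispatch over a 40x40 grid.
    Phase 1 sees the upper part; Phase 2 sees the lower part for
    classes S/N/O and the whole ring for class Others."""
    upper, lower = image[:midline], image[midline:]
    ring_class = classifier(upper)  # one of "S", "N", "O", "Others"
    region = image if ring_class == "Others" else lower
    return recognizers[ring_class](region)

# Stub networks for illustration (hypothetical ring names).
classifier = lambda upper: "S"
recognizers = {"S": lambda region: "thiophene",
               "N": lambda region: "pyridine",
               "O": lambda region: "furan",
               "Others": lambda region: "other-ring"}
image = [[0] * 40 for _ in range(40)]
print(recognize(image, classifier, recognizers))  # thiophene
```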
<h3 id="models">Models</h3>
<p>The system uses multiple distinct Feed-Forward Neural Networks (Backpropagation is implied by &ldquo;training&rdquo; and &ldquo;epochs&rdquo; context, though not explicitly named as the algorithm):</p>
<ul>
<li><strong>Structure</strong>: 1 Classifier NN + 4 Recognizer NNs (one for each class).</li>
<li><strong>Hidden Layers</strong>: The preliminary &ldquo;ordinary method&rdquo; experiment used 1600 hidden units. The Classifier-Recognizer methods all used 50 hidden nodes per Table 2. The paper also notes that the ordinary approach tried various hidden layer sizes.</li>
<li><strong>Input Nodes</strong>:
<ul>
<li>Standard: 1600 (40x40).</li>
<li>Optimized: 800 ($20 \times 40$ via half-grid).</li>
</ul>
</li>
</ul>
<h3 id="evaluation">Evaluation</h3>
<p><strong>Classifier Phase Testing Results (Table 3)</strong>:</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Class</th>
          <th style="text-align: left">Samples</th>
          <th style="text-align: left">Correct</th>
          <th style="text-align: left">Accuracy</th>
          <th style="text-align: left">Error</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><strong>S</strong></td>
          <td style="text-align: left">150</td>
          <td style="text-align: left">150</td>
          <td style="text-align: left"><strong>100.00%</strong></td>
          <td style="text-align: left">0.00%</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>O</strong></td>
          <td style="text-align: left">300</td>
          <td style="text-align: left">296</td>
          <td style="text-align: left"><strong>98.67%</strong></td>
          <td style="text-align: left">1.33%</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>N</strong></td>
          <td style="text-align: left">400</td>
          <td style="text-align: left">391</td>
          <td style="text-align: left"><strong>97.75%</strong></td>
          <td style="text-align: left">2.25%</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Others</strong></td>
          <td style="text-align: left">300</td>
          <td style="text-align: left">293</td>
          <td style="text-align: left"><strong>97.67%</strong></td>
          <td style="text-align: left">2.33%</td>
      </tr>
  </tbody>
</table>
<p><strong>Recognizer Phase Testing Results (Lower Part Image Recognizer with Half Size Grid, Table 4)</strong>:</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left">Class</th>
          <th style="text-align: left">Samples</th>
          <th style="text-align: left">Correct</th>
          <th style="text-align: left">Accuracy</th>
          <th style="text-align: left">Error</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><strong>S</strong></td>
          <td style="text-align: left">150</td>
          <td style="text-align: left">147</td>
          <td style="text-align: left"><strong>98.00%</strong></td>
          <td style="text-align: left">2.00%</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>O</strong></td>
          <td style="text-align: left">300</td>
          <td style="text-align: left">289</td>
          <td style="text-align: left"><strong>96.33%</strong></td>
          <td style="text-align: left">3.67%</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>N</strong></td>
          <td style="text-align: left">400</td>
          <td style="text-align: left">386</td>
          <td style="text-align: left"><strong>96.50%</strong></td>
          <td style="text-align: left">3.50%</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Others</strong></td>
          <td style="text-align: left">300</td>
          <td style="text-align: left">279</td>
          <td style="text-align: left"><strong>93.00%</strong></td>
          <td style="text-align: left">7.00%</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Overall</strong></td>
          <td style="text-align: left"><strong>1150</strong></td>
          <td style="text-align: left"><strong>-</strong></td>
          <td style="text-align: left"><strong>~94.0%</strong></td>
          <td style="text-align: left"><strong>-</strong></td>
      </tr>
  </tbody>
</table>
<h3 id="reproducibility-assessment">Reproducibility Assessment</h3>
<p>No source code, trained models, or datasets were released with this paper. The handwritten ring samples were collected by the authors, and the software described (a desktop application) is not publicly available. The neural network architecture details (50 hidden nodes, 1000 iterations) and preprocessing pipeline are described in sufficient detail for reimplementation, but reproducing results would require collecting a new handwritten dataset of heterocyclic rings.</p>
<p><strong>Status</strong>: Closed (no public code, data, or models).</p>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Hewahi, N., Nounou, M. N., Nassar, M. S., Abu-Hamad, M. I., &amp; Abu-Hamad, H. I. (2008). Chemical Ring Handwritten Recognition Based on Neural Networks. <em>Ubiquitous Computing and Communication Journal</em>, 3(3).</p>
<p><strong>Publication</strong>: Ubiquitous Computing and Communication Journal 2008</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@article</span>{hewahiCHEMICALRINGHANDWRITTEN2008,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span> = <span style="color:#e6db74">{CHEMICAL RING HANDWRITTEN RECOGNITION BASED ON NEURAL NETWORKS}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span> = <span style="color:#e6db74">{Hewahi, Nabil and Nounou, Mohamed N and Nassar, Mohamed S and Abu-Hamad, Mohamed I and Abu-Hamad, Husam I}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span> = <span style="color:#e6db74">{2008}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">journal</span> = <span style="color:#e6db74">{Ubiquitous Computing and Communication Journal}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">volume</span> = <span style="color:#e6db74">{3}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">number</span> = <span style="color:#e6db74">{3}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item><item><title>Structural Analysis of Handwritten Chemical Formulas</title><link>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/ramel-handwritten-1999/</link><pubDate>Mon, 15 Dec 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/ramel-handwritten-1999/</guid><description>A 1999 methodology for recognizing handwritten chemical structures using a structural graph representation and recursive specialists.</description><content:encoded><![CDATA[<h2 id="contribution-structural-approach-to-document-analysis">Contribution: Structural Approach to Document Analysis</h2>
<p><strong>Method</strong>.
This paper proposes a system architecture for document analysis. It introduces a specific pipeline (Global Perception followed by Incremental Extraction) and validates this strategy with recognition rates on specific tasks. The core contribution is the shift from bitmap-based processing to a <strong>structural graph representation</strong> of graphical primitives.</p>
<h2 id="motivation-overcoming-bitmap-limitations-in-freehand-drawings">Motivation: Overcoming Bitmap Limitations in Freehand Drawings</h2>
<ul>
<li><strong>Complexity of Freehand</strong>: Freehand drawings contain fluctuating lines and noise that make standard vectorization techniques difficult to apply directly.</li>
<li><strong>Limitation of Bitmap Analysis</strong>: Most existing systems at the time attempted to interpret the document by working directly on the static bitmap image throughout the process.</li>
<li><strong>Need for Context</strong>: Interpretation requires a dynamic resource that can evolve as knowledge is extracted (e.g., recognizing a polygon changes the context for its neighbors).</li>
</ul>
<h2 id="novelty-dynamic-structural-graphs-and-recursive-specialists">Novelty: Dynamic Structural Graphs and Recursive Specialists</h2>
<p>The authors propose a <strong>Structural Representation</strong> as the unique resource for interpretation.</p>
<ul>
<li><strong>Quadrilateral Primitives</strong>: The system builds Quadrilaterals (pairs of vectors) to represent thin shapes, which are robust to handwriting fluctuations.</li>
<li><strong>Structural Graph</strong>: These primitives are organized into a graph where arcs represent geometric relationships (T-junctions, L-junctions, parallels).</li>
<li><strong>Specialist Agents</strong>: Interpretation is driven by independent modules (specialists) that browse this graph recursively to identify high-level chemical entities like rings (polygons) or chains.</li>
</ul>
<h2 id="experimental-setup-and-outcomes">Experimental Setup and Outcomes</h2>
<ul>
<li><strong>Validation Set</strong>: The system was tested on 20 handwritten off-line documents containing chemical formulas at 300 dpi resolution.</li>
<li><strong>Text Database</strong>: A separate base of 328 models was used for the text recognition component.</li>
<li><strong>High Graphical Accuracy</strong>: The system achieved a $\approx 97\%$ recognition rate for graphical parts (chemical elements like rings and bonds).</li>
<li><strong>Text Recognition</strong>: The text recognition module achieved a $\approx 93\%$ success rate.</li>
<li><strong>Robustness</strong>: The structural graph approach successfully handled multiple liaisons, polygons, and chains, allowing progressive construction of a solution consistent with the context.</li>
</ul>
<hr>
<h2 id="reproducibility-details">Reproducibility Details</h2>
<h3 id="data">Data</h3>
<table>
  <thead>
      <tr>
          <th>Purpose</th>
          <th>Dataset</th>
          <th>Size</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Evaluation</td>
          <td>Handwritten Documents</td>
          <td>20 docs</td>
          <td>Off-line documents at 300 dpi</td>
      </tr>
      <tr>
          <td>Training</td>
          <td>Character Models</td>
          <td>328 models</td>
          <td>Used for the Pattern Matching text recognition base</td>
      </tr>
  </tbody>
</table>
<h3 id="algorithms">Algorithms</h3>
<p>The interpretation process is divided into two distinct phases:</p>
<p><strong>1. Global Perception (Graph Construction)</strong></p>
<ul>
<li><strong>Vectorization</strong>: Contour tracking produces a chain of vectors, which are simplified via iterative polygonal approximation until fusion stabilizes (2-5 iterations).</li>
<li><strong>Quadrilateral Formation</strong>: Vectors are paired to form quadrilaterals based on Euclidean distance and &ldquo;empirical&rdquo; alignment criteria.</li>
<li><strong>Graph Generation</strong>: Quadrilaterals become nodes. Arcs are created based on &ldquo;zones of influence&rdquo; and classified into 5 types: T-junction, Intersection (X), Parallel (//), L-junction, and Successive (S).</li>
<li><strong>Redraw Heuristic</strong>: A pre-processing step transforms T, X, and S junctions into L or // relations, as chemical drawings primarily consist of L-junctions and parallels.</li>
</ul>
<p><strong>2. Specialists (Interpretation)</strong></p>
<ul>
<li><strong>Liaison Specialist</strong>: Scans the graph for // arcs or quadrilaterals with free extremities to identify bonds.</li>
<li><strong>Polygon/Chain Specialist</strong>: Uses recursive <code>look-left</code> and <code>look-right</code> procedures. If a search returns to the start node after $n$ steps, a polygon is detected.</li>
<li><strong>Text Localization</strong>: Clusters &ldquo;short&rdquo; quadrilaterals by physical proximity into &ldquo;focus zones&rdquo;. Zones are classified as text/non-text based on connected components.</li>
</ul>
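<p>The recursive polygon search can be illustrated with a small sketch. Assumed simplifications: the graph is a plain adjacency map of quadrilateral ids, and the <code>look-left</code>/<code>look-right</code> ordering of the real specialists is abstracted into a single recursive walk.</p>

```python
def find_polygon(graph, start, max_steps=8):
    """Search for a cycle returning to `start`, signalling an n-sided polygon.

    `graph` maps a quadrilateral id to its L-junction neighbours. A walk
    that comes back to the start node after n > 2 steps is a polygon.
    """
    def walk(node, path):
        for nxt in graph.get(node, ()):
            if nxt == start and len(path) > 2:
                return path          # closed cycle: polygon with len(path) sides
            if nxt not in path and len(path) < max_steps:
                found = walk(nxt, path + [nxt])
                if found:
                    return found
        return None
    return walk(start, [start])
```

<p>On a hexagonal ring graph this returns a six-node cycle; on an open chain it returns <code>None</code>, leaving the structure to the chain specialist.</p>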
<h3 id="models">Models</h3>
<p><strong>Text Recognition Hybrid</strong>:</p>
<ol>
<li><strong>Normalization &amp; Pattern Matching</strong>: A classic method using the database of 328 models.</li>
<li><strong>Structural Rule Base</strong>: Uses &ldquo;significant&rdquo; quadrilaterals (length $\ge 1/3$ of zone dimension) to verify characters. A rule base defines the expected count of horizontal, vertical, right-diagonal, and left-diagonal lines for each character.</li>
</ol>
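<p>A toy version of the structural rule base might look like this; the rule entries and the 22.5&deg; direction buckets are illustrative assumptions, not values from the paper:</p>

```python
import math

# Hypothetical excerpt of a rule base: expected line counts per character
RULES = {"H": {"vertical": 2, "horizontal": 1},
         "E": {"vertical": 1, "horizontal": 3}}

def direction(p, q):
    """Bucket a significant quadrilateral's axis into one of four directions."""
    ang = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180
    if ang < 22.5 or ang >= 157.5:
        return "horizontal"
    if 67.5 <= ang < 112.5:
        return "vertical"
    return "right_diag" if ang < 90 else "left_diag"

def matches(char, segments):
    """Verify a character hypothesis against its expected direction counts."""
    counts = {}
    for p, q in segments:
        d = direction(p, q)
        counts[d] = counts.get(d, 0) + 1
    return all(counts.get(d, 0) == n for d, n in RULES.get(char, {}).items())
```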
<h3 id="evaluation">Evaluation</h3>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>Value</th>
          <th>Baseline</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Graphical Element Recognition</td>
          <td>~97%</td>
          <td>N/A</td>
          <td>Evaluated on 20 documents (Fig. 7 examples)</td>
      </tr>
      <tr>
          <td>Text Recognition</td>
          <td>~93%</td>
          <td>N/A</td>
          <td>Evaluated on 20 documents</td>
      </tr>
  </tbody>
</table>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Ramel, J.-Y., Boissier, G., &amp; Emptoz, H. (1999). Automatic Reading of Handwritten Chemical Formulas from a Structural Representation of the Image. <em>Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR &lsquo;99)</em>, 83-86. <a href="https://doi.org/10.1109/ICDAR.1999.791730">https://doi.org/10.1109/ICDAR.1999.791730</a></p>
<p><strong>Publication</strong>: ICDAR 1999</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@inproceedings</span>{ramelAutomaticReadingHandwritten1999,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span> = <span style="color:#e6db74">{Automatic Reading of Handwritten Chemical Formulas from a Structural Representation of the Image}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">booktitle</span> = <span style="color:#e6db74">{Proceedings of the {{Fifth International Conference}} on {{Document Analysis}} and {{Recognition}}. {{ICDAR}} &#39;99 ({{Cat}}. {{No}}.{{PR00318}})}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span> = <span style="color:#e6db74">{Ramel, J.-Y. and Boissier, G. and Emptoz, H.}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span> = <span style="color:#ae81ff">1999</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">pages</span> = <span style="color:#e6db74">{83--86}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">publisher</span> = <span style="color:#e6db74">{IEEE}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">address</span> = <span style="color:#e6db74">{Bangalore, India}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">doi</span> = <span style="color:#e6db74">{10.1109/ICDAR.1999.791730}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">isbn</span> = <span style="color:#e6db74">{978-0-7695-0318-9}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item><item><title>Hand-Drawn Chemical Diagram Recognition (AAAI 2007)</title><link>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/ouyang-davis-aaai-2007/</link><pubDate>Sun, 14 Dec 2025 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/chemistry/optical-structure-recognition/hand-drawn/ouyang-davis-aaai-2007/</guid><description>A sketch recognition system for organic chemistry that uses domain knowledge (chemical valence) to correct recognition errors.</description><content:encoded><![CDATA[<h2 id="contribution-and-methodological-approach">Contribution and Methodological Approach</h2>
<p>This is a <strong>Method</strong> paper. It proposes a multi-stage pipeline for interpreting hand-drawn diagrams that integrates a trainable symbol recognizer with a domain-specific verification step. The authors validate the method through an ablation study comparing the full system against a baseline lacking domain knowledge.</p>
<h2 id="motivation-for-sketch-based-interfaces">Motivation for Sketch-Based Interfaces</h2>
<p>Current software for specifying chemical structures (e.g., ChemDraw, ISIS/Draw) relies on mouse and keyboard interfaces, which lack the speed, ease of use, and naturalness of drawing on paper. The goal is to bridge the gap between natural expression and computer interpretation by building a system that understands freehand chemical sketches.</p>
<h2 id="novel-integration-of-chemical-domain-knowledge">Novel Integration of Chemical Domain Knowledge</h2>
<p>The primary novelty is the integration of <strong>domain knowledge</strong> (specifically chemical valence rules) directly into the interpretation loop to resolve ambiguities and correct errors.</p>
<p>Specific technical contributions include:</p>
<ul>
<li><strong>Hybrid Recognizer</strong>: Combines feature-based SVMs, image-based template matching (modified Tanimoto), and off-the-shelf handwriting recognition to handle the mix of geometry and text.</li>
<li><strong>Domain Verification Loop</strong>: A post-processing step that checks the chemical validity of the structure (e.g., nitrogen must have 3 bonds). If an inconsistency is found, the system searches the space of alternative hypotheses generated during the initial parsing phase to find a valid interpretation.</li>
<li><strong>Contextual Parsing</strong>: Uses a sliding window (up to 7 strokes) and spatial context to parse interspersed symbols.</li>
<li><strong>Implicit Structure Handling</strong>: Supports two common chemistry notations: (1) implicit elements, where carbon and hydrogen atoms are omitted and inferred from bond connectivity and valence rules, and (2) aromatic rings, detected as a circle drawn inside a hexagonal 6-carbon cycle.</li>
</ul>
<h2 id="experimental-design-and-user-study">Experimental Design and User Study</h2>
<p>The authors conducted a user study to evaluate the system&rsquo;s robustness on unconstrained sketches.</p>
<ul>
<li><strong>Participants</strong>: 6 users familiar with organic chemistry.</li>
<li><strong>Task</strong>: Each user drew 12 pre-specified molecular compounds on a Tablet PC.</li>
<li><strong>Conditions</strong>: The system was evaluated in two modes:
<ol>
<li><strong>Domain</strong>: The full system with chemical valence checks.</li>
<li><strong>Baseline</strong>: A simplified version with no knowledge of chemical valence/verification.</li>
</ol>
</li>
<li><strong>Data Split</strong>: Evaluated on collected sketches using a leave-one-out style approach (training on 11 examples from the same users).</li>
</ul>
<h2 id="results-and-error-reduction-analysis">Results and Error Reduction Analysis</h2>
<ul>
<li><strong>Performance</strong>: The full system achieved an overall <strong>F-measure of 0.87</strong> (Precision 0.86, Recall 0.89).</li>
<li><strong>Impact of Domain Knowledge</strong>: Using domain knowledge reduced the overall error rate (measured by recall) by <strong>27%</strong> compared to the baseline. The improvement was statistically significant ($p &lt; .05$).</li>
<li><strong>Error Recovery</strong>: The system successfully recovered from interpretations that were geometrically plausible but chemically impossible (e.g., misinterpreting &ldquo;N&rdquo; as bonds), as illustrated in their qualitative analysis.</li>
<li><strong>Output Integration</strong>: Once interpreted, the resulting structure is expressed in a standard chemical specification format that can be passed to tools such as ChemDraw (for rendering) or SciFinder (for database queries).</li>
<li><strong>Limitations</strong>: The system struggled with &ldquo;messy&rdquo; sketches where users drew single bonds with multiple strokes or over-traced lines, as the current bond recognizer assumes single-stroke straight bonds.</li>
</ul>
<hr>
<h2 id="reproducibility-details">Reproducibility Details</h2>
<h3 id="data">Data</h3>
<p>The study collected a custom dataset of hand-drawn diagrams.</p>
<ul>
<li><strong>Volume</strong>: 6 participants $\times$ 12 molecules = 72 total sketches (implied).</li>
<li><strong>Preprocessing</strong>:
<ul>
<li><strong>Scale Normalization</strong>: The system estimates scale based on the average length of straight bonds (chosen because they are easy to identify). This normalizes geometric features for the classifier.</li>
<li><strong>Stroke Segmentation</strong>: Poly-line approximation using recursive splitting (minimizing least squared error) to break multi-segment strokes (e.g., connected bonds) into primitives.</li>
</ul>
</li>
</ul>
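<p>The recursive splitting step can be sketched as follows. Note one substitution: the paper describes minimizing least squared error, while this sketch uses the simpler max-distance split criterion.</p>

```python
import math

def split_stroke(points, tol=2.0):
    """Recursively split a stroke at the point farthest from its chord
    until every segment fits within `tol` (a hypothetical threshold),
    returning the vertices of the poly-line approximation.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy) or 1.0
    # Perpendicular distance of each interior point to the chord
    dists = [abs(dy * (x - x0) - dx * (y - y0)) / norm for x, y in points[1:-1]]
    if not dists or max(dists) <= tol:
        return [points[0], points[-1]]
    k = dists.index(max(dists)) + 1
    left = split_stroke(points[: k + 1], tol)
    right = split_stroke(points[k:], tol)
    return left[:-1] + right   # merge, dropping the duplicated split point
```

<p>An L-shaped stroke (e.g., two connected bonds drawn in one motion) splits at the corner into two primitives.</p>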
<h3 id="algorithms">Algorithms</h3>
<p><strong>1. Ink Parsing (Sliding Window)</strong></p>
<ul>
<li>Examines all combinations of up to <strong>$n=7$</strong> sequential strokes.</li>
<li>Classifies each group as a valid symbol or invalid garbage.</li>
</ul>
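<p>The grouping step alone can be sketched as follows (the classifier that scores each group is separate; <code>candidate_groups</code> is a hypothetical name):</p>

```python
def candidate_groups(num_strokes, window=7):
    """Enumerate the stroke groups the sliding-window parser would score:
    every run of 1..window sequential strokes, identified by stroke index.
    """
    return [
        tuple(range(start, start + size))
        for size in range(1, window + 1)
        for start in range(0, num_strokes - size + 1)
    ]
```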
<p><strong>2. Template Matching (Image-based)</strong></p>
<ul>
<li>Used for resolving ambiguities in text/symbols (e.g., &lsquo;H&rsquo; vs &lsquo;N&rsquo;).</li>
<li><strong>Metric</strong>: Modified <strong>Tanimoto coefficient</strong>. Unlike standard Tanimoto (point overlap), this version accounts for relative angle and curvature at each point.</li>
</ul>
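<p>For reference, the standard Tanimoto coefficient on quantized ink points looks like this; the paper&rsquo;s modification, which additionally weights matches by relative angle and curvature at each point, is omitted from the sketch:</p>

```python
def tanimoto(a, b):
    """Standard Tanimoto coefficient between two sets of quantized points:
    |A ∩ B| / (|A| + |B| - |A ∩ B|), i.e. overlap over union.
    """
    a, b = set(a), set(b)
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 1.0
```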
<p><strong>3. Domain Verification</strong></p>
<ul>
<li><strong>Trigger</strong>: An element with incorrect valence (e.g., Hydrogen with &gt;1 bond).</li>
<li><strong>Resolution</strong>: Searches stored alternative hypotheses for the affected strokes. It accepts a new hypothesis if it resolves the valence error without introducing new ones.</li>
<li><strong>Constraint</strong>: It keeps an inconsistent structure if the original confidence score is significantly higher than that of the alternatives (assuming the user is still drawing or has intentionally left the structure incomplete).</li>
</ul>
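<p>The valence trigger can be sketched as a simple bond-count check (the valence table and data layout are illustrative assumptions; undersaturated atoms are not flagged because missing bonds become implicit hydrogens):</p>

```python
# Hypothetical valence table for the element set the recognizer covers
VALENCE = {"H": 1, "O": 2, "S": 2, "N": 3, "C": 4}

def valence_errors(atoms, bonds):
    """Return ids of atoms whose total bond order exceeds their valence.

    `atoms` maps atom id -> element symbol; `bonds` lists (a, b, order)
    triples, where `order` is 1 for single bonds, 2 for double, etc.
    """
    counts = {aid: 0 for aid in atoms}
    for a, b, order in bonds:
        counts[a] += order
        counts[b] += order
    return [aid for aid, sym in atoms.items()
            if counts[aid] > VALENCE.get(sym, 8)]
```

<p>Any atom id returned here would trigger the search over stored alternative hypotheses described above.</p>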
<h3 id="models">Models</h3>
<p><strong>Symbol Recognizer (Discriminative Classifier)</strong></p>
<ul>
<li><strong>Type</strong>: Support Vector Machine (SVM).</li>
<li><strong>Classes</strong>: Element letters, straight bonds, hash bonds, wedge bonds, invalid groups.</li>
<li><strong>Input Features</strong>:
<ol>
<li>Number of strokes</li>
<li>Bounding-box dimensions (width, height, diagonal)</li>
<li>Ink density (ink length / diagonal length)</li>
<li>Inter-stroke distance (max distance between strokes in group)</li>
<li>Inter-stroke orientation (vector of relative orientations)</li>
</ol>
</li>
</ul>
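<p>Computing a few of those features might look like this (a sketch only; the real system also normalizes by the estimated bond scale, and the inter-stroke distance and orientation features are omitted):</p>

```python
import math

def group_features(strokes):
    """Geometric features for one candidate stroke group.

    Each stroke is a list of (x, y) points; covers the stroke count,
    bounding-box, and ink-density features from the list above.
    """
    pts = [p for s in strokes for p in s]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    diag = math.hypot(w, h) or 1.0
    ink = sum(math.dist(s[i], s[i + 1])
              for s in strokes for i in range(len(s) - 1))
    return {"n_strokes": len(strokes), "width": w, "height": h,
            "diagonal": diag, "ink_density": ink / diag}
```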
<p><strong>Text Recognition</strong></p>
<ul>
<li><strong>Microsoft Tablet PC SDK</strong>: Used for recognizing alphanumeric characters (elements and subscripts).</li>
<li>Integrated with the SVM and Template Matcher via a combined scoring mechanism.</li>
</ul>
<h3 id="evaluation">Evaluation</h3>
<table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>Value (Overall)</th>
          <th>Baseline Comparison</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Precision</strong></td>
          <td>0.86</td>
          <td>0.81 (Baseline)</td>
          <td>Full system vs. no domain knowledge</td>
      </tr>
      <tr>
          <td><strong>Recall</strong></td>
          <td>0.89</td>
          <td>0.85 (Baseline)</td>
          <td>27% error reduction</td>
      </tr>
      <tr>
          <td><strong>F-Measure</strong></td>
          <td>0.87</td>
          <td>0.83 (Baseline)</td>
          <td>Statistically significant ($p &lt; .05$)</td>
      </tr>
  </tbody>
</table>
<ul>
<li><strong>True Positive Definition</strong>: Match in both location (stroke grouping) and classification (label).</li>
</ul>
<h3 id="hardware">Hardware</h3>
<ul>
<li><strong>Device</strong>: 1.5GHz Tablet PC.</li>
<li><strong>Performance</strong>: Real-time feedback.</li>
</ul>
<h3 id="reproducibility">Reproducibility</h3>
<p>No source code, trained models, or collected sketch data were publicly released. The paper is openly available through the AAAI digital library. The system depends on the Microsoft Tablet PC SDK (a proprietary, now-discontinued component), which would make exact replication difficult even with the algorithm descriptions provided.</p>
<p><strong>Status</strong>: Closed</p>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: Ouyang, T. Y., &amp; Davis, R. (2007). Recognition of Hand Drawn Chemical Diagrams. <em>Proceedings of the 22nd National Conference on Artificial Intelligence</em> (AAAI-07), 846-851.</p>
<p><strong>Publication</strong>: AAAI 2007</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@inproceedings</span>{ouyang2007recognition,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span>=<span style="color:#e6db74">{Recognition of Hand Drawn Chemical Diagrams}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span>=<span style="color:#e6db74">{Ouyang, Tom Y and Davis, Randall}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">booktitle</span>=<span style="color:#e6db74">{Proceedings of the 22nd National Conference on Artificial Intelligence}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">volume</span>=<span style="color:#e6db74">{1}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">pages</span>=<span style="color:#e6db74">{846--851}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span>=<span style="color:#e6db74">{2007}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item></channel></rss>