Paper Information

Citation: Chang, Q., Chen, M., Pi, C., Hu, P., Zhang, Z., Ma, J., Du, J., Yin, B., & Hu, J. (2025). RFL: Simplifying Chemical Structure Recognition with Ring-Free Language. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2025). https://doi.org/10.48550/arXiv.2412.07594

Publication: AAAI 2025 (Oral)

Methodological Contribution

This is a Methodological paper ($\Psi_{\text{Method}}$). It introduces a novel representation system (Ring-Free Language) and a specialized neural architecture (Molecular Skeleton Decoder) designed to solve specific limitations in converting 2D images to 1D chemical strings. The paper validates this method through direct comparison with existing baselines and ablation studies.

Motivation: Limitations of 1D Serialization

Current Optical Chemical Structure Recognition (OCSR) methods typically rely on “unstructured modeling,” where 2D molecular graphs are flattened into 1D strings like SMILES or SSML. While simple, these linear formats struggle to explicitly capture complex spatial relationships, particularly in molecules with multiple rings and branches. End-to-end models often fail to “understand” the graph structure when forced to predict these implicit 1D sequences, leading to error accumulation in complex scenarios.

Innovation: Ring-Free Language (RFL) and Molecular Skeleton Decoder (MSD)

The authors propose two primary contributions to decouple spatial complexity:

  1. Ring-Free Language (RFL): A divide-and-conquer representation that splits a molecular graph $G$ into three explicit components: a molecular skeleton $\mathcal{S}$, individual ring structures $\mathcal{R}$, and branch information $\mathcal{F}$. This allows rings to be collapsed into “SuperAtoms” or “SuperBonds” during initial parsing.
  2. Molecular Skeleton Decoder (MSD): A hierarchical architecture that progressively predicts the skeleton first, then the individual rings (using SuperAtom features as conditions), and finally classifies the branch connections.

Methodology and Experiments

The method was evaluated on both handwritten and printed chemical structures against two baselines: DenseWAP (Zhang et al. 2018) and RCGD (Hu et al. 2023).

  • Datasets:
    • EDU-CHEMC: ~49k handwritten samples (challenging, diverse styles)
    • Mini-CASIA-CSDB: ~89k printed samples (from ChEMBL)
    • Synthetic Complexity Dataset: A custom split of ChEMBL data grouped by structural complexity (atoms + bonds + rings) to test generalization
  • Ablation Studies (Table 2, on EDU-CHEMC with MSD-DenseWAP): With neither MSD nor the [conn] token, EM was 38.70%. Adding [conn] alone raised EM to 44.02%; adding MSD alone raised it to 52.76%. Combining both gave 64.96% EM, indicating that each component contributes.

Outcomes and Conclusions

  • New best results: MSD-RCGD achieved 65.39% EM on EDU-CHEMC (handwritten) and 95.23% EM on Mini-CASIA-CSDB (printed), outperforming the RCGD baseline (62.86% and 95.01%, respectively). MSD-DenseWAP also surpassed the previous best on EDU-CHEMC by 2.06 percentage points of EM (64.92% vs. 62.86%).
  • Universal improvement: Applying MSD/RFL to DenseWAP improved its accuracy from 61.35% to 64.92% EM on EDU-CHEMC and from 92.09% to 94.10% EM on Mini-CASIA-CSDB, demonstrating the method is model-agnostic.
  • Complexity handling: When trained on low-complexity molecules only (levels 1-2), MSD-DenseWAP still recognized higher-complexity unseen structures, while standard DenseWAP could hardly recognize them at all (Figure 6 in the paper).

The authors note that this is the first end-to-end solution that decouples and models chemical structures in a structured form. Future work aims to extend structure-based modeling to other tasks such as tables, flowcharts, and diagrams.


Artifacts

| Artifact | Type | License | Notes |
|---|---|---|---|
| RFL-MSD | Code | MIT | Official PyTorch implementation |

Reproducibility Details

Data

The authors utilized one handwritten and one printed dataset, plus a synthetic set for stress-testing complexity.

| Purpose | Dataset | Size | Notes |
|---|---|---|---|
| Training/Test | EDU-CHEMC | 48,998 Train / 2,992 Test | Handwritten images from educational scenarios |
| Training/Test | Mini-CASIA-CSDB | 89,023 Train / 8,287 Test | Printed images rendered from ChEMBL using RDKit |
| Generalization | ChEMBL Subset | 5 levels of complexity | Custom split based on the score $N_{atom} + N_{bond} + 12 \times N_{ring}$ |
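
As a quick illustration of the complexity score used for the generalization split, the sketch below evaluates $N_{atom} + N_{bond} + 12 \times N_{ring}$ from plain counts. The function name `complexity_score` is hypothetical, and the thresholds that map a score to one of the five levels are not given here, so they are deliberately left out.

```python
def complexity_score(n_atoms: int, n_bonds: int, n_rings: int) -> int:
    """Score from the summary above: N_atom + N_bond + 12 * N_ring.

    The counts are assumed to be supplied by the caller (for example from a
    cheminformatics toolkit); this helper only does the arithmetic.
    """
    return n_atoms + n_bonds + 12 * n_rings


# Benzene drawn without hydrogens: 6 atoms, 6 bonds, 1 ring.
benzene = complexity_score(6, 6, 1)
# Ethanol drawn without hydrogens: 3 atoms, 2 bonds, no ring.
ethanol = complexity_score(3, 2, 0)
print(benzene, ethanol)  # 24 5
```

The factor of 12 means a single ring adds as much to the score as a dozen extra atoms or bonds, so ring count dominates the ranking.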

Algorithms

RFL Splitting (Encoding):

  1. Detect Rings: Use DFS to find all non-nested rings $\mathcal{R}$.
  2. Determine Adjacency ($\gamma$): Calculate shared edges between rings.
  3. Merge:
    • If $\gamma(r_i) = 0$ (isolated), merge ring into a SuperAtom node.
    • If $\gamma(r_i) > 0$ (adjacent), merge ring into a SuperBond edge.
  4. Update: Record connection info in $\mathcal{F}$ and remove ring details from the main graph to form Skeleton $\mathcal{S}$.
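
Steps 2 and 3 above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes ring detection (step 1) has already produced each ring as an ordered list of atom indices, and it takes $\gamma(r_i)$ to be the number of other rings that share at least one bond with $r_i$, which is one reading of "shared edges between rings". The names `ring_edges` and `classify_rings` are hypothetical.

```python
from itertools import combinations


def ring_edges(ring):
    """Undirected bonds of a ring given as an ordered cycle of atom indices."""
    n = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}


def classify_rings(rings):
    """Return (label, gamma) per ring: SuperAtom if gamma == 0, else SuperBond."""
    edges = [ring_edges(r) for r in rings]
    gamma = [0] * len(rings)
    for i, j in combinations(range(len(rings)), 2):
        if edges[i] & edges[j]:  # the two rings share at least one bond
            gamma[i] += 1
            gamma[j] += 1
    return [("SuperAtom" if g == 0 else "SuperBond", g) for g in gamma]


# Two fused six-membered rings (shared bond 4-5) plus one isolated ring.
rings = [
    [0, 1, 2, 3, 4, 5],
    [4, 5, 6, 7, 8, 9],
    [10, 11, 12, 13, 14, 15],
]
labels = classify_rings(rings)
print(labels)
```

Recording the connection information in $\mathcal{F}$ and rewriting the graph into the skeleton $\mathcal{S}$ (step 4) is not covered by this sketch.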

MSD Decoding:

  • Hierarchical Prediction: The model predicts the Skeleton $\mathcal{S}$ first.
  • Contextual Ring Prediction: When a SuperAtom or SuperBond token is predicted, its hidden state $f^s$ is stored. After the skeleton is finished, each stored $f^s$ is used as a condition to autoregressively decode the corresponding ring structure.
  • Token [conn]: A special token separates connected ring bonds from unconnected ones to sparsify the branch classification task.
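
The control flow of the first two bullets can be sketched with stand-in decoders. Everything below is a toy under stated assumptions: `skeleton_step` and `ring_decoder` are hypothetical callables standing in for the neural modules, the token spellings `<SuperAtom>`, `<SuperBond>`, and `<eos>` are invented for illustration, and the branch classification with [conn] is not modeled.

```python
def decode_molecule(skeleton_step, ring_decoder, max_len=50):
    """Decode the skeleton first, then one ring per stored hidden state."""
    skeleton, stored = [], []
    state = None
    for _ in range(max_len):
        token, state = skeleton_step(skeleton, state)
        if token == "<eos>":
            break
        skeleton.append(token)
        if token in ("<SuperAtom>", "<SuperBond>"):
            stored.append(state)  # keep f^s for the later ring pass
    # Ring pass: each stored state conditions its own ring decoding.
    rings = [ring_decoder(f_s) for f_s in stored]
    return skeleton, rings


# Toy stand-ins so the sketch runs without a trained model.
SCRIPT = ["C", "<SuperAtom>", "O", "<eos>"]


def scripted_step(prefix, state):
    idx = len(prefix)
    return SCRIPT[idx], ("h", idx)  # fake hidden state tagged with its position


def toy_ring_decoder(f_s):
    return {"condition": f_s, "tokens": ["c"] * 6}


skeleton, rings = decode_molecule(scripted_step, toy_ring_decoder)
print(skeleton)
print(rings)
```

The point of the two passes is that the skeleton sequence stays short, while each ring is decoded with the context captured at the position where its placeholder token was emitted.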

Models

The architecture follows a standard Image-to-Sequence pattern but with a forked decoder.

  • Encoder: DenseNet (Growth rate=24, Depth=32 per block)
  • Decoder (MSD):
    • Core: GRU with Attention (Hidden dim=256, Embedding dim=256, Dropout=0.15)
    • Skeleton Module: Autoregressively predicts sequence tokens. Uses Maxout activation.
    • Branch Module: A binary classifier (MLP) taking concatenated features of skeleton bonds $f_{bs}$ and ring bonds $f_{br}$ to predict connectivity matrix $\mathcal{F}$.
  • Loss Function: $\mathcal{O} = \lambda_1 \mathcal{L}_{ce} + \lambda_2 \mathcal{L}_{cls}$ (where $\lambda_1 = \lambda_2 = 1$)
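
A small numeric sketch of the combined objective follows. It assumes that $\mathcal{L}_{ce}$ is a token-level cross-entropy over decoded tokens and that $\mathcal{L}_{cls}$ is a binary cross-entropy over branch-connection predictions, each averaged over its items; the averaging and the helper names are assumptions made for illustration, and plain Python floats stand in for tensors.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the target token at one step."""
    return -math.log(probs[target])


def binary_cross_entropy(p, label):
    """BCE for one predicted connection probability p and a 0/1 label."""
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def total_loss(token_steps, branch_pairs, lam1=1.0, lam2=1.0):
    """O = lam1 * L_ce + lam2 * L_cls, each term averaged over its items."""
    l_ce = sum(cross_entropy(p, t) for p, t in token_steps) / len(token_steps)
    l_cls = sum(binary_cross_entropy(p, y) for p, y in branch_pairs) / len(branch_pairs)
    return lam1 * l_ce + lam2 * l_cls


# One token step with a uniform two-way distribution, one uncertain branch pair.
loss = total_loss(token_steps=[([0.5, 0.5], 0)], branch_pairs=[(0.5, 1)])
print(loss)  # 2 * ln(2), since both terms equal ln(2)
```

With $\lambda_1 = \lambda_2 = 1$ the two terms are simply summed, so neither the sequence loss nor the branch classification loss is down-weighted.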

Evaluation

Metrics measure whether the predicted structure exactly matches the ground truth, both for the full annotation and for the chemical structure alone.

| Metric | Description | Notes |
|---|---|---|
| EM (Exact Match) | % of images where the predicted graph exactly matches the ground truth. | Primary metric |
| Struct-EM | % of correctly identified chemical structures (ignoring non-chemical text). | Auxiliary metric |
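
For concreteness, EM can be computed as the share of samples whose prediction equals the reference. The sketch below compares canonical strings with `==`, which is a simplification: the metric described above compares graphs, so a real evaluation would first normalize predictions and references to a comparable form. The function name `exact_match` is hypothetical.

```python
def exact_match(predictions, references):
    """Percentage of samples whose prediction equals its reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sample is required")
    hits = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


preds = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
refs = ["CCO", "c1ccccc1", "CC(=O)O", "CCC"]
em = exact_match(preds, refs)
print(em)  # 75.0
```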

Hardware

  • Compute: 4 x NVIDIA Tesla V100 (32 GB GPU memory each)
  • Training Configuration:
    • Batch size: 8 (Handwritten), 32 (Printed)
    • Epochs: 50
    • Optimizer: Adam ($lr=2\times10^{-4}$, decayed by 0.5 via MultiStepLR)
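
The step-decay behaviour of the learning rate can be illustrated without a deep learning framework. The sketch below mirrors the closed form of a multi-step schedule: the rate is multiplied by the decay factor once for every milestone already reached. The milestone epochs `(20, 40)` are purely hypothetical placeholders, since the actual milestones are not stated here; only the base rate $2\times10^{-4}$, the factor 0.5, and the 50 epochs come from the configuration above.

```python
from bisect import bisect_right


def multistep_lr(epoch, base_lr=2e-4, gamma=0.5, milestones=(20, 40)):
    """Learning rate at a given epoch under step decay.

    `milestones` is a HYPOTHETICAL choice for illustration only.
    """
    passed = bisect_right(sorted(milestones), epoch)  # milestones reached so far
    return base_lr * gamma ** passed


schedule = [multistep_lr(e) for e in range(50)]
print(schedule[0], schedule[20], schedule[40])
```

With these placeholder milestones the rate is halved twice over the 50 epochs, ending at a quarter of its starting value.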

Citation

@inproceedings{changRFLSimplifyingChemical2025,
  title = {RFL: Simplifying Chemical Structure Recognition with Ring-Free Language},
  shorttitle = {RFL},
  author = {Chang, Qikai and Chen, Mingjun and Pi, Changpeng and Hu, Pengfei and Zhang, Zhenrong and Ma, Jiefeng and Du, Jun and Yin, Baocai and Hu, Jinshui},
  year = {2025},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  eprint = {2412.07594},
  primaryclass = {cs},
  doi = {10.48550/arXiv.2412.07594},
  archiveprefix = {arXiv}
}