Computational Chemistry
Optical chemical structure recognition example

MolRec: Rule-Based OCSR System

Rule-based system for optical chemical structure recognition using vectorization and geometric analysis, achieving 95% …

Markush structure diagram

SubGrapher: Visual Fingerprinting of Chemical Structures

Novel OCSR method creating molecular fingerprints from images through functional group segmentation for database …

The transformation from a 2D chemical structure image to a SMILES representation

What is Optical Chemical Structure Recognition (OCSR)?

A micro-review of Optical Chemical Structure Recognition (OCSR), covering rule-based systems to modern deep learning …

αExtractor extracts structured chemical information from biomedical literature

αExtractor: Chemical Info from Biomedical Literature

αExtractor uses a ResNet-Transformer model to extract chemical structures from literature images, including noisy and hand-drawn …

ChemInfty: Chemical Structure Recognition in Patent Images

Fujiyoshi et al.'s segment-based approach for recognizing chemical structures in challenging Japanese patent images with …

MolNexTR: Dual-Stream Molecular Image Recognition

Dual-stream encoder combining ConvNeXt and ViT for robust optical chemical structure recognition across diverse …

A colored molecule with annotations, representing the diverse drawing styles found in scientific papers that OCSR models must handle.

MolParser-7M & WildMol: Large-Scale OCSR Datasets

MolParser-7M is the largest OCSR dataset with 7.7M image-text pairs of molecules and E-SMILES, including 400k real-world …

Optical chemical structure recognition example

MolParser: End-to-End Molecular Structure Recognition

MolParser converts molecular images from scientific documents to machine-readable formats using end-to-end learning with …

ZINC-22 Tranche Browser showing molecular count distribution

ZINC-22: Multi-Billion Scale Database

The ZINC-22 dataset provides more than 37 billion make-on-demand molecules for virtual screening and modern drug discovery.

SELFIES strings always decode to valid molecules, even when generated randomly

Converting SELFIES Strings to 2D Molecular Images

Visualize SELFIES molecular representations and test their 100% robustness claim through random sampling experiments.

Aspirin molecular structure generated from SMILES string

Converting SMILES Strings to 2D Molecular Images

Learn how to create 2D molecular images from SMILES strings using RDKit and PIL, with proper formatting and legends.

SELFIES representation of 2-Fluoroethenimine molecule

SELFIES (Self-Referencing Embedded Strings)

SELFIES is a 100% robust molecular string representation for ML, implemented in the open-source selfies Python library.