Markush Structures on Hunter Heidenreich | ML Research Scientist

MarkushGrapher-2: End-to-End Markush Recognition

Mon, 06 Apr 2026 00:00:00 +0000

A Multimodal Method for Markush Structure Recognition

This is a Method paper that introduces MarkushGrapher-2, a universal encoder-decoder model for recognizing both standard molecular structures and multimodal Markush structures from chemical images. The primary contribution is a dual-encoder architecture that fuses a pretrained OCSR (Optical Chemical Structure Recognition) vision encoder with a Vision-Text-Layout (VTL) encoder, connected through a dedicated ChemicalOCR module for end-to-end processing. The paper also introduces two new resources: a large-scale training dataset (USPTO-MOL-M) of real-world Markush structures extracted from USPTO patent MOL files, and IP5-M, a manually annotated benchmark of 1,000 Markush structures from five major patent offices.

Why Markush Structure Recognition Remains Challenging

Markush structures are compact representations used in patent documents to describe families of related molecules. They combine a visual backbone (atoms, bonds, variable regions) with textual definitions of substituents that can replace those variable regions. This multimodal nature makes them harder to parse than standard molecular diagrams.

Three factors limit automatic Markush recognition. First, visual styles vary across patent offices and publication years. Second, textual definitions lack standardization and often contain conditional or recursive descriptions. Third, real-world training data with comprehensive annotations is scarce. As a result, Markush structures are currently indexed only in two proprietary, manually curated databases: MARPAT and DWPIM.

Prior work, including the original MarkushGrapher, required pre-annotated OCR outputs at inference time, limiting practical deployment. General-purpose models like GPT-5 and DeepSeek-OCR produce mostly chemically invalid outputs on Markush images, suggesting these lie outside their training distribution.

Dual-Encoder Architecture with Dedicated ChemicalOCR

MarkushGrapher-2 uses two complementary encoding pipelines:

Vision encoder pipeline: The input image passes through a Swin-B Vision Transformer (taken from MolScribe) pretrained for OCSR. This encoder extracts visual features representing molecular structures and remains frozen during training.
Vision-Text-Layout (VTL) pipeline: The same image goes through ChemicalOCR, a compact 256M-parameter vision-language model fine-tuned from SmolDocling for OCR on chemical images. ChemicalOCR extracts character-level text and bounding boxes. These, combined with image patches, feed into a T5-base VTL encoder following the UDOP fusion paradigm, where visual and textual tokens are spatially aligned by bounding box overlap.

The VTL encoder output is concatenated with projected embeddings from the vision encoder. This joint representation feeds a text decoder that auto-regressively generates a CXSMILES (ChemAxon Extended SMILES) string describing the backbone structure and a substituent table listing variable group definitions.

Two-Stage Training Strategy

Training proceeds in two phases:

Phase 1 (Adaptation): The vision encoder is frozen. The MLP projector and text decoder train on 243K real-world image-SMILES pairs from MolScribe’s USPTO dataset (3 epochs). This aligns the decoder to the pretrained OCSR feature space.
Phase 2 (Fusion): The vision encoder, projector, and ChemicalOCR are all frozen. The VTL encoder and text decoder train on a mix of 235K synthetic and 145K real-world Markush samples (2 epochs). The VTL encoder learns the features needed for CXSMILES and substituent table prediction without disrupting the established OCSR representations.
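
The freeze schedule above can be written down as a small configuration sketch. The component labels (`vision_encoder`, `mlp_projector`, and so on) and the `PHASES` layout are hypothetical names chosen for illustration, not identifiers from the released code; the approximate sample counts are taken from the training data table.

```python
# Hypothetical component labels for illustration; not identifiers from the released code.
PHASES = {
    "phase1_adaptation": {
        "epochs": 3,
        "frozen": {"vision_encoder"},
        "trainable": {"mlp_projector", "text_decoder"},
        "datasets": {"molscribe_uspto": 243_000},
    },
    "phase2_fusion": {
        "epochs": 2,
        "frozen": {"vision_encoder", "mlp_projector", "chemical_ocr"},
        "trainable": {"vtl_encoder", "text_decoder"},
        "datasets": {
            "synthetic_cxsmiles": 235_000,
            "molparser": 91_000,
            "uspto_mol_m": 54_000,
        },
    },
}


def total_samples(phase):
    """Approximate number of training samples used in a phase."""
    return sum(PHASES[phase]["datasets"].values())


def check_phase(phase):
    """A component must not be listed as both frozen and trainable."""
    cfg = PHASES[phase]
    overlap = cfg["frozen"] & cfg["trainable"]
    if overlap:
        raise ValueError(f"{phase}: both frozen and trainable: {sorted(overlap)}")


for name in PHASES:
    check_phase(name)
    print(name, total_samples(name))
```

The Phase 2 total of roughly 380K is the 235K synthetic samples plus the 145K real-world samples (91K MolParser and 54K USPTO-MOL-M).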

The total model has 831M parameters, of which 744M are trainable.

Datasets and Evaluation Benchmarks

Training Data

Purpose	Dataset	Size	Source
OCR pretraining	Synthetic chemical structures	235K	PubChem SMILES augmented to CXSMILES, rendered with annotations
OCR fine-tuning	Manual OCR annotations	7K	IP5 patent document crops
Phase 1 (OCSR)	MolScribe USPTO	243K	Real image-SMILES pairs
Phase 2 (MMSR)	Synthetic CXSMILES	235K	Same as OCR pretraining set
Phase 2 (MMSR)	MolParser dataset	91K	Real-world Markush, converted to CXSMILES
Phase 2 (MMSR)	USPTO-MOL-M	54K	Real-world, auto-extracted from USPTO MOL files (2010-2025)

Evaluation Benchmarks

Markush benchmarks: M2S (103 samples), USPTO-M (74), WildMol-M (10K, semi-manual), and the new IP5-M (1,000 manually annotated from USPTO, JPO, KIPO, CNIPA, and EPO patents, 1980-2025).

OCSR benchmarks: USPTO (5,719), JPO (450), UOB (5,740), WildMol (10K).

The primary metric is CXSMILES Accuracy (A): a prediction is correct when (1) the predicted SMILES matches the ground truth by InChIKey equivalence, and (2) all Markush features (variable groups, positional and frequency variation indicators) are correctly represented. Stereochemistry is ignored during evaluation.
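
A minimal sketch of how this metric could be computed, assuming InChIKeys and Markush features have already been extracted and canonicalized upstream (with stereochemistry removed). The `MarkushRecord` type and its fields are hypothetical and not taken from the paper's evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkushRecord:
    # Hypothetical container: both fields are assumed to be computed upstream.
    inchikey: str          # InChIKey of the backbone, stereochemistry ignored
    features: frozenset    # canonicalized variable groups and m / Sg indicators


def is_correct(pred, truth):
    """Both conditions must hold: same backbone and identical Markush features."""
    return (
        pred is not None
        and pred.inchikey == truth.inchikey
        and pred.features == truth.features
    )


def cxsmiles_accuracy(preds, truths):
    """Percentage of samples satisfying both conditions; None marks an invalid prediction."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not truths:
        return 0.0
    hits = sum(is_correct(p, t) for p, t in zip(preds, truths))
    return 100.0 * hits / len(truths)


truths = [
    MarkushRecord("KEY-A", frozenset({"R1@3"})),
    MarkushRecord("KEY-B", frozenset({"R1@1", "m:R2"})),
    MarkushRecord("KEY-C", frozenset()),
    MarkushRecord("KEY-D", frozenset({"Sg:n"})),
]
preds = [
    MarkushRecord("KEY-A", frozenset({"R1@3"})),   # correct
    MarkushRecord("KEY-B", frozenset({"R1@1"})),   # backbone right, feature missing
    None,                                          # invalid output
    MarkushRecord("KEY-D", frozenset({"Sg:n"})),   # correct
]
print(cxsmiles_accuracy(preds, truths))  # 50.0
```

Dropping the feature comparison from `is_correct` gives the backbone-only variant of the metric, which is why a prediction can count toward backbone accuracy and still fail the full CXSMILES check.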

Results: Markush Structure Recognition

Model	M2S	USPTO-M	WildMol-M	IP5-M
MolParser-Base	39	30	38.1	47.7
MolScribe	21	7	28.1	22.3
GPT-5	3	0	-	-
DeepSeek-OCR	0	0	1.9	0.0
MarkushGrapher-1	38	10	32	-
MarkushGrapher-2	56	13	55	48.0

On M2S, MarkushGrapher-2 achieves 56% CXSMILES accuracy vs. 38% for MarkushGrapher-1, a relative improvement of 47%. On WildMol-M (the largest benchmark at 10K samples), MarkushGrapher-2 reaches 55% vs. 38.1% for MolParser-Base and 32% for MarkushGrapher-1. GPT-5 and DeepSeek-OCR generate mostly chemically invalid outputs on Markush images: only 30% and 15% of their predictions are valid CXSMILES on M2S, respectively.
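
For reference, the 47% figure on M2S is the gain relative to the MarkushGrapher-1 score, which corresponds to an absolute gain of 18 percentage points:

$$ \frac{56 - 38}{38} \approx 0.474 \approx 47\% $$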

Results: Standard Molecular Structure Recognition

Model	WildMol	JPO	UOB	USPTO
MolParser-Base	76.9	78.9	91.8	93.0
MolScribe	66.4	76.2	87.4	93.1
DECIMER 2.7	56.0	64.0	88.3	59.9
MolGrapher	45.5	67.5	94.9	91.5
DeepSeek-OCR	25.8	31.6	78.7	36.9
MarkushGrapher-2	68.4	71.0	96.6	89.8

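MarkushGrapher-2 achieves the highest score on UOB (96.6%). It remains competitive on the other OCSR benchmarks while being primarily optimized for Markush recognition, trailing the best model by 3.3 points on USPTO (89.8% vs. 93.1%), 7.9 points on JPO (71.0% vs. 78.9%), and 8.5 points on WildMol (68.4% vs. 76.9%).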

ChemicalOCR vs. General OCR

Model	M2S F1	USPTO-M F1	IP5-M F1
PaddleOCR v5	7.7	1.2	1.9
EasyOCR	10.2	18.0	18.4
ChemicalOCR	87.2	93.0	86.5

General-purpose OCR tools fail on chemical images because they misinterpret bonds as characters and cannot parse chemical abbreviations. ChemicalOCR reaches F1 scores of 86.5 to 93.0 across the three benchmarks, while neither general-purpose tool exceeds 18.4.

Ablation Results and Key Findings

OCR input is critical for Markush features. Without OCR, CXSMILES accuracy drops from 56% to 4% on M2S, and from 53.7% to 15.4% on IP5-M. The backbone structure accuracy ($A_{\text{InChIKey}}$) also drops substantially (from 80% to 39% on M2S), though the vision encoder alone can still recover some structural information. This confirms that textual cues (brackets, indices, variable definitions) are essential for Markush feature prediction.

Two-phase training improves both tasks. Compared to single-phase (fusion only) training, the two-phase strategy improves CXSMILES accuracy from 44% to 50% on M2S and from 53.0% to 61.5% on JPO after the same number of epochs. Adapting the decoder to OCSR features before introducing the VTL encoder prevents the fusion process from degrading learned visual representations.

Frequency variation indicators remain the hardest feature. On IP5-M, the per-feature breakdown shows 73.3% accuracy for backbone InChI, 74.8% for variable groups, 78.8% for positional variation, but only 30.7% for frequency variation (Sg groups). These repeating structural units are particularly challenging to represent and predict.

Limitations: The model relies on accurate OCR as a prerequisite. Performance on USPTO-M (13% CXSMILES accuracy) lags behind other benchmarks, likely due to the older patent styles in that dataset. The paper does not report inference latency.

Reproducibility Details

Data

Purpose	Dataset	Size	Notes
OCR pretraining	Synthetic chemical images	235K	Generated from PubChem SMILES, augmented to CXSMILES
OCR fine-tuning	IP5 patent crops	7K	Manually annotated
Phase 1 training	MolScribe USPTO	243K	Public, real image-SMILES pairs
Phase 2 training	Synthetic + MolParser + USPTO-MOL-M	380K	Mix of synthetic (235K), MolParser (91K), USPTO-MOL-M (54K)
Evaluation	M2S, USPTO-M, WildMol-M, IP5-M	103 to 10K	Markush benchmarks
Evaluation	WildMol, JPO, UOB, USPTO	450 to 10K	OCSR benchmarks

Models

Component	Architecture	Parameters	Status
Vision encoder	Swin-B ViT (from MolScribe)	~87M	Frozen
VTL encoder + decoder	T5-base	~744M trainable	Trained
ChemicalOCR	SmolDocling-based VLM	256M	Fine-tuned, frozen in Phase 2
MLP projector	Linear projection	-	Trained in Phase 1, frozen in Phase 2
Total		831M

Evaluation

Metric	Definition
CXSMILES Accuracy (A)	Percentage of samples where InChIKey matches AND all Markush features correct
$A_{\text{InChIKey}}$	Backbone structure accuracy only (ignoring Markush features)
Table Accuracy	Percentage of correctly predicted substituent tables
Markush Accuracy	Joint CXSMILES + Table accuracy
OCR F1	Bounding-box-level precision/recall at IoU > 0.5
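
The OCR F1 definition can be illustrated with a short sketch. It assumes boxes are given as `(x_min, y_min, x_max, y_max)` and uses greedy one-to-one matching on box overlap only; the matching procedure and whether the recognized text must also agree are not specified in this summary, so both choices are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def box_f1(pred_boxes, true_boxes, threshold=0.5):
    """F1 over boxes; a prediction is a true positive if IoU > threshold.

    Greedy matching is an assumption: each ground-truth box is used at most once.
    """
    unmatched = list(true_boxes)
    true_positives = 0
    for p in pred_boxes:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) > threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(pred_boxes) if pred_boxes else 0.0
    recall = true_positives / len(true_boxes) if true_boxes else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


true_boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
pred_boxes = [(0, 0, 10, 9), (50, 50, 60, 60), (21, 0, 30, 10)]
score = box_f1(pred_boxes, true_boxes)  # precision 2/3, recall 1
```

In this toy case two of three predictions match, so precision is 2/3, recall is 1, and the F1 score is 0.8.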

Hardware

Training: NVIDIA A100 GPU
Phase 1: 3 epochs, Adam optimizer, lr 5e-4, 1000 warmup steps, batch size 10, weight decay 1e-3
Phase 2: 2 epochs, batch size 8

Artifacts

Artifact	Type	License	Notes
MarkushGrapher GitHub	Code	MIT	Official implementation of MarkushGrapher-2 with models and datasets

Reproducibility classification: Highly Reproducible. Code, models, and datasets are all publicly released under an MIT license with documented training hyperparameters and a single A100 GPU requirement.

Paper Information

Citation: Strohmeyer, T., Morin, L., Meijer, G. I., Weber, V., Nassar, A., & Staar, P. (2026). MarkushGrapher-2: End-to-end Multimodal Recognition of Chemical Structures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Publication: CVPR 2026

Additional Resources:

@misc{strohmeyer2026markushgrapher,
  title={MarkushGrapher-2: End-to-end Multimodal Recognition of Chemical Structures},
  author={Strohmeyer, Tim and Morin, Lucas and Meijer, Gerhard Ingmar and Weber, Val\'{e}ry and Nassar, Ahmed and Staar, Peter},
  year={2026},
  eprint={2603.28550},
  archiveprefix={arXiv},
  primaryclass={cs.CV}
}

MarkushGrapher: Multi-modal Markush Structure Recognition

Fri, 19 Dec 2025 00:00:00 +0000

Overcoming Unimodal Limitations for Markush Structures

The automated analysis of chemical literature, particularly patents, is critical for drug discovery and material science. A major bottleneck is the extraction of Markush structures, which are complex chemical templates that represent families of molecules using a core backbone image and textual variable definitions. Existing methods are limited because they either rely solely on images (OCSR) and miss the textual context, or focus solely on text and miss the structural backbone. This creates a practical need for a unified, multi-modal approach that jointly interprets visual and textual data to accurately extract these structures for prior-art search and database construction. This paper proposes a Method and introduces a new Resource (M2S dataset) to bridge this gap.

MarkushGrapher: The Multi-Modal Architecture

The core innovation is MarkushGrapher, a multi-modal architecture that jointly encodes image, text, and layout information. Key contributions include:

Dual-Encoder Architecture: Combines a Vision-Text-Layout (VTL) encoder (based on UDOP) with a specialized, pre-trained Optical Chemical Structure Recognition (OCSR) encoder (MolScribe). Let $E_{\text{VTL}}$ represent the combined sequence embedding and $E_{\text{OCSR}}$ represent the domain-specific visual embeddings.
Joint Recognition: The model autoregressively generates a sequential graph representation (Optimized CXSMILES) and a substituent table simultaneously. It uses cross-modal dependencies, allowing text to clarify ambiguous visual details like bond types.
Synthetic Data Pipeline: A comprehensive pipeline generates realistic synthetic Markush structures (images and text) from PubChem data, overcoming the lack of labeled training data.
Optimized Representation: A compacted version of CXSMILES moves variable groups into the SMILES string and adds explicit atom indexing to handle complex “frequency” and “position” variation indicators.

Experimental Validation on the New M2S Benchmark

The authors validated their approach using the following setup:

Baselines: Compared against image-only chemistry models (DECIMER, MolScribe) and general-purpose multi-modal models (Uni-SMART, GPT-4o, Pixtral, Llama-3.2).
Datasets: Evaluated on three benchmarks:
1. MarkushGrapher-Synthetic: 1,000 generated samples.
2. M2S: A new benchmark of 103 manually annotated real-world patent images.
3. USPTO-Markush: 74 Markush backbone images from USPTO patents.
Ablation Studies: Analyzed the impact of the OCSR encoder, late fusion strategies, and the optimized CXSMILES format. Late fusion improved USPTO-Markush EM from 23% (VTL only) to 32% (Table 3). Removing R-group compression dropped M2S EM from 38% to 30%, and removing atom indexing dropped USPTO-Markush EM from 32% to 24% (Table 4).

Key Results

Performance: MarkushGrapher outperformed all baselines. On the M2S benchmark, it achieved 38% Exact Match on CXSMILES (compared to 21% for MolScribe) and 29% Exact Match on tables. On USPTO-Markush, it reached 32% CXSMILES EM versus 7% for MolScribe.
Markush Feature Recognition: The model can recognize complex Markush features like frequency variation (‘Sg’) and position variation (‘m’) indicators. DECIMER and MolScribe scored 0% on both ‘m’ and ‘Sg’ sections (Table 2), while MarkushGrapher achieved 76% on ‘m’ and 31% on ‘Sg’ sections on M2S.
Cross-Modal Reasoning: Qualitative analysis showed the model can correctly infer visual details (such as bond order) that appear ambiguous in the image but become apparent with the text description.
Robustness: The model generalizes well to real-world data despite being trained purely on synthetic data. On augmented versions of M2S and USPTO-Markush simulating low-quality scanned documents, it maintained 31% and 32% CXSMILES EM respectively (Table 6).

Limitations

The authors note several limitations:

MarkushGrapher does not currently handle abbreviations in chemical structures (e.g., ‘OG’ for oxygen connected to a variable group).
The model relies on ground-truth OCR cells as input, requiring an external OCR model for practical deployment.
Substituent definitions that combine text with interleaved chemical structure drawings are not supported.
The model is trained to predict ‘m’ sections connecting to all atoms in a cycle, which can technically violate valence constraints, though the output contains enough information to reconstruct only valid connections.

Reproducibility Details

Data

Training Data

Source: Synthetic dataset generated from PubChem SMILES.
Size: 210,000 synthetic images.
Pipeline:
1. Selection: Sampled SMILES from PubChem based on substructure diversity.
2. Augmentation: SMILES augmented to artificial CXSMILES using RDKit (inserting variable groups, frequency indicators).
3. Rendering: Images rendered using Chemistry Development Kit (CDK) with randomized drawing parameters (font, bond width, spacing).
4. Text Generation: Textual definitions generated using manual templates extracted from patents; 10% were paraphrased using Mistral-7B-Instruct-v0.3 to increase diversity.
5. OCR: Bounding boxes extracted via a custom SVG parser aligned with MOL files.

Evaluation Data

M2S Dataset: 103 images from USPTO, EPO, and WIPO patents (1999-2023), manually annotated with CXSMILES and substituent tables.
USPTO-Markush: 74 images from USPTO patents (2010-2016).
MarkushGrapher-Synthetic: 1,000 samples generated via the pipeline.

Algorithms

Optimized CXSMILES:
- Compression: Variable groups moved from the extension block to the main SMILES string as special atoms to reduce sequence length.
- Indexing: Atom indices appended to each atom (e.g., C:1) to explicitly link the graph to the extension block (crucial for m and Sg sections).
- Vocabulary: Specific tokens used for atoms and bonds.
Augmentation: Standard image augmentations (shift, scale, blur, pepper noise, random lines) and OCR text augmentations (character substitution/insertion/deletion).

Models

Architecture: Encoder-Decoder Transformer.
- VTL Encoder: T5-large encoder (initialized from UDOP) that processes image patches, text tokens, and layout (bounding boxes).
- OCSR Encoder: Vision encoder from MolScribe (Swin Transformer), frozen during training.
- Text Decoder: T5-large decoder.
Fusion Strategy: Late Fusion. The VTL encoder output $e_1$ (the combined sequence embedding, $E_{\text{VTL}}$) is concatenated with the MLP-projected OCSR encoder output $e_2$ (the chemistry-specific visual embedding, $E_{\text{OCSR}}$) before decoding, where $v$, $t$, and $l$ denote the image, text, and layout inputs: $$ e = e_1(v, t, l) \oplus \text{MLP}(e_2(v)) $$
Parameters: 831M total (744M trainable).
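
The late fusion formula $e = e_1 \oplus \text{MLP}(e_2)$ can be sketched with toy dimensions in pure Python. The two-layer ReLU MLP, the concatenation along the sequence axis, and all sizes below are assumptions made for illustration; the real model operates on much wider learned embeddings.

```python
import random


def rand_matrix(rows, cols, rng):
    """Toy stand-in for learned weights or encoder outputs."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(x, w, b):
    """y = x @ w + b for x of shape (n, d_in), w of shape (d_in, d_out)."""
    columns = list(zip(*w))
    return [
        [sum(xi * wi for xi, wi in zip(row, col)) + bias for col, bias in zip(columns, b)]
        for row in x
    ]


def mlp_project(e2, w1, b1, w2, b2):
    """Assumed two-layer MLP with ReLU, mapping OCSR features to the VTL width."""
    hidden = [[max(0.0, v) for v in row] for row in linear(e2, w1, b1)]
    return linear(hidden, w2, b2)


def late_fusion(e1, e2, mlp_params):
    """Concatenate VTL tokens with projected OCSR tokens along the sequence axis."""
    projected = mlp_project(e2, *mlp_params)
    if e1 and projected and len(e1[0]) != len(projected[0]):
        raise ValueError("projected OCSR width must match the VTL width")
    return e1 + projected


rng = random.Random(0)
d_vtl, d_ocsr, d_hidden = 4, 6, 5
e1 = rand_matrix(3, d_vtl, rng)    # 3 VTL tokens
e2 = rand_matrix(2, d_ocsr, rng)   # 2 OCSR tokens
params = (
    rand_matrix(d_ocsr, d_hidden, rng), [0.0] * d_hidden,
    rand_matrix(d_hidden, d_vtl, rng), [0.0] * d_vtl,
)
fused = late_fusion(e1, e2, params)
print(len(fused), len(fused[0]))  # 5 4
```

The decoder then attends over the fused sequence, so the VTL tokens pass through unchanged and only the OCSR features need a learned projection.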

Evaluation

Metrics:

CXSMILES Exact Match (EM): Requires perfect match of SMILES string, variable groups, m sections, and Sg sections (ignoring stereochemistry).
Tanimoto Score: Similarity of RDKit Daylight fingerprints (Markush features removed).
Table Exact Match: All variable groups and substituents must match.
Table F1-Score: Aggregated recall and precision of substituents per variable group.

Hardware

Compute: Trained on a single NVIDIA H100 GPU.
Training Config: 10 epochs, batch size of 10, Adam optimizer, learning rate 5e-4, 100 warmup steps, weight decay 1e-3.

Artifacts

Artifact	Type	License	Notes
MarkushGrapher	Code	MIT	Official implementation

Paper Information

Citation: Morin, L., Weber, V., Nassar, A., Meijer, G. I., Van Gool, L., Li, Y., & Staar, P. (2025). MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 14505-14515. https://doi.org/10.1109/CVPR52734.2025.01352

Publication: CVPR 2025

Additional Resources:

GitHub Repository

@inproceedings{morinMarkushGrapherJointVisual2025,
  title = {MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures},
  shorttitle = {MarkushGrapher},
  booktitle = {2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  author = {Morin, Lucas and Weber, Valéry and Nassar, Ahmed and Meijer, Gerhard Ingmar and Van Gool, Luc and Li, Yawei and Staar, Peter},
  year = {2025},
  month = jun,
  pages = {14505--14515},
  doi = {10.1109/CVPR52734.2025.01352}
}

One Strike, You're Out: Detecting Markush Structures

Thu, 18 Dec 2025 00:00:00 +0000

Methodology and Classification

This is a Method paper (Classification: $\Psi_{\text{Method}}$).

It proposes a patch-based classification pipeline to solve a technical failure mode in Optical Chemical Structure Recognition (OCSR). Distinct rhetorical indicators include a baseline comparison (CNN vs. traditional ORB), ablation studies (architecture, pretraining), and a focus on evaluating the filtering efficacy against a known failure mode.

The Markush Structure Challenge

The Problem: Optical Chemical Structure Recognition (OCSR) tools convert 2D images of molecules into machine-readable formats. These tools struggle with “Markush structures,” generic structural templates used frequently in patents that contain variables rather than specific atoms (e.g., $R$, $X$, $Y$).

The Gap: Markush structures are difficult to detect because they often appear as small indicators (a single “R” or variable) within a large image, resulting in a very low Signal-to-Noise Ratio (SNR). Existing OCSR research pipelines typically bypass this by manually excluding these structures from their datasets.

The Goal: To build an automated filter that can identify images containing Markush structures so they can be removed from OCSR pipelines, improving overall database quality without requiring manual data curation.

Patch-Based Classification Pipeline

The core technical contribution is an end-to-end deep learning pipeline tailored for low-SNR chemical images where standard global resizing or cropping fails due to large variations in image resolution and pixel scales.

Patch Generation: The system slices input images into overlapping patches generated from two offset grids, ensuring that variables falling on boundaries are fully captured in at least one crop.
Targeted Annotation: The labels rely on pixel-level bounding boxes around Markush indicators, minimizing the noise that would otherwise overwhelm a full-image classification attempt.
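Inference Strategy: During inference, the query image is split into patches, each patch is classified individually, and the patch scores $x_1, \dots, x_n$ are aggregated into an image-level score with a maximum rule, $X = \max_{i=1}^{n} \{ x_i \}$, so a single positive patch is enough to flag the image.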
Evaluation: Provides the first systematic comparison between fixed-feature extraction (ORB + XGBoost) and end-to-end deep learning for this specific domain.
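
The maximum aggregation rule is simple enough to state in a few lines. In this sketch the patch scores are assumed to be per-patch Markush probabilities, and the 0.5 decision threshold is an illustrative default rather than a value from the paper.

```python
def image_score(patch_scores):
    """Image-level score X = max_i x_i over the per-patch scores."""
    if not patch_scores:
        raise ValueError("at least one patch score is required")
    return max(patch_scores)


def is_markush(patch_scores, threshold=0.5):
    """Flag the image if any single patch crosses the (assumed) threshold."""
    return image_score(patch_scores) >= threshold


print(image_score([0.02, 0.91, 0.10]))  # 0.91
print(is_markush([0.02, 0.91, 0.10]))   # True
print(is_markush([0.10, 0.20, 0.05]))   # False
```

Because the rule takes the maximum, one confident patch decides the outcome, which also means a single false-positive patch is enough to misclassify an otherwise clean image.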

Experimental Setup and Baselines

The authors compared two distinct paradigms on a manually annotated dataset:

Fixed-Feature Baseline: Used ORB (Oriented FAST and Rotated BRIEF) to detect keypoints and match them against a template bank of known Markush symbols. Features (match counts, Hamming distances) were fed into an XGBoost model.
Deep Learning Method: Fine-tuned ResNet18 and Inception V3 models on the generated image patches.
- Ablations: Contrasted pretraining sources, evaluating general domain (ImageNet) against chemistry-specific domain (USPTO images).
- Fine-tuning: Compared full-network fine-tuning against freezing all but the fully connected layers.

To handle significant class imbalance, the primary evaluation metric was the Macro F1 score, defined as:

$$ \text{Macro F1} = \frac{1}{N} \sum_{i=1}^{N} \frac{2 \cdot \text{precision}_i \cdot \text{recall}_i}{\text{precision}_i + \text{recall}_i} $$
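
The formula can be checked with a small pure-Python sketch. The labels below are made up to mimic class imbalance (0 for complete structures, 1 for Markush); per-class F1 is set to 0 when precision and recall are both 0, which is a common convention and an assumption here.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]   # one of the two Markush images is missed
score = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(score, 4), accuracy)
```

Here plain accuracy is 0.9, while Macro F1 is about 0.80 because the minority class (F1 of 2/3) counts as much as the majority class (F1 of 16/17).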

Performance Outcomes

CNN vs. ORB: Deep learning architectures outperformed the fixed-feature baseline. The best model (Inception V3 pretrained on ImageNet) achieved a patch-level Macro F1 of 0.917 and an image-level Macro F1 of 0.928, compared to an image-level Macro F1 of 0.701 for the ORB baseline.
The Pretraining Surprise: Counterintuitively, ImageNet pretraining consistently outperformed the domain-specific USPTO pretraining. The authors hypothesize that the filters learned from ImageNet pretraining generalize well outside the ImageNet domain, though why the USPTO-pretrained filters underperform remains unclear.
Full Model Tuning: Unfreezing the entire network yielded higher performance than tuning only the classifier head, indicating that standard low-level visual filters require substantial adaptation to reliably distinguish chemical line drawings.
Limitations and Edge Cases: The best CNN achieved an ROC AUC of 0.97 on the primary patch test set, while the ORB baseline scored 0.81 on the auxiliary dataset; the paper notes that these ROC curves are not directly comparable because the evaluation sets differ. The aggregation rule ($X = \max \{ x_i \}$) is naive and has not been optimized. Furthermore, the patching approach creates inherent label noise when a Markush indicator is cleanly bisected by a patch edge, potentially forcing the network to learn incomplete visual features.

Reproducibility Details

Data

The study used a primary dataset labeled by domain experts and a larger auxiliary dataset for evaluation.

Purpose	Dataset	Size	Notes
Train/Val/Test	Primary Dataset	272 Images	Manually annotated with bounding boxes for Markush indicators. Split 60/20/20.
Evaluation	Auxiliary Dataset	~5.4k Images	5117 complete structures, 317 Markush. Used for image-level testing only (no bbox).

Patch Generation:

Images are cropped into patches of size 224x224 (ResNet) or 299x299 (Inception).
Patches are generated from 2 grids offset by half the patch width/height to ensure annotations aren’t lost on edges.
Labeling Rule: A patch is labeled “Markush” if >50% of an annotation’s pixels fall inside it.
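
The two-grid patching and the labeling rule can be sketched as follows. This is an illustration under stated assumptions: patches are allowed to extend past the image border (padding is assumed), and the rectangular bounding-box area stands in for an annotation's pixels. Function names are hypothetical.

```python
def patch_boxes(width, height, patch):
    """Patches (x0, y0, x1, y1) from two grids, the second shifted by half a patch."""
    boxes = []
    for offset in (0, patch // 2):
        for y in range(offset, height, patch):
            for x in range(offset, width, patch):
                boxes.append((x, y, x + patch, y + patch))
    return boxes


def overlap_fraction(annotation, patch_box):
    """Fraction of the annotation's box area that lies inside the patch."""
    ax0, ay0, ax1, ay1 = annotation
    px0, py0, px1, py1 = patch_box
    area = (ax1 - ax0) * (ay1 - ay0)
    inter_w = max(0, min(ax1, px1) - max(ax0, px0))
    inter_h = max(0, min(ay1, py1) - max(ay0, py0))
    return inter_w * inter_h / area if area > 0 else 0.0


def label_patch(patch_box, annotations):
    """A patch is 'Markush' if more than 50% of some annotation falls inside it."""
    return any(overlap_fraction(a, patch_box) > 0.5 for a in annotations)


# A 20x20 annotation split evenly by the boundary at x = 224 of the first grid.
annotations = [(214, 150, 234, 170)]
boxes = patch_boxes(width=448, height=224, patch=224)
labels = [label_patch(b, annotations) for b in boxes]
print(list(zip(boxes, labels)))
```

In this example neither patch of the first grid contains more than half of the annotation, but the offset patch `(112, 112, 336, 336)` contains all of it, which is the situation the second grid is meant to cover.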

Algorithms

ORB (Baseline):

Matches query images against a bank of template patches containing Markush indicators.
Features: Number of keypoints, number of matches, Hamming distance of best 5 matches.
Classifier: XGBoost trained on these features.
Hyperparameters: Search over number of features (500-2000) and template patches (50-250).

Training Configuration:

Framework: PyTorch with Optuna for optimization.
Optimization: 25 trials per configuration.
Augmentations: Random perspective shift, posterization, sharpness/blur.

Models

Two main architectures were compared.

Model	Input Size	Parameters	Pretraining Source
ResNet18	224x224	11.5M	ImageNet
Inception V3	299x299	23.8M	ImageNet & USPTO

Best Configuration: Inception V3, ImageNet weights, Full Model fine-tuning (all layers unfrozen).

Evaluation

Primary metric was Macro F1 due to class imbalance.

Metric	Best CNN (Inception V3)	Baseline (ORB)	Notes
Patch Test F1	$0.917 \pm 0.014$	N/A	ORB does not support patch-level
Image Test F1	$0.928 \pm 0.035$	$0.701 \pm 0.052$	CNN aggregates patch predictions
Aux Test F1	0.914	0.533	Evaluation on large secondary dataset
ROC AUC	0.97	0.81	Not directly comparable: CNN on the primary patch test set, ORB on the auxiliary dataset

Hardware

GPU: Tesla V100-SXM2-16GB
CPU: Intel Xeon E5-2686 @ 2.30GHz
RAM: 64 GB

Artifacts

Artifact	Type	License	Notes
GitHub Repository	Code	Apache-2.0	MSc thesis code: CNN training, ORB baseline, evaluation scripts

The primary dataset was manually annotated by Elsevier domain experts and is not publicly available. The auxiliary dataset (from Elsevier) is also not public. Pre-trained model weights are not released in the repository.

Paper Information

Citation: Jurriaans, T., Szarkowska, K., Nalisnick, E., Schwörer, M., Thorne, C., & Akhondi, S. (2023). One Strike, You’re Out: Detecting Markush Structures in Low Signal-to-Noise Ratio Images. arXiv preprint arXiv:2311.14633. https://doi.org/10.48550/arXiv.2311.14633

Publication: arXiv 2023

Additional Resources:

GitHub Repository

@misc{jurriaansOneStrikeYoure2023,
  title = {One {{Strike}}, {{You}}'re {{Out}}: {{Detecting Markush Structures}} in {{Low Signal-to-Noise Ratio Images}}},
  shorttitle = {One {{Strike}}, {{You}}'re {{Out}}},
  author = {Jurriaans, Thomas and Szarkowska, Kinga and Nalisnick, Eric and Schwoerer, Markus and Thorne, Camilo and Akhondi, Saber},
  year = 2023,
  month = nov,
  number = {arXiv:2311.14633},
  eprint = {2311.14633},
  primaryclass = {cs},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2311.14633},
  archiveprefix = {arXiv}
}
