Markush structures are generic chemical representations used in patents to claim families of related compounds. They present unique challenges for OCSR: variable substituent groups (R-groups), combinatorial enumeration, and mixed text-diagram layouts that standard molecule recognizers are not designed to handle. This group covers detection (identifying which images in a patent contain Markush diagrams) and full parsing (extracting the combinatorial structure into machine-readable form).

| Paper | Year | Task | Key Idea |
|---|---|---|---|
| Detecting Markush Structures in Low SNR Images | 2023 | Detection | Patch-based CNN classifies whether a chemical image contains Markush structures |
| MarkushGrapher | 2025 | Parsing | Multi-modal transformer jointly encodes vision, text, and layout to parse Markush structures into CXSMILES |
| MarkushGrapher-2 | 2026 | Parsing | Dual-encoder architecture with dedicated ChemicalOCR for end-to-end Markush recognition, plus IP5-M benchmark |

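To make the combinatorial enumeration concrete, here is a minimal sketch of how one Markush structure expands into a family of specific compounds. It is only an illustration: the scaffold, the substituent lists, and the `enumerate_markush` helper are hypothetical, and the string template is a toy stand-in, not the CXSMILES output produced by the parsers in the table.

```python
from itertools import product

# Hypothetical toy Markush definition (an assumption for illustration):
# a para-disubstituted benzene scaffold with two variable positions,
# written as a SMILES string template.
SCAFFOLD = "{R1}c1ccc({R2})cc1"
R_GROUPS = {
    "R1": ["C", "CC", "CO"],  # methyl, ethyl, methoxy
    "R2": ["F", "Cl", "O"],   # fluoro, chloro, hydroxy
}


def enumerate_markush(scaffold, r_groups):
    """Expand a templated scaffold into every R-group combination."""
    names = sorted(r_groups)  # fixed order so the output is reproducible
    combos = product(*(r_groups[name] for name in names))
    return [scaffold.format(**dict(zip(names, combo))) for combo in combos]


members = enumerate_markush(SCAFFOLD, R_GROUPS)
for smiles in members:
    print(smiles)
```

Two positions with three options each yield nine compounds. The count is the product of the option counts, so it grows quickly as a claim adds positions or substituents, which is one reason a parser benefits from emitting the generic structure itself instead of an enumerated list.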

