Markush structures are generic chemical representations used in patents to claim families of related compounds. They present unique challenges for OCSR: variable substituent groups (R-groups), combinatorial enumeration, and mixed text-diagram layouts that standard molecule recognizers are not designed to handle. This group covers detection (identifying which images in a patent contain Markush diagrams) and full parsing (extracting the combinatorial structure into machine-readable form).
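To make the combinatorial problem concrete, here is a minimal sketch of how one Markush template expands into many concrete molecules. The scaffold, the R-group lists, and the `enumerate_markush` helper are all hypothetical. The template is plain string substitution into SMILES, which is an assumption made for illustration. Real Markush tooling uses richer encodings with explicit attachment points.

```python
from itertools import product

# Hypothetical Markush scaffold written as a SMILES template; each {Rn}
# placeholder marks a variable substituent position (illustrative only).
SCAFFOLD = "{R1}c1ccc({R2})cc1"

# Hypothetical substituent definitions, as a patent's "wherein" clause might list them.
R_GROUPS = {
    "R1": ["C", "O", "N"],    # methyl, hydroxy, amino
    "R2": ["F", "Cl", "Br"],  # fluoro, chloro, bromo
}


def enumerate_markush(scaffold: str, r_groups: dict[str, list[str]]) -> list[str]:
    """Expand a Markush-style template into every concrete SMILES it covers."""
    names = sorted(r_groups)
    molecules = []
    # One output molecule per element of the Cartesian product of R-group choices.
    for choice in product(*(r_groups[n] for n in names)):
        molecules.append(scaffold.format(**dict(zip(names, choice))))
    return molecules


if __name__ == "__main__":
    mols = enumerate_markush(SCAFFOLD, R_GROUPS)
    print(len(mols))  # 9
    print(mols[0])    # Cc1ccc(F)cc1
```

The count is the product of the option-list sizes (3 × 3 = 9 here). This multiplicative growth is why patents claim a single Markush diagram rather than listing compounds. It is also why a parser should recover the template and its substituent lists instead of enumerating them.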

MarkushGrapher: Multi-modal Markush Structure Recognition
This paper introduces a multi-modal approach for extracting chemical Markush structures from patents, combining a Vision-Text-Layout (VTL) encoder with a specialized chemical vision encoder. It addresses the scarcity of annotated training data with a synthetic data generation pipeline and introduces M2S, a new real-world benchmark.
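The summary above does not say how the two encoders are combined. The sketch below assumes one common fusion pattern. The chemical vision encoder's token embeddings are linearly projected into the Vision-Text-Layout (VTL) encoder's hidden size, then appended to the VTL token sequence before decoding. The function names, dimensions, and weights are hypothetical, and this is not presented as MarkushGrapher's actual implementation.

```python
import random


def project(token, weight, bias):
    """Affine map y = W @ x + b for one token; weight is (out_dim x in_dim)."""
    if any(len(row) != len(token) for row in weight):
        raise ValueError("weight rows must match the token dimension")
    return [sum(w * x for w, x in zip(row, token)) + b for row, b in zip(weight, bias)]


def fuse_encoders(vtl_tokens, chem_tokens, weight, bias):
    """Hypothetical late fusion: project chemical-encoder tokens into the VTL
    hidden size and concatenate them after the VTL token sequence."""
    projected = [project(t, weight, bias) for t in chem_tokens]
    d_model = len(vtl_tokens[0]) if vtl_tokens else len(bias)
    if any(len(t) != d_model for t in projected):
        raise ValueError("projection output must match the VTL hidden size")
    return vtl_tokens + projected


if __name__ == "__main__":
    rng = random.Random(0)
    d_vtl, d_chem = 4, 3
    vtl = [[rng.random() for _ in range(d_vtl)] for _ in range(5)]    # 5 text/layout/patch tokens
    chem = [[rng.random() for _ in range(d_chem)] for _ in range(2)]  # 2 molecule-image tokens
    W = [[rng.gauss(0, 0.1) for _ in range(d_chem)] for _ in range(d_vtl)]
    b = [0.0] * d_vtl
    fused = fuse_encoders(vtl, chem, W, b)
    print(len(fused), len(fused[0]))  # 7 4
```

Concatenation keeps both modalities as separate tokens, so a downstream decoder can attend to either source. In a trained model the projection weights would be learned rather than random.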
