The most recent wave of OCSR methods builds on large pretrained vision-language models, exploiting their broad visual understanding to generalize across diverse chemical diagram styles and notation conventions. GTR-CoT introduces graph-traversal chain-of-thought reasoning to guide recognition; MolParser adopts an end-to-end architecture that predicts Extended SMILES; MolNexTR combines ConvNeXt and ViT in a dual-stream encoder; and SubGrapher takes a retrieval-oriented approach based on visual fingerprinting of functional groups. This group also includes MolParser-7M, currently the largest OCSR dataset, and OCSU, which extends the task beyond structure prediction to multi-level molecular description.