Image-to-graph methods bypass string representations entirely and predict the molecular graph (atoms as nodes, bonds as edges) directly from the input image. This family includes segmentation-based approaches like ChemGrapher and Staker et al.’s U-Net pipeline, keypoint-detection architectures like ABC-Net, and joint atom-bond-coordinate predictors like MolScribe. Because they reason about spatial structure rather than linearizing it, these models tend to handle stereochemistry and abbreviated groups more naturally than sequence-based alternatives: a wedge or dash marking can be attached to a specific bond, and an abbreviation label to a specific node, each tied to a position in the image. Full-pipeline systems like MolMiner and MolMole extend the approach to page-level chemical extraction from documents.
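To make the target representation concrete, here is a minimal sketch of a container for the output of an image-to-graph model: atoms as nodes with image coordinates, bonds as edges with an order and a stereo marking. The class and field names (`Atom`, `Bond`, `MolGraph`, `stereo`) are illustrative assumptions and are not taken from any of the systems named above, each of which defines its own output format.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    symbol: str  # element symbol or abbreviation label, e.g. "C", "O", "Ph"
    x: float     # assumed: image coordinates normalized to [0, 1]
    y: float


@dataclass
class Bond:
    begin: int            # index into MolGraph.atoms
    end: int              # index into MolGraph.atoms
    order: int = 1        # 1 = single, 2 = double, 3 = triple
    stereo: str = "none"  # hypothetical labels: "none", "wedge", "dash"


@dataclass
class MolGraph:
    atoms: List[Atom] = field(default_factory=list)
    bonds: List[Bond] = field(default_factory=list)

    def adjacency(self) -> List[List[int]]:
        """Symmetric matrix whose entry (i, j) is the bond order, 0 if unbonded."""
        n = len(self.atoms)
        adj = [[0] * n for _ in range(n)]
        for b in self.bonds:
            adj[b.begin][b.end] = b.order
            adj[b.end][b.begin] = b.order
        return adj

    def symbol_counts(self) -> Counter:
        """Count node labels, e.g. to sanity-check a prediction."""
        return Counter(a.symbol for a in self.atoms)


# Hand-written example standing in for a model prediction:
# a three-atom C-C=O skeleton laid out left to right.
graph = MolGraph(
    atoms=[Atom("C", 0.2, 0.5), Atom("C", 0.5, 0.5), Atom("O", 0.8, 0.5)],
    bonds=[Bond(0, 1), Bond(1, 2, order=2)],
)
print(graph.adjacency())
print(dict(graph.symbol_counts()))
```

Keeping coordinates on the nodes and stereo markings on the edges is what lets a downstream step resolve wedge and dash bonds or expand an abbreviation label in place, without first round-tripping through a string.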