This section covers large language models (LLMs) and vision-language models (VLMs) applied to chemistry. Unlike chemical language models (ChemBERTa, MoLFormer, etc.), which learn representations directly from molecular string notations, these models build on general-purpose LLM or VLM backbones. Topics include multimodal models that integrate molecular graphs, images, or spectra with text (ChemVLM, ChemDFM-X, InstructMol), chemical reasoning LLMs (ChemDFM-R), and systems for extracting or retrieving chemical information from the scientific literature (MERMaid).