SwinOCSR: Vision Transformers for Chemical OCR
Proposes an end-to-end architecture that replaces the standard CNN backbone with a Swin Transformer to capture global image context. Introduces a Multi-label Focal Loss to handle the severe token imbalance in chemical datasets.
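
The focal-loss idea can be illustrated with a minimal sketch. It assumes the standard focal loss form, -alpha_t * (1 - p_t)^gamma * log(p_t), applied independently to each vocabulary token through a sigmoid. The summary above does not give the paper's exact formulation, so the function name `multilabel_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are illustrative assumptions, not the authors' settings.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multilabel_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Mean focal loss over independent binary labels.

    logits  : raw scores, one per vocabulary token (hypothetical layout)
    targets : 0/1 labels of the same length
    gamma   : focusing parameter; gamma=0 reduces to weighted cross-entropy
    alpha   : weight on positive labels (assumed default, not from the source)
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        # p_t is the probability the model assigns to the true label.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the contribution of well-classified labels.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(logits)


# A confidently correct token contributes far less than a confidently wrong one.
easy = multilabel_focal_loss([4.0], [1])
hard = multilabel_focal_loss([-4.0], [1])
```

With `gamma > 0`, frequent tokens that the model already predicts well are down-weighted, so rare tokens account for a larger share of the gradient. This is the general mechanism by which a focal loss counters token imbalance.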
