Abstract

We introduce TABME++, a benchmark for evaluating large language model performance on page stream segmentation (PSS). Our evaluation shows that decoder-based models, when adapted with parameter-efficient fine-tuning, achieve the strongest performance on this task.

Key Contributions

  • TABME++ Benchmark: New evaluation framework for page stream segmentation
  • Comprehensive LLM Evaluation: Systematic comparison of different model architectures
  • Parameter-Efficient Fine-Tuning: Demonstration of effective adaptation strategies
  • Performance Analysis: Detailed analysis showing decoder-based model superiority
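Page stream segmentation is commonly framed as a per-page binary decision: does this page begin a new document? Below is a minimal illustrative sketch of that framing, showing how per-page boundary predictions are turned into document segments. This is the conventional PSS formulation, not necessarily the exact one used in TABME++.

```python
# Illustrative sketch: page stream segmentation as per-page boundary
# classification. This framing is common in PSS work; TABME++ may
# formalize the task differently.

def pages_to_documents(boundary_preds):
    """Group a page stream into documents.

    boundary_preds[i] is True if page i starts a new document
    (page 0 always starts one). Returns a list of (start, end)
    page-index ranges, end exclusive.
    """
    docs = []
    start = 0
    for i in range(1, len(boundary_preds)):
        if boundary_preds[i]:
            docs.append((start, i))
            start = i
    docs.append((start, len(boundary_preds)))
    return docs

# A 6-page stream where pages 0, 2, and 5 begin new documents:
print(pages_to_documents([True, False, True, False, False, True]))
# -> [(0, 2), (2, 5), (5, 6)]
```

Under this framing, an LLM can be prompted (or fine-tuned) to emit the boundary decision for each page, and the stream is then reassembled into documents as above.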

Technical Innovation

Our work introduces novel evaluation metrics specifically designed for page stream segmentation and provides the first comprehensive comparison of LLM architectures on this task. The benchmark is designed to be extensible and reproducible.
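As a point of reference for how such metrics are typically computed, the sketch below scores predicted document boundaries with precision, recall, and F1. Boundary-level F1 is a standard way to evaluate page stream segmentation; the metrics introduced in TABME++ may differ in their details.

```python
# Hypothetical evaluation sketch: boundary-level F1 for page stream
# segmentation. Not the paper's exact metric, just the standard form.

def boundary_f1(gold, pred):
    """gold/pred: sets of page indices where a new document starts."""
    tp = len(gold & pred)                       # correctly predicted boundaries
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Gold boundaries at pages 0, 2, 5; predicted at 0, 2, 4:
print(boundary_f1({0, 2, 5}, {0, 2, 4}))  # ≈ 0.667
```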

Impact

This benchmark provides the research community with standardized evaluation tools for document processing tasks and demonstrates practical approaches for applying LLMs to real-world document automation challenges.

Citation

@article{heidenreich2024large,
  title={Large Language Models for Page Stream Segmentation},
  author={Heidenreich, Hunter and Dalvi, Ratish and Mukku, Rohith and Verma, Nikhil and Pičuljan, Neven},
  journal={arXiv preprint arXiv:2408.11981},
  year={2024}
}