Paper Information

Citation: Piroi, F., Lupu, M., Hanbury, A., Magdy, W., Sexton, A. P., & Filippov, I. (2012). CLEF-IP 2012: Retrieval Experiments in the Intellectual Property Domain. CLEF 2012, LNCS 7488.

Publication: CLEF 2012

What kind of paper is this?

This is a Resource paper (benchmark infrastructure). It establishes a standardized test bed for the Intellectual Property (IP) Information Retrieval community by defining tasks, curating datasets (topics and relevance judgments), and establishing evaluation protocols. The paper does not propose a new method itself but aggregates and analyzes the performance of participant systems on these shared tasks.

What is the motivation?

The volume of patent applications is increasing rapidly, necessitating automated methods to help patent experts find prior art and classify documents.

  • Economic Impact: Thorough searches are critical due to the high economic value of granted patents.
  • Complexity: Patent workflows differ from general-purpose search; examiners need to find prior art for specific claims, not just whole documents, and often rely on non-textual data such as flowcharts and chemical diagrams.
  • Gap: Existing general IR tools are insufficient for the specific granularity (passages, images, structures) required in the IP domain.

What is the novelty here?

The 2012 edition of the lab introduced three specific tasks targeting different modalities of patent data:

  1. Passage Retrieval starting from Claims: Moving beyond document-level retrieval to identifying specific relevant passages based on claim text.
  2. Flowchart Recognition: A new image analysis task requiring the extraction of structural information (nodes, edges, text) from patent images.
  3. Chemical Structure Recognition: A dual task of segmenting molecular diagrams from full patent pages and converting them into structure files (MOL format), specifically addressing the challenge of Markush structures in patents.

What experiments were performed?

The “experiments” were the benchmarking tasks themselves, performed by participants (e.g., University of Birmingham, SAIC, TU Vienna).

  • Passage Retrieval: Participants retrieved documents and passages for 105 test topics (sets of claims) from a corpus of 1.5 million patents. Performance was measured using PRES, Recall, and MAP at the document level, and AP/Precision at the passage level.
  • Flowchart Recognition: Participants extracted graph structures from 100 test images. Evaluation compared the submitted graphs to ground truth using a distance metric based on the Maximum Common Subgraph (MCS).
  • Chemical Structure:
    • Segmentation: Identifying bounding boxes of chemical structures in 30 multipage TIFF patents.
    • Recognition: Converting 865 “automatic” (standard MOL) and 95 “manual” (Markush/complex) diagrams into structure files.

What were the outcomes and conclusions drawn?

  • Passage Retrieval: Approaches ranged from two-step retrieval (documents first, then passages within them) to deeper NLP techniques. Translation tools were universally used because the corpus is multilingual (English, German, French).
  • Chemical Recognition: The best performing system (UoB) achieved approximately 91% recall on total structures. The manual evaluation highlighted a critical need for standards extending MOL files to support Markush structures, which are common in patents but poorly supported by current tools.
  • Flowchart Recognition: The evaluation required a combination of structural matching and edit-distance for text labels because OCR outputs rarely “hard-matched” the gold standard.

Reproducibility Details

Data

The collection focuses on European Patent Office (EPO) and WIPO documents published up to 2002.

1. Passage Retrieval Data

  • Corpus: >1.5 million XML patent documents (EP and WO sources).
  • Training Set: 51 topics (sets of claims) with relevance judgments (18 DE, 21 EN, 12 FR).
  • Test Set: 105 topics (35 per language).
  • Topic Source: Extracted manually from search reports listing “X” or “Y” citations, i.e., prior art considered highly relevant to novelty or inventive step.

2. Flowchart Data

  • Format: Black and white TIFF images.
  • Training Set: 50 images with textual graph representations.
  • Test Set: 100 images.
  • Ground Truth: A defined textual format describing nodes (NO), directed edges (DE), undirected edges (UE), and meta-data (MT).
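The record codes are listed above, but the exact line syntax is not reproduced in this summary. Purely as a hypothetical illustration (field order, separators, and identifiers are assumptions, not the official CLEF-IP format), a tiny flowchart with a start node, one decision, and two labelled branches might be encoded along these lines:

MT  title  "Flowchart of Fig. 3"
NO  n1  terminal  "Start"
NO  n2  decision  "x > 0 ?"
NO  n3  process   "increment x"
NO  n4  terminal  "End"
DE  n1  n2  ""
DE  n2  n3  "yes"
DE  n2  n4  "no"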

3. Chemical Structure Data

  • Segmentation: 30 patent files rendered as 300 dpi monochrome multipage TIFFs.
  • Recognition (Automatic Set): 865 diagram images fully representable in standard MOL format.
  • Recognition (Manual Set): 95 diagram images containing Markush structures or variability not supported by standard MOL.

Algorithms

Ground Truth Generation:

  • Qrels Generator: An in-house tool was used to manually map search report citations to specific XML passages (XPaths) for the passage retrieval task.
  • McGregor Algorithm: Used for the flowchart evaluation to compute the Maximum Common Subgraph (MCS) between participant submissions and ground truth.

Evaluation

Passage Retrieval Metrics:

  • Document Level: PRES (Patent Retrieval Evaluation Score), Recall, MAP. Cut-off at 100 documents.
  • Passage Level: $AP(D)$ and $Precision(D)$, the average precision and precision of the retrieved passages within a relevant document $D$, averaged across all relevant documents for a topic.
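PRES itself is not defined in this summary. As a reminder, a minimal sketch following the definition by Magdy & Jones (SIGIR 2010) is given below, assuming that relevant documents missing from the top-$N_{max}$ results are treated as if ranked immediately after the cut-off; the function name and arguments are illustrative, not taken from the official evaluation scripts.

def pres(found_ranks, n_relevant, n_max=100):
    """Patent Retrieval Evaluation Score (sketch).

    found_ranks : 1-based ranks (<= n_max) at which relevant documents were retrieved.
    n_relevant  : total number of relevant documents for the topic.
    n_max       : evaluation cut-off (100 documents in CLEF-IP 2012).
    """
    n_missing = n_relevant - len(found_ranks)
    # Worst-case assumption: unretrieved relevant docs sit just after the cut-off.
    assumed_ranks = list(found_ranks) + [n_max + i for i in range(1, n_missing + 1)]
    mean_rank = sum(assumed_ranks) / n_relevant
    # 1.0 when all relevant docs occupy ranks 1..n, 0.0 when none are retrieved.
    return 1.0 - (mean_rank - (n_relevant + 1) / 2.0) / n_max

For example, pres([1, 4, 30], n_relevant=5) rewards early and exhaustive retrieval; document-level Recall and MAP follow their standard definitions at the 100-document cut-off.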

Flowchart Recognition Metric:

  • Graph Distance ($d$): Defined as $d(F_t, F_s) = 1 - \frac{|mcs(F_t, F_s)|}{|F_t| + |F_s| - |mcs(F_t, F_s)|}$ where $|F|$ is the size (nodes + edges) and $mcs$ is the maximum common subgraph.
  • Levels: Evaluated at three levels: Basic (structure only), Intermediate (structure + node types), and Complete (structure + types + text labels).
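To make the metric above concrete, here is a minimal sketch that computes $d$ from the graph sizes; it assumes the MCS size is already known (e.g., obtained with a McGregor-style search, as used by the organisers), since computing the MCS itself is the expensive part and is not shown. Names are illustrative.

def graph_size(n_nodes, n_edges):
    # |F| = number of nodes plus number of edges
    return n_nodes + n_edges

def flowchart_distance(size_truth, size_submitted, size_mcs):
    """d(F_t, F_s) = 1 - |mcs| / (|F_t| + |F_s| - |mcs|), a value in [0, 1]."""
    return 1.0 - size_mcs / (size_truth + size_submitted - size_mcs)

# Identical graphs share a common subgraph of full size: distance 0.
assert flowchart_distance(12, 12, 12) == 0.0
# Completely disjoint graphs have an empty common subgraph: distance 1.
assert flowchart_distance(12, 12, 0) == 1.0

At the Intermediate and Complete levels the same distance is applied, but nodes (and, at the Complete level, text labels) must also agree for a correspondence to count towards the MCS.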

Chemical Structure Metrics:

  • Segmentation: Precision, Recall, and $F_1$ based on bounding box matches (see the sketch after this list). A match is valid if the submitted box's borders lie within a pixel tolerance of the ground-truth box; tolerances from 0 to 55 pixels were considered.
  • Recognition:
    • Automatic: Comparison of InChI strings generated by Open Babel (see the snippet after this list).
    • Manual: Visual comparison of images rendered by MarvinView.
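As a rough illustration of the segmentation scoring referred to above, the sketch below matches a submitted bounding box to a ground-truth box when every border deviates by at most tol pixels, then derives Precision, Recall, and $F_1$ from the match counts. The organisers' exact matching procedure (e.g., how competing candidate matches are resolved) is not reproduced here; this is an assumption-laden sketch.

def boxes_match(gt, sub, tol):
    """gt, sub: (left, top, right, bottom) in pixels; tol: allowed deviation per border."""
    return all(abs(g - s) <= tol for g, s in zip(gt, sub))

def segmentation_scores(gt_boxes, sub_boxes, tol=10):
    matched = 0
    remaining = list(sub_boxes)
    for gt in gt_boxes:
        for sub in remaining:
            if boxes_match(gt, sub, tol):
                matched += 1
                remaining.remove(sub)  # each submitted box may match at most one ground-truth box
                break
    precision = matched / len(sub_boxes) if sub_boxes else 0.0
    recall = matched / len(gt_boxes) if gt_boxes else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1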
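For the automatic recognition set, a comparison along the lines described above can be scripted by converting both the ground-truth and the submitted MOL files to InChI with Open Babel and comparing the resulting strings. The snippet assumes the obabel command-line tool is installed with InChI support and is not the organisers' actual evaluation harness.

import subprocess

def mol_to_inchi(mol_path):
    # Convert a MOL file to an InChI string via Open Babel's CLI.
    out = subprocess.run(["obabel", mol_path, "-oinchi"],
                         capture_output=True, text=True, check=True)
    # Keep only the InChI token in case the build appends a title after it.
    return out.stdout.split()[0]

def structures_match(gt_mol, submitted_mol):
    return mol_to_inchi(gt_mol) == mol_to_inchi(submitted_mol)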

Citation

@inproceedings{piroi2012clefip,
  title={CLEF-IP 2012: Retrieval Experiments in the Intellectual Property Domain},
  author={Piroi, Florina and Lupu, Mihai and Hanbury, Allan and Magdy, Walid and Sexton, Alan P. and Filippov, Igor},
  booktitle={CLEF 2012},
  series={Lecture Notes in Computer Science},
  volume={7488},
  year={2012},
  publisher={Springer}
}