Paper Summary
Citation: Heller, S., McNaught, A., Stein, S., Tchekhovskoi, D., & Pletnev, I. (2013). InChI - the worldwide chemical structure identifier standard. Journal of Cheminformatics, 5(1), 7. https://doi.org/10.1186/1758-2946-5-7
Publication: Journal of Cheminformatics, 2013
What is this paper about?
This paper is a comprehensive overview of the IUPAC International Chemical Identifier (InChI) standard as it stood in 2013, roughly eight years after its initial release. The authors—who include key figures from the InChI development team—provide both a technical description of how InChI works and a behind-the-scenes look at the organizational structure that maintains it.
The paper serves dual purposes: explaining the technical aspects of InChI for newcomers, and documenting the governance model that ensures InChI remains a truly open, non-proprietary standard rather than becoming controlled by any single commercial entity.
The Problem InChI Solved
Before InChI, the chemistry community faced a fundamental interoperability problem. Chemical databases relied on proprietary identifiers like CAS Registry Numbers, or on representations like SMILES, where different software can write different strings for the same molecule. Linking chemical data across databases therefore required expensive licensing or format conversions that often lost information.
The internet era made this problem acute—suddenly there were vast amounts of chemical data online, but no universal way to link a molecule mentioned in one database to the same molecule in another. InChI was designed to be that universal link: a free, standardized identifier that any database could generate and use.
How InChI Works
InChI represents molecules as hierarchical text strings, similar to how a postal address goes from general (country) to specific (house number). The key innovation is its layered structure:
- Chemical formula layer: Basic atom counts
- Connectivity layer: How atoms connect (the molecular skeleton)
- Stereochemistry layer: 3D spatial arrangements
- Isotope layer: Specific isotopic compositions
This layered approach is clever: the InChI of a molecule with unknown stereochemistry is the same string as the InChI of the fully specified molecule, minus the stereochemistry layer. This allows for flexible matching: you can find molecules that match at the connectivity level even if you don’t have complete stereochemical information.
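The layering can be made concrete with a small Python sketch. It is not the official InChI software: the function names `split_inchi_layers` and `strip_stereo` are hypothetical, and the example string is believed to be the standard InChI for L-alanine but should be treated as illustrative. The sketch assumes that layers are separated by `/` and that each layer after the formula starts with a one-letter prefix (`c`, `h`, `t`, `m`, `s`, and so on). It ignores the more complex fixed-hydrogen and reconnected layers, where prefixes can repeat.

```python
# Prefixes of the stereochemistry sublayers (double bond, tetrahedral,
# and the two flags that qualify the tetrahedral layer).
STEREO_PREFIXES = {"b", "t", "m", "s"}

# Illustrative example: believed to be the standard InChI of L-alanine.
L_ALANINE = "InChI=1S/C3H7NO2/c1-2(3(5)6)4/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"


def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI into version, formula, and one-letter-prefixed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]  # the formula layer has no prefix
    for segment in parts[2:]:
        if segment:
            # Simplification: a repeated prefix overwrites the earlier one.
            layers[segment[0]] = segment[1:]
    return layers


def strip_stereo(inchi: str) -> str:
    """Drop the stereochemistry layers, keeping everything else in order."""
    parts = inchi.split("/")
    head, rest = parts[:2], parts[2:]  # "InChI=1S" and the formula stay
    kept = [seg for seg in rest if seg and seg[0] not in STEREO_PREFIXES]
    return "/".join(head + kept)


if __name__ == "__main__":
    print(split_inchi_layers(L_ALANINE))
    print(strip_stereo(L_ALANINE))
```

In this example the stereo-free string is a leading substring of the full one, which is what makes connectivity-level matching a simple string operation.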
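The InChIKey is the practical workhorse: a fixed-length, 27-character hashed form of the InChI designed for database indexing and web searches. The first 14 characters are a hash of the molecular connectivity, while the remaining characters cover stereochemistry, isotopes, and other details. Because it is a hash rather than a compressed encoding, an InChIKey cannot be converted back into a structure; it can only be looked up.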
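The block layout of an InChIKey can be sketched in a few lines of Python. The 14-character first block is described above. The finer split of the remaining characters (an 8-character hash, a standard/non-standard flag, a version letter, and a final protonation character) follows the InChI technical documentation rather than this summary. The names `parse_inchikey` and `same_skeleton` and the dictionary keys are hypothetical. The example keys are believed to belong to ethanol and to the two enantiomers of alanine, but the parsing logic does not depend on that.

```python
import re

# 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter = 27 chars.
_INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")

# Illustrative keys (believed to be ethanol, L-alanine, and D-alanine).
ETHANOL = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
L_ALANINE = "QNAYBMKLOCPYGJ-REOHZCLOSA-N"
D_ALANINE = "QNAYBMKLOCPYGJ-UWTATZPHSA-N"


def parse_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks; raise ValueError if malformed."""
    match = _INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, detail, flag, version, protonation = match.groups()
    return {
        "connectivity_block": skeleton,   # hash of the molecular skeleton
        "detail_block": detail,           # hash of stereo, isotopes, etc.
        "is_standard": flag == "S",       # "S" marks a standard InChI
        "version": version,
        "protonation": protonation,
    }


def same_skeleton(key_a: str, key_b: str) -> bool:
    """True when two keys share the connectivity block (first 14 chars)."""
    first = parse_inchikey(key_a)["connectivity_block"]
    second = parse_inchikey(key_b)["connectivity_block"]
    return first == second


if __name__ == "__main__":
    print(parse_inchikey(ETHANOL))
    print(same_skeleton(L_ALANINE, D_ALANINE))
```

Comparing only the first block is how a database can group stereoisomers of one compound without parsing any chemistry.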
The Governance Model
What makes this paper particularly valuable is its detailed explanation of how InChI is governed. The technical oversight lives within IUPAC’s Division VIII, but the actual development work is distributed across specialized working groups focusing on areas like polymers, inorganics, and tautomerism.
The InChI Trust, established in 2009, handles the business side—ensuring long-term funding and preventing any single organization from controlling the standard. This was a critical innovation: getting commercial publishers and software vendors to agree that a non-proprietary standard would benefit everyone, rather than trying to lock in users with proprietary formats.
Honest Assessment of Limitations
The authors are refreshingly candid about InChI’s limitations in 2013:
- Round-trip conversion: Converting InChI back to a chemical structure fails about 1% of the time without auxiliary information
- Tautomer handling: Different tautomeric forms of the same molecule could generate different InChIs
- Rare collision cases: Because the InChIKey is a hash, different molecules could theoretically produce identical InChIKeys
These aren’t showstoppers, but they’re the kind of edge cases that matter in large-scale chemical informatics applications. The authors note that Version 2 of InChI (which was in development at the time) aimed to address many of these issues.
Why This Matters
InChI succeeded where other standardization efforts failed because it solved both the technical problem (how to represent molecules unambiguously) and the political problem (how to maintain an open standard in a competitive industry).
The certification program they describe, in which software implementations can be tested against a standard suite and earn an “InChI Certified” designation, shows attention to the practical details that make standards work.
For anyone working in computational chemistry or chemical informatics, InChI is now so ubiquitous that it’s easy to take for granted. This paper provides valuable historical context for how we got from the fragmented landscape of the early 2000s to today’s more interoperable chemical databases.
Current Status
While this paper is from 2013, InChI continues to evolve. The issues with tautomers and round-trip conversion have been largely addressed in newer versions, and InChI has become even more widely adopted across chemical databases and publishing platforms. The governance model described here has proven durable—InChI remains truly open and non-proprietary more than a decade later.