Paper Summary
Citation: Blanke, G., Brammer, J., Baljozovic, D., Khan, N. U., Lange, F., Bänsch, F., Tovee, C. A., Schatzschneider, U., Hartshorn, R. M., & Herres-Pawlis, S. (2025). Making the InChI FAIR and sustainable while moving to inorganics. Faraday Discussions, 256(0), 503–519. https://doi.org/10.1039/D4FD00145A
Publication: Faraday Discussions, 2025
Why This Matters
The International Chemical Identifier (InChI) is everywhere in chemistry databases—over a billion structures use it. But there’s a problem: the system was designed for organic chemistry and basically breaks when you give it anything with metals in it. This paper describes the major overhaul that fixed thousands of bugs and added support for inorganic and organometallic compounds.
If you’ve ever tried to search for a metal complex in a chemical database and gotten nonsense results, this is why. And this is the fix.
What InChI Actually Does
InChI creates a unique text string for any chemical structure. Unlike SMILES, which has multiple vendor implementations and can represent the same molecule in different ways, InChI is a single, standardized format controlled by IUPAC. The goal is simple: same molecule, same identifier, every time.
This matters for FAIR data principles:
- Findable: You can search for a specific compound across databases
- Accessible: The standard is open and free
- Interoperable: Different systems can connect chemical knowledge
- Reusable: The identifiers work consistently across platforms
The problem was that “same molecule, same identifier” only worked if your molecule didn’t have metals in it.
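To make the "same molecule, same identifier" goal concrete: the identifier must not depend on how the atoms happened to be numbered in the input. The sketch below is a toy illustration of that idea, not InChI's actual algorithm. It tries every renumbering of a tiny graph and keeps the smallest description, and the function name `canonical_form` is made up for this example.

```python
from itertools import permutations

def canonical_form(atoms, bonds):
    """Return a numbering-independent key for a tiny molecular graph.

    atoms: list of element symbols, indexed by input atom number
    bonds: iterable of (i, j) atom-index pairs
    Brute force over all renumberings, so only usable for a handful of atoms.
    """
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):
        # perm[old_index] is the new index assigned to that atom
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = sorted(tuple(sorted((perm[i], perm[j]))) for i, j in bonds)
        candidate = (tuple(labels), tuple(edges))
        if best is None or candidate < best:
            best = candidate
    return best

# Ethanol heavy atoms (C-C-O), numbered from either end of the chain
ethanol_a = canonical_form(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = canonical_form(["O", "C", "C"], [(0, 1), (1, 2)])
# Dimethyl ether (C-O-C): same formula, different connectivity
ether = canonical_form(["C", "O", "C"], [(0, 1), (1, 2)])

print(ethanol_a == ethanol_b)  # same molecule, same key
print(ethanol_a == ether)      # different molecule, different key
```

Real canonicalization algorithms avoid the factorial blow-up, but the contract is the same: any two numberings of the same structure must produce the same output.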
The v1.07 Overhaul
The new release represents a complete modernization:
- Development moved to GitHub: No more closed development process
- Thousands of bugs fixed: Including security issues that had been lurking for years
- Better documentation: The code is now actually readable and maintainable
- Preserved backward compatibility: Existing organic molecule InChIs didn’t change
The core canonicalization algorithm stayed the same—they weren’t about to break a billion existing identifiers. Instead, they added a preprocessing step that normalizes structures before they hit the main algorithm.
The Metal Problem
Here’s what was happening: InChI’s original algorithm assumed that bonds to metals were ionic and automatically disconnected them. This makes sense for something like sodium chloride (NaCl), where you really do have separate Na⁺ and Cl⁻ ions.
But it completely fails for:
- Coordination complexes: Where ligands are definitely bonded to the metal center
- Organometallic compounds: Where carbon-metal bonds are covalent
- Sandwich compounds: Like ferrocene, where the bonding is neither purely ionic nor covalent
The result? You’d lose all the stereochemical information around the metal center, and compounds with completely different structures would get identical InChIs.
The Solution: Smart Preprocessing
The new system uses a decision tree to figure out which metal-ligand bonds to keep and which to disconnect:
- Check if the metal is terminal: If it’s only connected to one thing and the electronegativity difference is huge, disconnect it (probably ionic)
- Look at coordination number: If a non-terminal metal has more than a certain number of bonds for that element, keep them all connected
- Apply chemical knowledge: The rules are based on actual coordination chemistry, not just electronegativity
This means FeCl₂ (probably ionic) gets disconnected into Fe²⁺ and 2 Cl⁻, while [FeCl₄]²⁻ (definitely a coordination complex) stays connected.
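A rough sketch of how such a decision tree might look in code is shown here. Everything specific in it is an assumption made for illustration: the function name `keep_metal_bonds`, the cutoff `IONIC_DELTA`, the per-element neighbor limits, and the fallback rule are hypothetical and were picked only so that the examples above come out as described. The electronegativities are standard Pauling values. The real v1.07 rules are more detailed.

```python
# Pauling electronegativities (standard textbook values)
ELECTRONEGATIVITY = {
    "Na": 0.93, "Fe": 1.83, "C": 2.55, "N": 3.04, "Cl": 3.16, "O": 3.44,
}

# HYPOTHETICAL thresholds, not the values used by InChI v1.07
IONIC_DELTA = 1.2                 # difference at or above this looks ionic
MAX_IONIC_NEIGHBORS = {"Fe": 2}   # more neighbors than this: treat as a complex
DEFAULT_MAX_IONIC_NEIGHBORS = 2

def keep_metal_bonds(metal, ligand_atoms):
    """Return True to keep the metal-ligand bonds, False to disconnect them."""
    deltas = [
        abs(ELECTRONEGATIVITY[atom] - ELECTRONEGATIVITY[metal])
        for atom in ligand_atoms
    ]
    # Step 1: a terminal metal with a large electronegativity gap is disconnected
    if len(ligand_atoms) == 1:
        return deltas[0] < IONIC_DELTA
    # Step 2: coordination number above the per-element limit keeps all bonds
    limit = MAX_IONIC_NEIGHBORS.get(metal, DEFAULT_MAX_IONIC_NEIGHBORS)
    if len(ligand_atoms) > limit:
        return True
    # Fallback (assumption): disconnect only if every bond looks ionic
    return not all(delta >= IONIC_DELTA for delta in deltas)

print(keep_metal_bonds("Na", ["Cl"]))        # NaCl
print(keep_metal_bonds("Fe", ["Cl"] * 2))    # FeCl2
print(keep_metal_bonds("Fe", ["Cl"] * 4))    # [FeCl4]2-
```

With these made-up thresholds, NaCl and FeCl₂ are disconnected and [FeCl₄]²⁻ stays connected, matching the behavior described above.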
How InChI Generation Works
The process has six main steps:
- Parse input: Read the structure from a file (Molfile, SDF, etc.)
- Convert to internal format: Transform into the software’s data structures
- Normalize: Standardize tautomers, resolve ambiguities—this is where the new metal rules kick in
- Canonicalize: Create a unique representation independent of atom numbering
- Generate InChI string: Build the layered text identifier
- Create InChIKey: Hash the full string into a 27-character key for databases
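The InChI itself has separate layers for formula, connectivity, hydrogens, stereochemistry, isotopes, and charge. The InChIKey is a fixed-length hash of that string, which makes it convenient for database indexing and fast searching. Because it is a hash, you cannot recover the structure from the key alone; for that you need the full InChI.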
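To see the layering in practice, here is a minimal sketch that splits an InChI into its layers and an InChIKey into its blocks, using ethanol as the example. The helper names `split_inchi` and `split_inchikey` are invented for this illustration. The code only slices strings: it does not compute an InChI or its hash, and it ignores edge cases such as InChIs without a formula layer.

```python
import re

# Layer prefixes this sketch knows by name; others are reported by letter
LAYER_NAMES = {
    "c": "connectivity", "h": "hydrogens", "q": "charge", "p": "protons",
    "b": "double-bond stereo", "t": "tetrahedral stereo", "i": "isotopes",
}

def split_inchi(inchi):
    """Split an InChI into (version, formula, [(layer_name, value), ...])."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    version = parts[0]
    formula = parts[1] if len(parts) > 1 else ""
    # A list of pairs rather than a dict, since a prefix can occur more than once
    layers = [(LAYER_NAMES.get(p[0], p[0]), p[1:]) for p in parts[2:] if p]
    return version, formula, layers

def split_inchikey(key):
    """Split a 27-character InChIKey into its blocks."""
    match = re.fullmatch(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])", key)
    if match is None:
        raise ValueError("not a well-formed InChIKey")
    skeleton, rest, flag, version, protonation = match.groups()
    return {
        "skeleton_hash": skeleton,   # hash of formula and connectivity
        "other_layers_hash": rest,   # hash of the remaining layers
        "standard": flag == "S",     # S = standard InChI, N = non-standard
        "version": version,
        "protonation": protonation,
    }

ethanol_inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
ethanol_key = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"

print(split_inchi(ethanol_inchi))
print(split_inchikey(ethanol_key))
```

The first InChIKey block depends only on the molecular skeleton, which is why searching on the first 14 characters is a common way to find compounds that share connectivity but differ in stereochemistry or isotopes.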
Better Documentation
The technical manual is being split into two documents:
- Chemical Manual: For chemists who need to understand what InChIs mean
- Technical Manual: For developers who need to implement the algorithms
This is a smart move. The current documentation tries to serve both audiences and doesn’t do either particularly well.
What’s Still Missing
The paper acknowledges several areas for future work:
- Better stereochemistry handling: Current representation is still limited
- Mixtures (MInChI): For solutions, alloys, and other multi-component systems
- Nanomaterials (NInChI): For particles, surfaces, and extended structures
These are hard problems. Chemical identifiers work best when you have discrete, well-defined molecular structures. Once you start dealing with mixtures or materials with variable composition, the whole concept of a “unique identifier” gets murky.
Impact on Chemical Databases
This update should dramatically improve the searchability of inorganic and organometallic compounds in major chemical databases. Instead of getting disconnected fragments when you search for a metal complex, you’ll actually get the compound you’re looking for.
For computational chemistry workflows that rely on database lookups—which is most of them—this represents a significant practical improvement.
The Bigger Picture
InChI’s evolution reflects chemistry’s expansion beyond its organic roots. The fact that it took this long to properly handle inorganic compounds shows how much computational chemistry has historically focused on carbon-based molecules.
As the field moves into catalysis, materials science, and coordination chemistry applications, having proper chemical identifiers becomes essential. You can’t build FAIR chemical databases if half of chemistry is represented incorrectly.