Making InChI FAIR and Sustainable for Inorganic Chemistry

Paper Summary

Citation: Blanke, G., Brammer, J., Baljozovic, D., Khan, N. U., Lange, F., Bänsch, F., Tovee, C. A., Schatzschneider, U., Hartshorn, R. M., & Herres-Pawlis, S. (2025). Making the InChI FAIR and sustainable while moving to inorganics. Faraday Discussions, 256(0), 503–519. https://doi.org/10.1039/D4FD00145A

Publication: Faraday Discussions, 2025

Why This Matters

The International Chemical Identifier (InChI) is everywhere in chemistry databases, with over a billion structures using it. But there’s a problem: the system was designed for organic chemistry and breaks down on anything that contains a metal. This paper describes the major overhaul that fixed thousands of bugs and added support for inorganic and organometallic compounds.

If you’ve ever tried to search for a metal complex in a chemical database and gotten nonsense results, this is why. And this is the fix.

What InChI Actually Does

InChI creates a unique text string for any chemical structure. Unlike SMILES, which has multiple vendor implementations and can represent the same molecule in different ways, InChI is a single, standardized format controlled by IUPAC. The goal is simple: same molecule, same identifier, every time.

This matters for FAIR data principles:

Findable: You can search for a specific compound across databases
Accessible: The standard is open and free
Interoperable: Different systems can connect chemical knowledge
Reusable: The identifiers work consistently across platforms
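
As a rough illustration of the Findable and Interoperable points, the sketch below links records from two databases by a shared InChIKey. The two keys (ethanol and benzene) are real. The databases, record IDs, and field names are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical records from two databases. Only the InChIKeys are real;
# the IDs and field names are invented for this example.
DB_A = [
    {"id": "A-001", "smiles": "CCO", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"id": "A-002", "smiles": "c1ccccc1", "inchikey": "UHOVQNZJYSORNB-UHFFFAOYSA-N"},
]
DB_B = [
    {"id": "B-917", "smiles": "OCC", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"id": "B-204", "smiles": "C1=CC=CC=C1", "inchikey": "UHOVQNZJYSORNB-UHFFFAOYSA-N"},
]


def link_by_inchikey(*databases):
    """Group record IDs from any number of databases by their InChIKey."""
    linked = defaultdict(list)
    for db in databases:
        for record in db:
            linked[record["inchikey"]].append(record["id"])
    return dict(linked)


links = link_by_inchikey(DB_A, DB_B)
for key, ids in links.items():
    print(key, ids)
```

The SMILES strings differ between the two databases ("CCO" vs. "OCC"), so joining on SMILES text would miss the match. Joining on the InChIKey does not.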

The problem was that “same molecule, same identifier” only worked if your molecule didn’t have metals in it.

The v1.07 Overhaul

The new release represents a complete modernization:

Development moved to GitHub: No more closed development process
Thousands of bugs fixed: Including security issues that had been lurking for years
Better documentation: The code is now actually readable and maintainable
Preserved backward compatibility: Existing organic molecule InChIs didn’t change

The core canonicalization algorithm stayed the same, because changing it would break a billion existing identifiers. Instead, the developers added a preprocessing step that normalizes structures before they reach the main algorithm.

The Metal Problem

Here’s what was happening: InChI’s original algorithm assumed that bonds to metals were ionic and automatically disconnected them. This makes sense for something like sodium chloride (NaCl), where you really do have separate Na⁺ and Cl⁻ ions.

But it completely fails for:

Coordination complexes: Where ligands are definitely bonded to the metal center
Organometallic compounds: Where carbon-metal bonds are covalent
Sandwich compounds: Like ferrocene, where the bonding is neither purely ionic nor purely covalent

The result? You’d lose all the stereochemical information around the metal center, and compounds with completely different structures would get identical InChIs.
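
A toy model can make the collision concrete. The sketch below is not InChI code: it represents a stripped-down metal-nitrite fragment as a small graph, once bound through nitrogen (nitro) and once through oxygen (nitrito), and applies a blanket "cut every metal bond" rule. The metal set, the function names, and the crude signature are all assumptions made for illustration.

```python
METALS = {"Co", "Fe", "Pt", "Na"}  # tiny illustrative set, not a complete list


def disconnect_metal_bonds(atoms, bonds):
    """Drop every bond that touches a metal (simplified blanket rule)."""
    return {b for b in bonds if not any(atoms[i] in METALS for i in b)}


def crude_signature(atoms, bonds):
    """Order-independent summary: sorted elements plus sorted bonded element pairs.

    This is NOT InChI canonicalization, only enough to compare these toy graphs.
    """
    elements = tuple(sorted(atoms.values()))
    pairs = tuple(sorted(tuple(sorted((atoms[i], atoms[j]))) for i, j in bonds))
    return elements, pairs


# Atom index -> element; the same four atoms are used for both isomers.
ATOMS = {0: "Co", 1: "N", 2: "O", 3: "O"}
NITRO = {(0, 1), (1, 2), (1, 3)}    # Co-N bound
NITRITO = {(0, 2), (1, 2), (1, 3)}  # Co-O bound

before = crude_signature(ATOMS, NITRO) == crude_signature(ATOMS, NITRITO)
after = (
    crude_signature(ATOMS, disconnect_metal_bonds(ATOMS, NITRO))
    == crude_signature(ATOMS, disconnect_metal_bonds(ATOMS, NITRITO))
)
print("identical before disconnection:", before)
print("identical after disconnection:", after)
```

With the metal bond intact the two graphs are distinguishable. Once it is cut, both collapse to a bare metal atom plus the same NO2 fragment, which is the kind of information loss described above.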

The Solution: Smart Preprocessing

The new system uses a decision tree to figure out which metal-ligand bonds to keep and which to disconnect:

Check if the metal is terminal: If it’s bonded to only one atom and the electronegativity difference is large, disconnect it (probably ionic)
Look at coordination number: If a non-terminal metal has more bonds than an element-specific threshold, keep them all connected
Apply chemical knowledge: The rules are based on actual coordination chemistry, not just electronegativity

This means FeCl₂ (probably ionic) gets disconnected into Fe²⁺ and 2 Cl⁻, while [FeCl₄]²⁻ (definitely a coordination complex) stays connected.
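
The rules above can be sketched as a small function. This is a minimal sketch under stated assumptions, not the InChI implementation: the electronegativity cutoff, the per-element coordination thresholds, and both fallback branches are hypothetical values chosen so that the examples in this section come out as described. Only the Pauling electronegativities are real data.

```python
# Pauling electronegativities for the few elements used here.
ELECTRONEGATIVITY = {"Na": 0.93, "Fe": 1.83, "Cl": 3.16}

# ASSUMED values for illustration only. The real rules rely on curated,
# element-specific data that this summary does not reproduce.
IONIC_EN_GAP = 1.7                           # hypothetical "large difference" cutoff
COORDINATION_THRESHOLD = {"Na": 1, "Fe": 2}  # hypothetical per-element thresholds


def classify_metal_bonds(metal, neighbours):
    """Return 'disconnect' or 'keep' for all bonds of one metal atom."""
    degree = len(neighbours)
    if degree == 1:
        # Rule 1: terminal metal with a large electronegativity gap -> ionic.
        gap = abs(ELECTRONEGATIVITY[neighbours[0]] - ELECTRONEGATIVITY[metal])
        # ASSUMPTION: a terminal metal with a small gap stays connected.
        return "disconnect" if gap > IONIC_EN_GAP else "keep"
    if degree > COORDINATION_THRESHOLD[metal]:
        # Rule 2: coordination number above the element's threshold -> keep all.
        return "keep"
    # ASSUMED fallback, chosen so that FeCl2 matches the example in the text.
    return "disconnect"


print("NaCl:", classify_metal_bonds("Na", ["Cl"]))
print("FeCl2:", classify_metal_bonds("Fe", ["Cl", "Cl"]))
print("[FeCl4]2-:", classify_metal_bonds("Fe", ["Cl"] * 4))
```

The third rule in the list, applying coordination-chemistry knowledge, is what the lookup tables stand in for here. In a real system those tables would carry the chemistry, and the control flow would stay this simple.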

How InChI Generation Works

The process has six main steps:

Parse input: Read the structure from a file (Molfile, SDF, etc.)
Convert to internal format: Transform into the software’s data structures
Normalize: Standardize tautomers and resolve ambiguities (this is where the new metal rules kick in)
Canonicalize: Create a unique representation independent of atom numbering
Generate InChI string: Build the layered text identifier
Create InChIKey: Hash the full string into a 27-character key for databases
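
To show what the 27-character key looks like in practice, the sketch below checks the shape of an InChIKey and splits it into its three hyphen-separated blocks. The key is the real one for ethanol. The function name and the dictionary labels ("skeleton", "rest", "protonation") are informal names invented for this example.

```python
import re

# 14 letters, hyphen, 10 letters, hyphen, 1 letter = 27 characters.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def split_inchikey(key):
    """Return the three blocks of an InChIKey, or raise ValueError if malformed."""
    if not INCHIKEY_PATTERN.fullmatch(key):
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, rest, protonation = key.split("-")
    return {"skeleton": skeleton, "rest": rest, "protonation": protonation}


ETHANOL_KEY = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
ethanol = split_inchikey(ETHANOL_KEY)
print(len(ETHANOL_KEY), ethanol)
```

The first block is a hash of the molecular skeleton, so two compounds that share connectivity but differ in stereochemistry or isotopes share it. This check only validates the format. It cannot recover the structure, because the key is a hash rather than an encoding.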

The InChI itself has separate layers for formula, connectivity, hydrogens, stereochemistry, isotopes, and charge. The InChIKey is what actually gets stored in databases for fast searching.
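
The layered structure is easy to see by splitting an InChI string on its "/" separators. The sketch below is a simplified reader, not a full parser: it assumes the string has a formula layer, it gives friendly names to only a handful of layer prefixes, and it keeps any other prefix as-is. The function name and the friendly names are assumptions for this example. The two InChI strings (ethanol and sodium chloride) are real.

```python
# Friendly names for a few common layer prefixes; others are kept as-is.
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double_bond_stereo",
    "t": "tetrahedral_stereo",
    "i": "isotopes",
}


def split_inchi_layers(inchi):
    """Split an InChI string into a dict of layers (simplified).

    Assumes a formula layer is present. If a prefix occurs more than once,
    the later occurrence overwrites the earlier one.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    layers = {"version": version, "formula": formula}
    for segment in rest:
        if not segment:
            continue  # skip empty segments
        prefix, value = segment[0], segment[1:]
        layers[LAYER_NAMES.get(prefix, prefix)] = value
    return layers


ethanol = split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
salt = split_inchi_layers("InChI=1S/ClH.Na/h1H;/q;+1/p-1")
print(ethanol)
print(salt)
```

The sodium chloride string shows the disconnected, ionic treatment directly: the formula layer lists two separate components ("ClH.Na"), and the charge layer carries the +1 on sodium.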

Better Documentation

The technical manual is being split into two documents:

Chemical Manual: For chemists who need to understand what InChIs mean
Technical Manual: For developers who need to implement the algorithms

This is a smart move. The current documentation tries to serve both audiences and serves neither particularly well.

What’s Still Missing

The paper acknowledges several areas for future work:

Better stereochemistry handling: Current representation is still limited
Mixtures (MInChI): For solutions, alloys, and other multi-component systems
Nanomaterials (NInChI): For particles, surfaces, and extended structures

These are hard problems. Chemical identifiers work best when you have discrete, well-defined molecular structures. Once you start dealing with mixtures or materials with variable composition, the whole concept of a “unique identifier” gets murky.

Impact on Chemical Databases

This update should dramatically improve the searchability of inorganic and organometallic compounds in major chemical databases. Instead of getting disconnected fragments when you search for a metal complex, you’ll actually get the compound you’re looking for.

For computational chemistry workflows that rely on database lookups, which is most of them, this is a significant practical improvement.

The Bigger Picture

InChI’s evolution reflects chemistry’s expansion beyond its organic roots. The fact that it took this long to properly handle inorganic compounds shows how much computational chemistry has historically focused on carbon-based molecules.

As the field moves into catalysis, materials science, and coordination chemistry applications, having proper chemical identifiers becomes essential. You can’t build FAIR chemical databases if half of chemistry is represented incorrectly.

Content Details
Category: Computational Chemistry
Date: October 2025
