A New Standard for Nanoinformatics
This is a Systematization paper that proposes a new standard, the NInChI, to address a fundamental limitation in nanoinformatics: the lack of a standardized, machine-readable identifier for nanomaterials. The result of a collaborative workshop organized by the H2020 research infrastructure NanoCommons and the nanoinformatics project NanoSolveIT, this work uses six detailed case studies to systematically develop a hierarchical, machine-readable notation for complex nanomaterials that could work across experimental research, regulatory frameworks, and computational modeling.
The Breakdown of Traditional Chemical Identifiers
Chemoinformatics has fantastic tools for representing small molecules: SMILES strings, InChI identifiers, and standardized databases that make molecular data searchable and shareable. But when you step into nanotechnology, these representations break down.
Consider trying to describe a gold nanoparticle with a silica shell and organic surface ligands. How do you capture:
- The gold core composition and size
- The silica shell thickness and interface
- The surface chemistry and ligand density
- The overall shape and morphology
There’s simply no standardized way to represent this complexity in a machine-readable format. This creates massive problems for:
- Data sharing between research groups
- Regulatory assessment where precise identification matters
- Computational modeling that needs structured input
- Database development and search capabilities
Without a standard notation, nanomaterials research suffers from the same data fragmentation that plagued small molecule chemistry before SMILES existed.
The Five-Tier Nanomaterial Description Hierarchy
The authors propose NInChI (Nanomaterials InChI), a layered extension to the existing InChI system. The core insight is organizing nanomaterial description from the inside out, following the OECD’s framework for risk assessment, with a five-tier hierarchy:
- Tier 1: Chemical Composition: What is the core made of? This differentiates uniform compositions (Tier 1.1), randomly mixed (Tier 1.2), ordered core-shell materials (Tier 1.3), and onion-like multi-shell morphologies (Tier 1.4).
- Tier 2: Morphology: What shape, size, and dimensionality? This encodes dimension (0D-3D), size and size distribution, and shape information.
- Tier 3: Surface Properties: Physical and chemical surface parameters such as charge, roughness, and hydrophobicity. Many of these depend on external conditions (pH, solvent, temperature).
- Tier 4: Surface Functionalization: How are coatings attached to the core? This includes functionalization density, orientation, and binding type (covalent vs. non-covalent).
- Tier 5: Surface Ligands: What molecules are on the surface, their density, orientation, and distribution?
This hierarchy captures the essential information needed to distinguish between different nanomaterials while building on familiar chemical concepts.
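As a rough illustration of how the five tiers could be held in a single record, here is a minimal Python sketch. The class and field names (`Composition`, `NanomaterialDescription`, `size_m`, and so on) are assumptions made for this example and do not come from the paper; only the tier structure itself does.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Composition(Enum):
    """Tier 1 sub-categories as listed in the hierarchy above."""
    UNIFORM = "1.1"
    RANDOM_MIX = "1.2"
    CORE_SHELL = "1.3"
    MULTI_SHELL = "1.4"


@dataclass
class NanomaterialDescription:
    """Hypothetical container for the five tiers (field names are illustrative)."""
    # Tier 1: chemical composition
    composition: Composition
    formula: str
    # Tier 2: morphology
    dimensionality: int  # 0, 1, 2 or 3
    shape: str
    size_m: float  # characteristic size in metres
    # Tier 3: surface properties; many depend on conditions, so keep them optional
    surface_properties: dict = field(default_factory=dict)
    # Tier 4: surface functionalization ("covalent" or "non-covalent")
    binding: Optional[str] = None
    # Tier 5: surface ligands
    ligands: list = field(default_factory=list)

    def __post_init__(self):
        # Reject dimensionalities outside the 0D-3D range named in Tier 2.
        if self.dimensionality not in (0, 1, 2, 3):
            raise ValueError("dimensionality must be 0, 1, 2 or 3")


# Example record: a 20 nm gold sphere capped with CTAB.
gold = NanomaterialDescription(
    Composition.UNIFORM, "Au", 0, "sphere", 20e-9, ligands=["CTAB"]
)
```

Keeping Tier 3 as a free-form dictionary reflects the point above that surface properties vary with pH, solvent, and temperature, so a fixed schema would likely be premature.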
Testing the Standard: Six Case Studies
The authors tested their concept against six real-world case studies to identify what actually matters in practice.
Case Study 1: Gold Nanoparticles
Gold NPs provided a relatively simple test case: an inert metallic core with various surface functionalizations. Key insights: core composition and size are essential, surface chemistry (what molecules are attached) matters critically, shape affects properties, and dynamic properties like protein corona formation belong outside the intrinsic NInChI representation. This established the boundary: NInChI should capture intrinsic, stable properties.
Case Study 2: Graphene-Family NMs
Carbon nanotubes and graphene introduced additional complexity: dimensionality (1D tubes vs 2D sheets vs 0D fullerenes), chirality (the (n,m) vector that defines a nanotube’s structure), defects and impurities that can alter properties, and number of layers (for nanotubes, single-wall vs multi-wall). This case showed that the notation needed to handle both topological complexity and chemical composition.
Case Study 3: Complex Engineered (Doped and Multi-Metallic) NMs
Doped materials, alloys, and core-shell structures revealed key requirements: the notation must distinguish true alloys (homogeneous mixing) from core-shell structures with the same overall composition, crystal structure information becomes crucial, and component ratios must be precisely specified. The case study assessed whether the MInChI extension could represent these solid solutions.
Case Study 4: Database Applications
The FAIR (Findable, Accessible, Interoperable, Reusable) principles guided this analysis. NInChI addresses real database problems: it provides greater specificity than CAS numbers (which lack nanoform distinction), offers a systematic alternative to ad-hoc naming schemes, and enables machine-searchability.
Case Study 5: Computational Modeling
This explored several applications: automated descriptor generation from NInChI structure, read-across predictions for untested materials, and model input preparation from standardized notation. The layered structure provides structured input that computational tools need for both physics-based and data-driven nanoinformatics approaches.
Case Study 6: Regulatory Applications
Under frameworks like REACH, regulators need to distinguish between different “nanoforms”, which are materials with the same chemical composition but different sizes, shapes, or surface treatments. NInChI directly addresses this by encoding the specific properties that define regulatory categories, providing precision sufficient for legal definitions and risk assessment frameworks.
The NInChI Alpha Specification in Practice
Synthesizing insights from all six case studies, the authors propose the NInChI alpha specification (version 0.00.1A), a three-layer structure. Importantly, the paper distinguishes the five-tier NM description hierarchy (described above) from the three-layer NInChI notation hierarchy. NM properties from the five tiers are encoded into these three notation layers:
Layer 1 (Version Number): Standard header indicating the NInChI version, denoted as 0.00.1A for the alpha version. This follows the convention of all InChI-based notations.
Layer 2 (Composition): Each component (core, shell, ligands, impurities, dopants, linkers) gets described using standard InChI (or PInChI/MInChI) for chemical composition, with additional sublayers for morphology (prefix m, e.g., sp for sphere, sh for shell, tu for tube), size (prefix s, in scientific notation in meters), crystal structure (prefix k), and chirality (prefix w for carbon nanotubes). Components are separated by !.
Layer 3 (Arrangement): Specified with prefix y, this layer describes how the components from Layer 2 are combined, proceeding from inside out. A core-shell material is written as y2&1 where the numbers reference components in Layer 2. Covalent bonding between components is indicated with parentheses, e.g., (1&2&3) for a nano core with a covalently bound ligand coating.
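The three layers can be assembled mechanically. The sketch below is one possible way to do it in Python, not the authors' tool: the helper names `component` and `build_ninchi` are hypothetical, and the sublayer order (m, s, k, w) is an assumption inferred from the paper's printed examples rather than a stated rule.

```python
import re

NINCHI_VERSION = "0.00.1A"  # Layer 1: alpha version header


def component(inchi_body, morphology=None, size=None, crystal=None, chirality=None):
    """Build one Layer 2 component from an InChI body plus optional sublayers.

    The order m, s, k, w is an assumption based on the printed examples.
    """
    parts = [inchi_body]
    for prefix, value in (("m", morphology), ("s", size), ("k", crystal), ("w", chirality)):
        if value is not None:
            parts.append(prefix + value)
    return "/".join(parts)


def build_ninchi(components, arrangement):
    """Join the layers: version header, '!'-separated components, 'y' arrangement."""
    if not components:
        raise ValueError("at least one component is required")
    # Every number in the arrangement must reference an existing component.
    for index in re.findall(r"\d+", arrangement):
        if not 1 <= int(index) <= len(components):
            raise ValueError(f"arrangement references unknown component {index}")
    return f"NInChI={NINCHI_VERSION}/" + "!".join(components) + f"/y{arrangement}"


# A (3,1) single-wall carbon nanotube with 0.4 nm diameter.
nanotube = build_ninchi(
    [component("C", morphology="tu", size="4d-10", chirality="(3,1)")], "1"
)
print(nanotube)
```

The arrangement check only verifies that component numbers exist; it does not validate the `&` or parenthesis syntax, which the alpha specification leaves open to refinement.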
The paper provides concrete worked examples from the case studies:
- Silica with gold coating (20 nm silica, 2 nm gold shell):
  NInChI=0.00.1A/Au/msh/s2t10r1-9;12r2-9!/O2Si/c1-3-2/msp/s20d-9/k000/y2&1
- CTAB-capped gold nanoparticle (20 nm diameter):
  NInChI=0.00.1A/Au/msp/s20d-9!C19H42N.BrH/c1-5-6-7.../y1&2
- Chiral single-wall nanotube of the (3,1) type with 0.4 nm diameter:
  NInChI=0.00.1A/C/mtu/s4d-10/w(3,1)/y1
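Going the other way, a string like those above can be split back into its layers. The paper gives no parsing algorithm, so the following is only a sketch under stated assumptions: `parse_ninchi` and `decode_simple_size` are hypothetical names, sublayers are keyed by their one-letter prefix, and the size pattern `<number>d<exponent>` is read as a diameter in metres because that matches the 20 nm and 0.4 nm examples. Shell sizes such as `2t10r1-9;12r2-9` are deliberately left undecoded.

```python
import re

_SIMPLE_SIZE = re.compile(r"^(\d+(?:\.\d+)?)d(-?\d+)$")


def decode_simple_size(size):
    """Return metres for sizes like '20d-9'; None for forms this sketch does not cover."""
    match = _SIMPLE_SIZE.match(size)
    if match is None:
        return None
    return float(f"{match.group(1)}e{match.group(2)}")


def parse_ninchi(text):
    """Split a NInChI string into version, components and arrangement."""
    prefix = "NInChI="
    if not text.startswith(prefix):
        raise ValueError("not a NInChI string")
    version, _, rest = text[len(prefix):].partition("/")
    body, separator, arrangement = rest.rpartition("/y")
    if not separator:
        raise ValueError("missing arrangement (y) layer")
    components = []
    for raw in body.split("!"):
        # Tolerate a leading '/' after '!', as printed in the silica-gold example.
        formula, *layers = raw.strip("/").split("/")
        sublayers = {layer[0]: layer[1:] for layer in layers if layer}
        components.append({"formula": formula, "sublayers": sublayers})
    return {"version": version, "components": components, "arrangement": arrangement}


parsed = parse_ninchi(
    "NInChI=0.00.1A/Au/msh/s2t10r1-9;12r2-9!/O2Si/c1-3-2/msp/s20d-9/k000/y2&1"
)
silica = parsed["components"][1]
print(silica["formula"], decode_simple_size(silica["sublayers"]["s"]))
```

Keying sublayers by a single character is enough for the printed examples but would collide if a component carried two layers with the same prefix, so a fuller parser would need the finished specification.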
Property Prioritization: The case studies produced a prioritization of NM properties into four categories (Table 3 in the paper):
| Category 1: Must Have | Category 2a: Nice to Have | Category 2b: Extrinsic | Category 3: Out of Scope |
|---|---|---|---|
| Chemical composition | Structural defects | Surface charge | Optical properties |
| Size/size distribution | Density | Corona | Magnetic properties |
| Shape | Surface composition | Agglomeration state | Chemical/oxidation state |
| Crystal structure | Dispersion | | |
| Chirality | | | |
| Ligand and ligand binding | | | |
Implementation: The authors built a prototype NInChI generation tool using the ZK framework with a Java backend, available through the Enalos Cloud Platform. The tool lets users specify core composition, morphology, size, crystal structure, and chirality, then build outward by adding shells or clusters. InChIs for shell components are retrieved via the NCI/CADD chemical structure REST API.
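For readers who want to try the lookup step themselves, the sketch below queries the public NCI/CADD Chemical Identifier Resolver. The summary does not say which endpoint or parameters the prototype uses, so the URL pattern here (the resolver's `/chemical/structure/<identifier>/stdinchi` form) and the function names are assumptions, not a description of the Enalos tool.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Public resolver base URL (assumed; the prototype's exact endpoint is not stated).
CACTUS_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def stdinchi_url(identifier):
    """Build the resolver URL that asks for a Standard InChI for a name or SMILES."""
    return f"{CACTUS_BASE}/{quote(identifier, safe='')}/stdinchi"


def fetch_stdinchi(identifier, timeout=10.0):
    """Fetch the Standard InChI as text; requires network access."""
    with urlopen(stdinchi_url(identifier), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


print(stdinchi_url("silicon dioxide"))
```

Calling `fetch_stdinchi("silicon dioxide")` would perform the actual request; it is left out of the example so the snippet runs offline.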
Limitations: The alpha version acknowledges areas for future development: nanocomposite and nanostructured materials, inverse NMs (nano holes in bulk material), and nanoporous materials are beyond current scope. Dynamic properties such as dissolution, agglomeration, and protein corona formation are excluded. The stochastic nature of NMs (e.g., broad size distributions) is not yet fully addressed. Covalent bonding between components needs further refinement.
Impact: For researchers, NInChI enables precise structural queries for nanomaterials data sharing. For regulators, it provides systematic identification for risk assessment and nanoform classification under frameworks like REACH. For computational modelers, it enables automated descriptor generation and read-across predictions.
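To make the nanoform use case concrete, here is a deliberately crude Python sketch that flags two NInChI strings as different nanoforms of the same substance when their component formulas match but the full strings differ. The function names are hypothetical, the two plain gold-sphere strings are constructed by analogy with the paper's examples rather than taken from it, and comparing formulas alone is an assumption that falls well short of a regulatory definition.

```python
def component_formulas(ninchi):
    """Return the leading formula of each Layer 2 component."""
    body = ninchi.split("=", 1)[1]
    _, _, rest = body.partition("/")  # drop the version layer
    composition, _, _ = rest.rpartition("/y")  # drop the arrangement layer
    return tuple(part.strip("/").split("/")[0] for part in composition.split("!"))


def are_distinct_nanoforms(first, second):
    """True when the formulas agree but the full identifiers differ."""
    same_chemistry = sorted(component_formulas(first)) == sorted(component_formulas(second))
    return same_chemistry and first != second


# Illustrative strings (not from the paper): gold spheres of 20 nm and 50 nm.
gold_20 = "NInChI=0.00.1A/Au/msp/s20d-9/y1"
gold_50 = "NInChI=0.00.1A/Au/msp/s50d-9/y1"
print(are_distinct_nanoforms(gold_20, gold_50))
```

A CAS-style identifier would treat both gold spheres as the same entry, whereas the size sublayer keeps them apart here.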
Key Conclusions: The 8-month collaborative process demonstrates that creating systematic notation for nanomaterials is feasible. The hierarchical, inside-out organization provides an approach that satisfies experimentalists, modelers, database owners, and regulators. Testing against six case studies identified the essential features that must be captured. By extending InChI and reusing conventions from MInChI, RInChI, and PInChI, the work builds on existing infrastructure. The proposed NInChI alpha is intended to stimulate further analysis and refinement with the broader community and the InChI Trust.
Reproducibility Details
- Paper Accessibility: The paper is fully open-access under the CC BY 4.0 license, allowing for straightforward reading and analysis.
- Tools & Code: The authors provided a prototype NInChI generation tool available through the Enalos Cloud Platform, built using the ZK framework with a Java backend. The underlying backend code was not released as an open-source library.
- Documentation: The paper serves as the first alpha specification for community discussion and refinement. No formal algorithmic pseudocode for automated string parsing or generation from structured nanomaterials files (like .cif) is provided.
| Artifact | Type | License | Notes |
|---|---|---|---|
| NInChI Generator (Enalos Cloud) | Other | Unknown | Prototype web tool for generating NInChI strings; backend not open-source |
| Paper (MDPI) | Other | CC BY 4.0 | Open-access alpha specification |
Paper Information
Citation: Lynch, I., Afantitis, A., Exner, T., Himly, M., Lobaskin, V., Doganis, P., … & Melagraki, G. (2020). Can an InChI for Nano Address the Need for a Simplified Representation of Complex Nanomaterials across Experimental and Nanoinformatics Studies? Nanomaterials, 10(12), 2493. https://doi.org/10.3390/nano10122493
Publication: Nanomaterials (2020)
@article{lynch2020inchi,
title={Can an InChI for Nano Address the Need for a Simplified Representation of Complex Nanomaterials across Experimental and Nanoinformatics Studies?},
author={Lynch, Iseult and Afantitis, Antreas and Exner, Thomas and others},
journal={Nanomaterials},
volume={10},
number={12},
pages={2493},
year={2020},
publisher={MDPI},
doi={10.3390/nano10122493}
}
