Paper Summary
Citation: Henze, H. R., & Blair, C. M. (1931). The number of isomeric hydrocarbons of the methane series. Journal of the American Chemical Society, 53(8), 3077-3085. https://doi.org/10.1021/ja01359a034
Publication: Journal of the American Chemical Society (JACS) 1931
What kind of paper is this?
This is a foundational theoretical paper in mathematical chemistry and chemical graph theory. Rather than proposing an approximation or empirical formula, it derives exact recursive formulas for the number of structural isomers of the alkanes ($C_NH_{2N+2}$). The paper also serves as a benchmark resource: its tabulated isomer counts corrected errors in the earlier literature and became a standard reference for molecular enumeration.
What is the motivation?
The primary motivation was the lack of a rigorous mathematical relationship between the number of carbon atoms ($N$) and the number of isomers.
- Previous failures: Earlier attempts by Cayley and Schiff used “centric” symmetry tree methods that failed for $N > 13$.
- The theoretical gap: Existing formulas were empirical or limited (e.g., adding correction terms for each unit increase in $N$), meaning they could not reliably predict counts for larger molecules like $C_{40}$.
This work aimed to develop a theoretically sound, generalizable method that could be extended to any number of carbons.
What is the novelty here?
The core novelty is the argument that no simple direct function $f(N)$ gives the isomer count. Instead, the count of hydrocarbons is expressed recursively through the counts of alkyl radicals (equivalently, isomeric alcohols) with roughly $N/2$ or fewer carbons.
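The radical counts themselves obey a recurrence. As an illustration in modern generating-function notation (our own, not the paper's), treat an alkyl radical with $k$ carbons as one carbon carrying three unordered slots. Each slot holds a hydrogen (counted as $T_0 = 1$) or a smaller radical, and the slots' carbons sum to $k - 1$. With $T(x) = \sum_{k \ge 0} T_k x^k$, counting unordered triples of slots gives

$$
T(x) = 1 + \frac{x}{6}\Bigl[\,T(x)^3 + 3\,T(x)\,T(x^2) + 2\,T(x^3)\,\Bigr]
$$

Expanding term by term yields $T_0, T_1, T_2, \dots = 1, 1, 1, 2, 4, 8, 17, 39, 89, 211, 507, \dots$. For example, $T_3 = 2$ (n-propyl and isopropyl), and $T_4 = 4$ matches the four butanols.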
The Symmetry Constraints: The paper rigorously divides the problem space to prevent double-counting:
- Group A (Centrosymmetric): Hydrocarbons that can be bisected, by cutting a single carbon-carbon bond, into two alkyl radicals of (nearly) equal size.
  - Even $N$: Split into two radicals of size $N/2$.
  - Odd $N$: Split into radicals of sizes $(N+1)/2$ and $(N-1)/2$.
- Group B (Asymmetric): Hydrocarbons that cannot be bisected in this way.
  - Defined by a central carbon atom with 3 or 4 carbon branches, none of which has more than $N/2 - 1$ carbons.
This classification is the key insight that enables the recursive formulas. By exhaustively partitioning hydrocarbons into these mutually exclusive groups, the authors could derive separate combinatorial expressions for each and sum them without double-counting.
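One way to turn the two groups into explicit counts is shown below. This is our own derivation from the Group A / Group B definitions, written for illustration rather than copied from the paper. Here $T_k$ is the number of alkyl radicals with $k$ carbons ($T_0 = 1$ stands for a hydrogen atom), $|r|$ is a radical's carbon count, and $a_N$ is the total number of isomers:

$$
A_N =
\begin{cases}
\binom{T_{N/2} + 1}{2}, & N \text{ even},\\
T_h\,\bigl(T_{h+1} - T_h\bigr) + \binom{T_h + 1}{2}, & N \text{ odd},\ N \ge 3,
\end{cases}
\qquad h = \frac{N-1}{2}
$$

$$
B_N = \#\Bigl\{\{r_1, r_2, r_3, r_4\} \text{ unordered} : \sum_{i=1}^{4} |r_i| = N - 1,\ |r_i| \le \tfrac{N}{2} - 1\Bigr\},
\qquad a_N = A_N + B_N
$$

For even $N$, Group A is an unordered pair of $N/2$-carbon radicals. For odd $N$, the cut bond joins a central carbon to a branch of exactly $h$ carbons. That carbon carries either one such branch or two. With one such branch, the other three slots hold $h$ carbons with none that large: $T_{h+1}$ triples in total, minus the $T_h$ that are an $h$-carbon radical plus two hydrogens. Methane ($N = 1$) is the degenerate case, counted directly as $a_1 = 1$.

As a check for hexane:
- $A_6 = \binom{3}{2} = 3$: n-hexane, 2-methylpentane and 2,3-dimethylbutane.
- $B_6 = 2$: 3-methylpentane, with branches $\{2,2,1,0\}$, and 2,2-dimethylbutane, with $\{2,1,1,1\}$.

Together these give the known total of 5.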
For each structural class, combinatorial formulas are derived that depend on the number of isomeric alcohols ($T_k$) where $k < N$. This transforms the problem of counting large molecular graphs into a recurrence relation based on the counts of smaller, simpler sub-graphs.
What experiments were performed?
The work is theoretical, so the “experiments” were computational and enumerative:
- Derivation of the recursion formulas: The main effort was the mathematical derivation of the set of equations for each structural class of hydrocarbon.
- Calculation: They applied their formulas to calculate the number of isomers for every alkane up to $N=40$, where the count exceeds $6.2 \times 10^{13}$. This was far beyond what was previously possible.
- Validation by exhaustive enumeration: To prove the correctness of their theory, the authors manually drew and counted all possible structural formulas for the undecanes ($C_{11}$), dodecanes ($C_{12}$), tridecanes ($C_{13}$), and tetradecanes ($C_{14}$). This brute-force check confirmed their calculated numbers and corrected long-standing errors in the literature.
- Key correction: The manual enumeration proved that the count for tetradecane ($C_{14}$) was 1,858, not 1,855 as previously cited by Losanitsch.
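The same calculation is easy to reproduce today. Below is a minimal Python sketch. It assumes the Group A / Group B split described above, with per-group counting rules of our own rather than the paper's equations. The function names `count_multisets`, `radical_counts` and `alkane_count` are hypothetical. Under those assumptions it should recover the undecane through tetradecane values, including 1,858 for $C_{14}$.

```python
from math import comb


def count_multisets(T, max_size, k, total):
    """Count unordered collections of exactly k radicals, each with 0..max_size
    carbons, whose carbon counts add up to `total`.

    T[j] is the number of distinct radicals with j carbons; T[0] = 1 stands
    for a hydrogen atom, so "empty" slots are allowed.
    """
    # ways[(items, carbons)] -> number of collections built from the sizes seen so far
    ways = {(0, 0): 1}
    for size in range(max_size + 1):
        nxt = {}
        for (items, carbons), w in ways.items():
            c = 0  # number of radicals of this size to add
            while items + c <= k and carbons + c * size <= total:
                key = (items + c, carbons + c * size)
                # pick c radicals of this size from T[size] types, repetition allowed
                nxt[key] = nxt.get(key, 0) + w * comb(T[size] + c - 1, c)
                c += 1
        ways = nxt
    return ways.get((k, total), 0)


def radical_counts(n_max):
    """T[m] = number of alkyl radicals C_mH_(2m+1), i.e. isomeric alcohols, for m <= n_max."""
    T = [1]  # size 0: a bare hydrogen
    for m in range(1, n_max + 1):
        # one carbon carrying three slots (hydrogens or smaller radicals) with m - 1 carbons in total
        T.append(count_multisets(T, m - 1, 3, m - 1))
    return T


def alkane_count(n):
    """Number of structural isomers of C_nH_(2n+2), summed over Group A and Group B."""
    T = radical_counts(n)
    # Group B: a central carbon whose 4 slots hold n - 1 carbons, no branch above n/2 - 1
    group_b = count_multisets(T, max(n - 2, 0) // 2, 4, n - 1)
    if n == 1:
        group_a = 0  # methane cannot be bisected
    elif n % 2 == 0:
        t = T[n // 2]
        group_a = t * (t + 1) // 2  # unordered pair of n/2-carbon radicals
    else:
        h = (n - 1) // 2
        t = T[h]
        # central carbon with exactly one h-carbon branch, or with two of them
        group_a = t * (T[h + 1] - t) + t * (t + 1) // 2
    return group_a + group_b


for n in (11, 12, 13, 14, 40):
    print(f"C{n}: {alkane_count(n)}")
```

The loop should print 159, 355, 802 and 1858 for $C_{11}$ through $C_{14}$, followed by the $C_{40}$ value. Because the dynamic program keeps every intermediate count as an exact Python integer, even the $C_{40}$ count is computed without overflow or rounding.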
What were the outcomes and conclusions drawn?
- Theoretical outcome: The paper shows that the counting problem must be handled recursively. Rather than following a simple direct formula in $N$, the isomer count is built up from the counts of smaller alkyl radicals.
- Benchmark resource: The authors published a table of isomer counts up to $C_{40}$ (Table II). The table corrected historical errors and became a standard reference for molecular enumeration.
The plot above illustrates the rapid growth. Methane ($C_1$) through propane ($C_3$) each have exactly one isomer, but the count then accelerates quickly: 75 isomers at $C_{10}$, nearly 37 million at $C_{25}$, and over 4 billion at $C_{30}$. By $C_{40}$, the count exceeds $6.2 \times 10^{13}$. This exponential scaling shows why exhaustive enumeration by hand becomes infeasible and why the recursive approach was essential.
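A quick check uses the rounded counts of about $3.68 \times 10^{7}$ at $C_{25}$ and $4.11 \times 10^{9}$ at $C_{30}$. Their ratio gives the average growth factor per added carbon:

$$
\left(\frac{4.11 \times 10^{9}}{3.68 \times 10^{7}}\right)^{1/5} \approx 111.7^{1/5} \approx 2.57
$$

So in this range each extra carbon multiplies the count by roughly 2.6. Later asymptotic work, beyond this paper, gives $a_N \sim C\,\alpha^{N} N^{-5/2}$ with $\alpha \approx 2.815$. The per-carbon ratio therefore creeps up toward about 2.8 rather than growing without bound.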
- Foundational impact: This work established the mathematical framework that would later evolve into modern chemical graph theory and computational approaches to molecular enumeration. In the context of AI for molecular generation, it is an early example of sizing chemical space: it quantifies how many distinct structures a generative model would need to be able to cover.
