A Method Paper on Practical Graph Isomorphism

This is a Method paper that brings the published description of nauty (version 2.5) up to date and introduces Traces (version 2.0), a new program for graph isomorphism testing and canonical labeling. The paper provides a unified theoretical framework for the individualization-refinement paradigm that underpins all leading graph isomorphism programs, then details the distinct implementation strategies of nauty and Traces. Extensive benchmarks compare both programs against saucy, Bliss, and conauto across graph families ranging from easy to extremely difficult.

The Graph Isomorphism Problem in Practice

An isomorphism between two graphs is a bijection between their vertex sets that preserves adjacency. The graph isomorphism problem (GI) asks whether such a bijection exists. While GI is in NP, it is neither known to be in co-NP nor proven NP-complete. NP-completeness is considered unlikely, as it would imply collapse of the polynomial-time hierarchy. The best proven worst-case running time has stood for three decades at $e^{O(\sqrt{n \log n})}$.

In practice, direct isomorphism testing is poorly suited for common tasks like removing duplicates from large graph collections or looking up graphs in databases. The standard approach is canonical labeling: relabeling a graph so that isomorphic graphs become identical after relabeling. This allows sorting algorithms and standard data structures to handle isomorph rejection and retrieval.

The dominant practical approach is the individualization-refinement paradigm, introduced by Parris and Read (1969) and developed by Corneil and Gotlieb (1970). McKay’s nauty (1978, 1980) was the first program to handle both structurally regular graphs with hundreds of vertices and graphs with large automorphism groups. Its key innovation was using discovered automorphisms to prune the search tree. nauty dominated the field for decades until competitors like saucy (2004), Bliss (2007), and conauto (2009) introduced sparse data structures, early refinement abort, and other improvements.

The Individualization-Refinement Framework

The paper provides a general formal framework encompassing all leading graph isomorphism algorithms. The core idea has three components: vertex colorings, a search tree built by individualizing vertices, and pruning via node invariants and automorphisms.

Colorings and Refinement

A colouring of vertex set $V$ is a surjective function $\pi: V \to {1, 2, \ldots, k}$. A colouring is equitable if any two vertices of the same colour are adjacent to the same number of vertices of each colour. Given any colouring $\pi$, there exists a unique coarsest equitable colouring $\pi’$ with $\pi’ \preceq \pi$ (meaning $\pi’$ is finer than or equal to $\pi$). Computing this equitable refinement is the primary computational bottleneck.

Individualization gives a single vertex a unique colour, then refines:

$$ I(\pi, v)(w) = \begin{cases} \pi(w), & \text{if } \pi(w) < \pi(v) \text{ or } w = v \\ \pi(w) + 1, & \text{otherwise} \end{cases} $$

The refinement function $R(G, \pi_0, \nu)$ applies equitable refinement after each individualization step for a sequence of vertices $\nu = (v_1, v_2, \ldots)$.

Search Tree and Canonical Forms

The search tree $\mathcal{T}(G, \pi_0)$ is a rooted tree whose nodes are vertex sequences. Starting from the empty sequence at the root, each node extends the sequence by choosing a vertex from a target cell (a non-singleton cell of the current colouring). Leaves correspond to discrete colourings (permutations of $V$).

A canonical form is a function $C: \mathcal{G} \times \Pi \to \mathcal{G} \times \Pi$ satisfying:

  • $C(G, \pi) \cong (G, \pi)$ (the canonical form is isomorphic to the input)
  • $C(G^g, \pi^g) = C(G, \pi)$ for all $g \in S_n$ (label-invariance)

The canonical form is computed by finding the leaf $\nu^*$ maximizing the node invariant $\phi(G, \pi_0, \nu)$, then applying the corresponding discrete colouring.

Tree Pruning

Three pruning operations keep the search tractable:

  • $P_A(\nu, \nu’)$: Remove subtree at $\nu’$ if $\phi(G, \pi_0, \nu) > \phi(G, \pi_0, \nu’)$ (invariant comparison)
  • $P_B(\nu, \nu’)$: Remove subtree at $\nu’$ if $\phi(G, \pi_0, \nu) \neq \phi(G, \pi_0, \nu’)$ (inequivalence)
  • $P_C(\nu, g)$: Remove subtree at $\nu^g$ if $g \in \text{Aut}(G, \pi_0)$ and $\nu < \nu^g$ (automorphism pruning)

Theorem 5 in the paper guarantees that after any sequence of these pruning operations, at least one canonical leaf survives and the discovered automorphisms generate the full automorphism group.

Implementation: nauty vs. Traces

While both programs operate within the same individualization-refinement framework, their implementation strategies differ substantially.

Refinement Strategies

Both nauty and Traces compute equitable colourings using Algorithm 1, which iteratively splits cells based on adjacency counts. For regular graphs (where all vertices have equal degree), the initial colouring is trivially equitable, making these graphs difficult. nauty addresses this with a library of stronger partitioning functions (e.g., triangle counting), which require user expertise to select. Traces instead uses a richer node invariant that often makes stronger refinements unnecessary.

Target Cell Selection

nauty has two strategies: using the first non-singleton cell regardless of size, or choosing the first cell with the most non-trivial joins to other cells (where a non-trivial join means more than 0 edges and less than the maximum possible between two cells). An earlier version of nauty preferred the smallest non-singleton cell, hypothesizing it would more likely correspond to a group orbit, but experiments showed the first non-singleton cell performs better in most cases. Traces prefers large target cells, which produce shallower search trees. Specifically, Traces selects the first largest non-singleton cell that is a subset of the parent node’s target cell. If no non-singleton cells satisfy this, it falls back to the grandparent node’s target cell, and so on.

Node Invariants: The Trace

The most consequential difference is in node invariants. nauty computes a single integer $f(\nu)$ at each node, forming a vector $(f([\nu]_0), f([\nu]_1), \ldots, f(\nu))$ for lexicographic comparison. Traces defines $f(\nu)$ as a vector encoding the sizes and positions of cells in the order they are created during refinement. This vector-of-vectors structure (the “trace,” hence the program’s name) enables comparison while refinement is still incomplete. For many difficult graph families, only a fraction of refinement operations need to finish before pruning can occur.

Tree Scanning Order

This is the fundamental architectural difference. nauty uses depth-first search, keeping the lexicographically least leaf $\nu_1$ and the leaf $\nu^*$ with the greatest invariant discovered so far. Pruning applies when a node’s invariant matches neither.

Traces uses breadth-first search, processing all nodes at each level $k$ and retaining only those with the greatest invariant value. By property $(\phi 1)$, the best nodes at level $k$ are children of the best nodes at level $k-1$, so no backtracking is needed. This maximizes pruning operation $P_A$.

To compensate for the fact that breadth-first search delays automorphism discovery (which requires leaves), Traces generates experimental paths: random paths from each node down to a leaf. Random experimental paths tend to find automorphisms generating larger subgroups, making more of the group available early for pruning. Both programs maintain discovered automorphisms using the random Schreier method for efficient orbit computation.

Low-Degree Vertex Handling

Traces includes special handling for vertices of degree 0, 1, 2, or $n-1$. After the initial refinement, vertices with equal colours also have equal degrees. The target cell selector never selects cells containing vertices of these low degrees, and nodes whose non-trivial cells consist only of such vertices are not expanded further. Instead, special-purpose code produces generators for the automorphism group fixed by that node and, if needed, a unique discrete colouring. This technique is effective for graphs with many small components and tree-like structures (as in constraint satisfaction problems), though the authors note that such graphs could also benefit from preprocessing that factors out tree-like appendages and replaces vertices with identical neighborhoods.

Automorphism Detection

Beyond leaf comparison, saucy introduced early detection of automorphisms higher in the search tree by checking whether partial mappings between equivalent colourings extend trivially. Traces extends this idea with a heuristic that attempts non-trivial extensions. When computing only the automorphism group (not canonical labeling), Traces employs a strategy where it finds all discrete children of one node and then checks each remaining node for a single matching discrete child, further reducing search effort.

Performance Benchmarks

The authors compare nauty 2.5, Traces 2.0, saucy 3.0, Bliss 7.2, and conauto 2.0.1 on a MacBook Pro with a 2.66 GHz Intel i7 processor. All graphs were randomly labeled before processing to avoid artifacts from input ordering. The benchmark covers both automorphism group computation and canonical labeling.

Graph FamilyBest Program(s)Notes
Random graphs ($p = 1/2$)nauty, TracesAll programs fast; easy class
Random graphs ($p = n^{-1/2}$)nautySparse random graphs
Random cubic graphsnauty (with invariant)nauty benefits from distance invariant
HypercubesTracesVertex-transitive; Traces dramatically faster
Misc. vertex-transitiveTracesLarge automorphism groups
Unions of tripartite graphsconauto, BlissSpecial handling for disjoint components
Small strongly-regular graphsTraces, nautyBoth competitive
Large strongly-regular graphsTracesOrders of magnitude faster
Hadamard matrix graphsTracesAmong the hardest known classes
Random treesnautyLow-degree preprocessing helps
Cai-Furer-Immerman graphsTracesDesigned to defeat refinement; Traces still efficient
Miyazaki graphsTracesAnother hard class; dramatic advantage
Projective planes (order 16)TracesLarge automorphism groups on bipartite graphs
Combinatorial graphsMixedPerformance varies by instance; Traces generally competitive

The results show that nauty is generally fastest for small graphs and some easier families, while Traces dominates on most difficult graph classes, sometimes by orders of magnitude. The breadth-first tree scanning strategy of Traces, combined with its richer node invariant, provides the largest gains on graphs with complex symmetry structure (strongly-regular graphs, Hadamard matrix graphs, vertex-transitive graphs). The exception is graph families with many disjoint or minimally-overlapping components, where conauto and Bliss have specialized handling that nauty and Traces lack.

Key Findings and Limitations

The paper establishes several findings:

  1. The breadth-first tree scanning approach in Traces, combined with experimental paths for early automorphism discovery, provides large efficiency gains on difficult graph classes.
  2. Traces’ richer node invariant (the trace) enables early pruning during incomplete refinement, reducing dependence on user-selected invariant functions compared to nauty.
  3. No single program dominates all graph classes. nauty remains preferred for mass processing of small graphs.
  4. The random Schreier method for maintaining the automorphism group is effective in both programs, enabling more complete pruning via orbit computation.

Limitations acknowledged by the authors include: nauty and Traces lack specialized code for graphs consisting of disjoint or minimally-overlapping components (where conauto and Bliss excel), and the choice of refinement function in nauty still requires user expertise for certain difficult graph classes.


Reproducibility Details

Data

PurposeDatasetSizeNotes
BenchmarkingBliss test collectionMultiple familiesGraphs ranging from easy to very difficult
Benchmarkingnauty/Traces website collectionMultiple familiesAll test graphs available at the project website

All test graphs are publicly available at the nauty and Traces website. Graphs were randomly labeled before processing to avoid non-typical behavior from input labeling.

Algorithms

The core algorithms are described formally with proofs of correctness (Theorem 5 guarantees pruning validity). Key implementation choices:

  • Refinement: Equitable colouring via Algorithm 1 (iterated cell splitting by adjacency counts)
  • Target cell selection: nauty uses first non-singleton or most non-trivially joined cell; Traces uses first largest cell within parent’s target
  • Tree scanning: nauty uses depth-first; Traces uses breadth-first with experimental paths
  • Group maintenance: Random Schreier method for orbit computation in both programs

Software

ProgramVersionCanonical LabelingOpen Source
nauty2.5YesYes
Traces2.0YesYes
saucy3.0No (v3.0)Yes
Bliss7.2YesYes
conauto2.0.1NoYes

Artifacts

ArtifactTypeLicenseNotes
nauty and TracesCodeApache 2.0Official distribution (v2.9.3 as of 2026); includes gtools graph utilities
Test graphsDatasetApache 2.0All benchmark graphs from the paper, available at the project website

Hardware

Benchmarks run on a MacBook Pro with 2.66 GHz Intel i7 processor, compiled with gcc 4.7, single-threaded execution.


Paper Information

Citation: McKay, B. D., & Piperno, A. (2013). Practical graph isomorphism, II. Journal of Symbolic Computation, 60, 94-112.

@article{mckay2013practical,
  title={Practical graph isomorphism, {II}},
  author={McKay, Brendan D. and Piperno, Adolfo},
  journal={Journal of Symbolic Computation},
  volume={60},
  pages={94--112},
  year={2013},
  publisher={Elsevier BV},
  doi={10.1016/j.jsc.2013.09.003}
}