<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Graph Theory &amp; Algorithms on Hunter Heidenreich | ML Research Scientist</title><link>https://hunterheidenreich.com/notes/interdisciplinary/graph-theory/</link><description>Recent content in Graph Theory &amp; Algorithms on Hunter Heidenreich | ML Research Scientist</description><image><title>Hunter Heidenreich | ML Research Scientist</title><url>https://hunterheidenreich.com/img/avatar.webp</url><link>https://hunterheidenreich.com/img/avatar.webp</link></image><generator>Hugo -- 0.147.7</generator><language>en-US</language><copyright>2026 Hunter Heidenreich</copyright><lastBuildDate>Sat, 11 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://hunterheidenreich.com/notes/interdisciplinary/graph-theory/index.xml" rel="self" type="application/rss+xml"/><item><title>nauty and Traces: Graph Isomorphism Algorithms</title><link>https://hunterheidenreich.com/notes/interdisciplinary/graph-theory/nauty-traces-graph-isomorphism/</link><pubDate>Sat, 11 Apr 2026 00:00:00 +0000</pubDate><guid>https://hunterheidenreich.com/notes/interdisciplinary/graph-theory/nauty-traces-graph-isomorphism/</guid><description>nauty and Traces use individualization-refinement with search tree pruning for graph isomorphism testing and canonical labeling.</description><content:encoded><![CDATA[<h2 id="a-method-paper-on-practical-graph-isomorphism">A Method Paper on Practical Graph Isomorphism</h2>
<p>This is a <strong>Method</strong> paper that brings the published description of nauty (version 2.5) up to date and introduces Traces (version 2.0), a new program for graph isomorphism testing and canonical labeling. The paper provides a unified theoretical framework for the individualization-refinement paradigm that underpins all leading graph isomorphism programs, then details the distinct implementation strategies of nauty and Traces. Extensive benchmarks compare both programs against saucy, Bliss, and conauto across graph families ranging from easy to extremely difficult.</p>
<h2 id="the-graph-isomorphism-problem-in-practice">The Graph Isomorphism Problem in Practice</h2>
<p>An isomorphism between two graphs is a bijection between their vertex sets that preserves adjacency. The graph isomorphism problem (GI) asks whether such a bijection exists. While GI is in NP, it is neither known to be in co-NP nor proven NP-complete. NP-completeness is considered unlikely, as it would imply collapse of the <a href="https://en.wikipedia.org/wiki/Polynomial_hierarchy">polynomial-time hierarchy</a>. The best proven worst-case running time has stood for three decades at $e^{O(\sqrt{n \log n})}$.</p>
<p>In practice, direct isomorphism testing is poorly suited for common tasks like removing duplicates from large graph collections or looking up graphs in databases. The standard approach is <strong>canonical labeling</strong>: relabeling a graph so that isomorphic graphs become identical after relabeling. This allows sorting algorithms and standard data structures to handle isomorph rejection and retrieval.</p>
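<p>To make the canonical-labeling idea concrete, here is a minimal, exponential-time sketch in Python: it takes the lexicographically greatest adjacency encoding over all relabelings as the canonical key. This brute force is a toy stand-in for what nauty and Traces compute efficiently; the function name and encoding are illustrative choices, not the paper's.</p>

```python
from itertools import permutations

def canonical_form(n, edges):
    """Brute-force canonical form: the lexicographically greatest
    adjacency-matrix bit string over all n! relabelings.
    Exponential in n -- a toy stand-in for what nauty/Traces compute
    via individualization-refinement."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    best = None
    for p in permutations(range(n)):
        # Encode the relabeled upper triangle as a bit tuple.
        code = tuple(adj[p[i]][p[j]] for i in range(n) for j in range(i + 1, n))
        if best is None or code > best:
            best = code
    return best

# Isomorph rejection: two relabelings of the 4-cycle collapse to one key,
# while the 4-vertex path stays distinct.
g1 = canonical_form(4, [(0, 1), (1, 2), (2, 3), (3, 0)])  # 4-cycle
g2 = canonical_form(4, [(2, 0), (0, 3), (3, 1), (1, 2)])  # same cycle, relabeled
g3 = canonical_form(4, [(0, 1), (1, 2), (2, 3)])          # path
assert g1 == g2 and g1 != g3
```

<p>With such a key in hand, duplicate removal reduces to storing keys in a set and retrieval to a dictionary lookup, exactly as the paragraph above describes.</p>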
<p>The dominant practical approach is the <strong>individualization-refinement paradigm</strong>, introduced by Parris and Read (1969) and developed by Corneil and Gotlieb (1970). McKay&rsquo;s nauty (1978, 1980) was the first program to handle both structurally regular graphs with hundreds of vertices and graphs with large <a href="https://en.wikipedia.org/wiki/Automorphism_group">automorphism groups</a>. Its key innovation was using discovered automorphisms to prune the search tree. nauty dominated the field for decades until competitors like saucy (2004), Bliss (2007), and conauto (2009) introduced sparse data structures, early refinement abort, and other improvements.</p>
<h2 id="the-individualization-refinement-framework">The Individualization-Refinement Framework</h2>
<p>The paper provides a general formal framework encompassing all leading graph isomorphism algorithms. The core idea has three components: vertex colorings, a search tree built by individualizing vertices, and pruning via node invariants and automorphisms.</p>
<h3 id="colorings-and-refinement">Colorings and Refinement</h3>
<p>A <strong>colouring</strong> of vertex set $V$ is a surjective function $\pi: V \to \{1, 2, \ldots, k\}$. A colouring is <strong>equitable</strong> if any two vertices of the same colour are adjacent to the same number of vertices of each colour. Given any colouring $\pi$, there exists a unique coarsest equitable colouring $\pi'$ with $\pi' \preceq \pi$ (meaning $\pi'$ is finer than or equal to $\pi$). Computing this equitable refinement is the primary computational bottleneck.</p>
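<p>Coarsest equitable refinement can be sketched as repeated splitting of colour classes by the multiset of neighbour colours (the 1-dimensional Weisfeiler-Leman procedure). This is a simple fixed-point version, not the worklist-based Algorithm 1 of the paper; the function and variable names are mine.</p>

```python
def equitable_refinement(adj, colouring):
    """Coarsest equitable refinement of `colouring`: split colour
    classes by the multiset of neighbour colours until stable.
    `adj` maps vertex -> neighbour list; `colouring` maps vertex -> int."""
    while True:
        # Signature = (own colour, sorted multiset of neighbour colours).
        sigs = {v: (colouring[v], tuple(sorted(colouring[u] for u in adj[v])))
                for v in adj}
        # Renumber so that equal signatures share a colour.
        order = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new = {v: order[sigs[v]] for v in adj}
        if new == colouring:
            return colouring
        colouring = new

# A path on 4 vertices: endpoints get one colour, inner vertices another.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pi = equitable_refinement(adj, {v: 0 for v in adj})
assert pi[0] == pi[3] and pi[1] == pi[2] and pi[0] != pi[1]
```

<p>On a regular graph this loop stops immediately with all vertices the same colour, which is exactly why regular graphs are hard for refinement alone.</p>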
<p><strong>Individualization</strong> gives a single vertex a unique colour, then refines:</p>
<p>$$
I(\pi, v)(w) = \begin{cases} \pi(w), &amp; \text{if } \pi(w) &lt; \pi(v) \text{ or } w = v \\ \pi(w) + 1, &amp; \text{otherwise} \end{cases}
$$</p>
<p>The refinement function $R(G, \pi_0, \nu)$ applies equitable refinement after each individualization step for a sequence of vertices $\nu = (v_1, v_2, \ldots)$.</p>
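<p>The individualization operator in the displayed formula is a one-liner; a small sketch (names are illustrative):</p>

```python
def individualize(pi, v):
    """I(pi, v) per the formula above: v keeps its colour and becomes
    the sole vertex of that colour; every other vertex whose colour is
    >= pi(v) shifts up by one. `pi` maps vertex -> int colour."""
    return {w: c if (c < pi[v] or w == v) else c + 1
            for w, c in pi.items()}

# Individualizing vertex 2 in a monochromatic colouring of 4 vertices
# leaves vertex 2 alone in colour 1 and shifts the rest to colour 2.
pi = {0: 1, 1: 1, 2: 1, 3: 1}
assert individualize(pi, 2) == {0: 2, 1: 2, 2: 1, 3: 2}
```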
<h3 id="search-tree-and-canonical-forms">Search Tree and Canonical Forms</h3>
<p>The search tree $\mathcal{T}(G, \pi_0)$ is a rooted tree whose nodes are vertex sequences. Starting from the empty sequence at the root, each node extends the sequence by choosing a vertex from a <strong>target cell</strong> (a non-singleton cell of the current colouring). Leaves correspond to discrete colourings (permutations of $V$).</p>
<p>A <strong>canonical form</strong> is a function $C: \mathcal{G} \times \Pi \to \mathcal{G} \times \Pi$ satisfying:</p>
<ul>
<li>$C(G, \pi) \cong (G, \pi)$ (the canonical form is isomorphic to the input)</li>
<li>$C(G^g, \pi^g) = C(G, \pi)$ for all $g \in S_n$ (label-invariance)</li>
</ul>
<p>The canonical form is computed by finding the leaf $\nu^*$ maximizing the node invariant $\phi(G, \pi_0, \nu)$, then applying the corresponding discrete colouring.</p>
<h3 id="tree-pruning">Tree Pruning</h3>
<p>Three pruning operations keep the search tractable:</p>
<ul>
<li><strong>$P_A(\nu, \nu')$</strong>: Remove subtree at $\nu'$ if $\phi(G, \pi_0, \nu) &gt; \phi(G, \pi_0, \nu')$ (invariant comparison)</li>
<li><strong>$P_B(\nu, \nu')$</strong>: Remove subtree at $\nu'$ if $\phi(G, \pi_0, \nu) \neq \phi(G, \pi_0, \nu')$ (inequivalence)</li>
<li><strong>$P_C(\nu, g)$</strong>: Remove subtree at $\nu^g$ if $g \in \text{Aut}(G, \pi_0)$ and $\nu &lt; \nu^g$ (automorphism pruning)</li>
</ul>
<p>Theorem 5 in the paper guarantees that after any sequence of these pruning operations, at least one canonical leaf survives and the discovered automorphisms generate the full automorphism group.</p>
<h2 id="implementation-nauty-vs-traces">Implementation: nauty vs. Traces</h2>
<p>While both programs operate within the same individualization-refinement framework, their implementation strategies differ substantially.</p>
<h3 id="refinement-strategies">Refinement Strategies</h3>
<p>Both nauty and Traces compute equitable colourings using Algorithm 1, which iteratively splits cells based on adjacency counts. For regular graphs (where all vertices have equal degree), the initial colouring is trivially equitable, making these graphs difficult. nauty addresses this with a library of stronger partitioning functions (e.g., triangle counting), which require user expertise to select. Traces instead uses a richer node invariant that often makes stronger refinements unnecessary.</p>
<h3 id="target-cell-selection">Target Cell Selection</h3>
<p>nauty has two strategies: using the first non-singleton cell regardless of size, or choosing the first cell with the most non-trivial joins to other cells (a join between two cells is non-trivial when it contains at least one edge but fewer than all possible edges). An earlier version of nauty preferred the smallest non-singleton cell, hypothesizing it would more likely correspond to a group orbit, but experiments showed the first non-singleton cell performs better in most cases. Traces prefers <strong>large</strong> target cells, which produce shallower search trees. Specifically, Traces selects the first largest non-singleton cell that is a subset of the parent node&rsquo;s target cell. If no non-singleton cells satisfy this, it falls back to the grandparent node&rsquo;s target cell, and so on.</p>
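<p>The contrast between the two simplest selectors can be sketched directly on a colouring represented as an ordered list of cells (this omits nauty's join-counting strategy and Traces' parent-target-cell restriction; function names are mine):</p>

```python
def first_non_singleton(cells):
    """nauty's simplest strategy: the first cell with more than one vertex.
    `cells` is an ordered list of vertex lists."""
    return next(c for c in cells if len(c) > 1)

def first_largest(cells):
    """Traces-style preference for large cells: the first cell of
    maximum size (assumes at least one non-singleton cell exists)."""
    biggest = max(len(c) for c in cells)
    return next(c for c in cells if len(c) == biggest)

cells = [[4], [1, 7], [0, 2, 3, 5, 6]]
assert first_non_singleton(cells) == [1, 7]
assert first_largest(cells) == [0, 2, 3, 5, 6]
```

<p>Choosing the large cell yields more children per node but fewer levels, which is what makes it pair well with breadth-first scanning.</p>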
<h3 id="node-invariants-the-trace">Node Invariants: The Trace</h3>
<p>The most consequential difference is in node invariants. nauty computes a single integer $f(\nu)$ at each node, forming a vector $(f([\nu]_0), f([\nu]_1), \ldots, f(\nu))$ for lexicographic comparison. Traces defines $f(\nu)$ as a <strong>vector</strong> encoding the sizes and positions of cells in the order they are created during refinement. This vector-of-vectors structure (the &ldquo;trace,&rdquo; hence the program&rsquo;s name) enables comparison while refinement is still incomplete. For many difficult graph families, only a fraction of refinement operations need to finish before pruning can occur.</p>
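<p>A loose sketch of the trace idea: record a cell-size profile after each refinement pass, so two nodes can be compared entry by entry and rejected as soon as their traces diverge. The real trace in Traces also records cell positions and is interleaved with the splitting itself; this simplified version only shows why early divergence is possible.</p>

```python
def refinement_trace(adj, colouring):
    """Sequence of cell-size profiles produced as refinement proceeds.
    Nodes whose traces diverge can be distinguished before refinement
    completes. A simplified sketch of Traces' richer invariant."""
    trace = []
    while True:
        sigs = {v: (colouring[v], tuple(sorted(colouring[u] for u in adj[v])))
                for v in adj}
        order = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new = {v: order[sigs[v]] for v in adj}
        # Profile = sorted sizes of the colour classes after this pass.
        sizes = sorted(list(new.values()).count(c) for c in set(new.values()))
        trace.append(tuple(sizes))
        if new == colouring:
            return trace
        colouring = new

# The 4-path and the 4-cycle diverge at the very first trace entry:
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
t_path = refinement_trace(path, {v: 0 for v in path})
t_cycle = refinement_trace(cycle, {v: 0 for v in cycle})
assert t_path[0] != t_cycle[0]
```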
<h3 id="tree-scanning-order">Tree Scanning Order</h3>
<p>This is the fundamental architectural difference. nauty uses <strong>depth-first</strong> search, keeping the lexicographically least leaf $\nu_1$ and the leaf $\nu^*$ with the greatest invariant discovered so far. Pruning applies when a node&rsquo;s invariant matches neither.</p>
<p>Traces uses <strong>breadth-first</strong> search, processing all nodes at each level $k$ and retaining only those with the greatest invariant value. By property $(\phi 1)$, the best nodes at level $k$ are children of the best nodes at level $k-1$, so no backtracking is needed. This maximizes pruning operation $P_A$.</p>
<p>To compensate for the fact that breadth-first search delays automorphism discovery (which requires leaves), Traces generates <strong>experimental paths</strong>: random paths from each node down to a leaf. Random experimental paths tend to find automorphisms generating larger subgroups, making more of the group available early for pruning. Both programs maintain discovered automorphisms using the <a href="https://en.wikipedia.org/wiki/Schreier%E2%80%93Sims_algorithm">random Schreier method</a> for efficient orbit computation.</p>
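<p>The orbit information that drives pruning $P_C$ can be computed from a set of automorphism generators by plain closure; a minimal sketch (the random Schreier method maintains this far more cleverly, but the orbits are the same):</p>

```python
def orbits(n, generators):
    """Orbits of the group generated by `generators` (permutations as
    tuples mapping i -> g[i]) acting on {0, ..., n-1}, via BFS closure."""
    parts = []
    seen = set()
    for start in range(n):
        if start in seen:
            continue
        orbit, frontier = {start}, [start]
        while frontier:
            v = frontier.pop()
            for g in generators:
                if g[v] not in orbit:
                    orbit.add(g[v])
                    frontier.append(g[v])
        seen |= orbit
        parts.append(sorted(orbit))
    return parts

# The 4-cycle 0-1-2-3: a single rotation is already transitive,
rot = (1, 2, 3, 0)
assert orbits(4, [rot]) == [[0, 1, 2, 3]]
# while a reflection fixing 0 and 2 gives orbits {0}, {1, 3}, {2}.
refl = (0, 3, 2, 1)
assert orbits(4, [refl]) == [[0], [1, 3], [2]]
```

<p>This is why discovering a generator of a large subgroup early, as the random experimental paths tend to do, pays off: larger orbits mean more subtrees pruned.</p>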
<h3 id="low-degree-vertex-handling">Low-Degree Vertex Handling</h3>
<p>Traces includes special handling for vertices of degree 0, 1, 2, or $n-1$. After the initial refinement, vertices with equal colours also have equal degrees. The target cell selector never selects cells containing vertices of these low degrees, and nodes whose non-trivial cells consist only of such vertices are not expanded further. Instead, special-purpose code produces generators for the automorphism group fixed by that node and, if needed, a unique discrete colouring. This technique is effective for graphs with many small components and tree-like structures (as in constraint satisfaction problems), though the authors note that such graphs could also benefit from preprocessing that factors out tree-like appendages and replaces vertices with identical neighborhoods.</p>
<h3 id="automorphism-detection">Automorphism Detection</h3>
<p>Beyond leaf comparison, saucy introduced early detection of automorphisms higher in the search tree by checking whether partial mappings between equivalent colourings extend trivially. Traces extends this idea with a heuristic that attempts non-trivial extensions. When computing only the automorphism group (not canonical labeling), Traces employs a strategy where it finds all discrete children of one node and then checks each remaining node for a single matching discrete child, further reducing search effort.</p>
<h2 id="performance-benchmarks">Performance Benchmarks</h2>
<p>The authors compare nauty 2.5, Traces 2.0, saucy 3.0, Bliss 0.72, and conauto 2.0.1 on a MacBook Pro with a 2.66 GHz Intel i7 processor. All graphs were randomly labeled before processing to avoid artifacts from input ordering. The benchmark covers both automorphism group computation and canonical labeling.</p>
<table>
  <thead>
      <tr>
          <th>Graph Family</th>
          <th>Best Program(s)</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Random graphs ($p = 1/2$)</td>
          <td>nauty, Traces</td>
          <td>All programs fast; easy class</td>
      </tr>
      <tr>
          <td>Random graphs ($p = n^{-1/2}$)</td>
          <td>nauty</td>
          <td>Sparse random graphs</td>
      </tr>
      <tr>
          <td>Random cubic graphs</td>
          <td>nauty (with invariant)</td>
          <td>nauty benefits from distance invariant</td>
      </tr>
      <tr>
          <td><a href="https://en.wikipedia.org/wiki/Hypercube_graph">Hypercubes</a></td>
          <td>Traces</td>
          <td>Vertex-transitive; Traces dramatically faster</td>
      </tr>
      <tr>
          <td>Misc. vertex-transitive</td>
          <td>Traces</td>
          <td>Large automorphism groups</td>
      </tr>
      <tr>
          <td>Unions of tripartite graphs</td>
          <td>conauto, Bliss</td>
          <td>Special handling for disjoint components</td>
      </tr>
      <tr>
          <td>Small strongly-regular graphs</td>
          <td>Traces, nauty</td>
          <td>Both competitive</td>
      </tr>
      <tr>
          <td>Large strongly-regular graphs</td>
          <td>Traces</td>
          <td>Orders of magnitude faster</td>
      </tr>
      <tr>
          <td>Hadamard matrix graphs</td>
          <td>Traces</td>
          <td>Among the hardest known classes</td>
      </tr>
      <tr>
          <td>Random trees</td>
          <td>nauty</td>
          <td>Low-degree preprocessing helps</td>
      </tr>
      <tr>
          <td>Cai-Furer-Immerman graphs</td>
          <td>Traces</td>
          <td>Designed to defeat refinement; Traces still efficient</td>
      </tr>
      <tr>
          <td>Miyazaki graphs</td>
          <td>Traces</td>
          <td>Another hard class; dramatic advantage</td>
      </tr>
      <tr>
          <td><a href="https://en.wikipedia.org/wiki/Projective_plane">Projective planes</a> (order 16)</td>
          <td>Traces</td>
          <td>Large automorphism groups on bipartite graphs</td>
      </tr>
      <tr>
          <td>Combinatorial graphs</td>
          <td>Mixed</td>
          <td>Performance varies by instance; Traces generally competitive</td>
      </tr>
  </tbody>
</table>
<p>The results show that nauty is generally fastest for small graphs and some easier families, while Traces dominates on most difficult graph classes, sometimes by orders of magnitude. The breadth-first tree scanning strategy of Traces, combined with its richer node invariant, provides the largest gains on graphs with complex symmetry structure (<a href="https://en.wikipedia.org/wiki/Strongly_regular_graph">strongly-regular graphs</a>, <a href="https://en.wikipedia.org/wiki/Hadamard_matrix">Hadamard matrix</a> graphs, <a href="https://en.wikipedia.org/wiki/Vertex-transitive_graph">vertex-transitive graphs</a>). The exception is graph families with many disjoint or minimally-overlapping components, where conauto and Bliss have specialized handling that nauty and Traces lack.</p>
<h2 id="key-findings-and-limitations">Key Findings and Limitations</h2>
<p>The paper establishes several findings:</p>
<ol>
<li>The breadth-first tree scanning approach in Traces, combined with experimental paths for early automorphism discovery, provides large efficiency gains on difficult graph classes.</li>
<li>Traces&rsquo; richer node invariant (the trace) enables early pruning during incomplete refinement, reducing dependence on user-selected invariant functions compared to nauty.</li>
<li>No single program dominates all graph classes. nauty remains preferred for mass processing of small graphs.</li>
<li>The random Schreier method for maintaining the automorphism group is effective in both programs, enabling more complete pruning via orbit computation.</li>
</ol>
<p>Limitations acknowledged by the authors include: nauty and Traces lack specialized code for graphs consisting of disjoint or minimally-overlapping components (where conauto and Bliss excel), and the choice of refinement function in nauty still requires user expertise for certain difficult graph classes.</p>
<hr>
<h2 id="reproducibility-details">Reproducibility Details</h2>
<h3 id="data">Data</h3>
<table>
  <thead>
      <tr>
          <th>Purpose</th>
          <th>Dataset</th>
          <th>Size</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Benchmarking</td>
          <td>Bliss test collection</td>
          <td>Multiple families</td>
          <td>Graphs ranging from easy to very difficult</td>
      </tr>
      <tr>
          <td>Benchmarking</td>
          <td>nauty/Traces website collection</td>
          <td>Multiple families</td>
          <td>All test graphs available at the project website</td>
      </tr>
  </tbody>
</table>
<p>All test graphs are publicly available at the nauty and Traces website. Graphs were randomly labeled before processing to avoid non-typical behavior from input labeling.</p>
<h3 id="algorithms">Algorithms</h3>
<p>The core algorithms are described formally with proofs of correctness (Theorem 5 guarantees pruning validity). Key implementation choices:</p>
<ul>
<li><strong>Refinement</strong>: Equitable colouring via Algorithm 1 (iterated cell splitting by adjacency counts)</li>
<li><strong>Target cell selection</strong>: nauty uses first non-singleton or most non-trivially joined cell; Traces uses first largest cell within parent&rsquo;s target</li>
<li><strong>Tree scanning</strong>: nauty uses depth-first; Traces uses breadth-first with experimental paths</li>
<li><strong>Group maintenance</strong>: Random Schreier method for orbit computation in both programs</li>
</ul>
<h3 id="software">Software</h3>
<table>
  <thead>
      <tr>
          <th>Program</th>
          <th>Version</th>
          <th>Canonical Labeling</th>
          <th>Open Source</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>nauty</td>
          <td>2.5</td>
          <td>Yes</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Traces</td>
          <td>2.0</td>
          <td>Yes</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>saucy</td>
          <td>3.0</td>
          <td>No (v3.0)</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Bliss</td>
          <td>0.72</td>
          <td>Yes</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>conauto</td>
          <td>2.0.1</td>
          <td>No</td>
          <td>Yes</td>
      </tr>
  </tbody>
</table>
<h3 id="artifacts">Artifacts</h3>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>Type</th>
          <th>License</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="http://pallini.di.uniroma1.it/">nauty and Traces</a></td>
          <td>Code</td>
          <td>Apache 2.0</td>
          <td>Official distribution (v2.9.3 as of 2026); includes gtools graph utilities</td>
      </tr>
      <tr>
          <td><a href="http://pallini.di.uniroma1.it/">Test graphs</a></td>
          <td>Dataset</td>
          <td>Apache 2.0</td>
          <td>All benchmark graphs from the paper, available at the project website</td>
      </tr>
  </tbody>
</table>
<h3 id="hardware">Hardware</h3>
<p>Benchmarks run on a MacBook Pro with 2.66 GHz Intel i7 processor, compiled with gcc 4.7, single-threaded execution.</p>
<hr>
<h2 id="paper-information">Paper Information</h2>
<p><strong>Citation</strong>: McKay, B. D., &amp; Piperno, A. (2013). Practical graph isomorphism, II. <em>Journal of Symbolic Computation</em>, 60, 94-112.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bibtex" data-lang="bibtex"><span style="display:flex;"><span><span style="color:#a6e22e">@article</span>{mckay2013practical,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">title</span>=<span style="color:#e6db74">{Practical graph isomorphism, {II}}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">author</span>=<span style="color:#e6db74">{McKay, Brendan D. and Piperno, Adolfo}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">journal</span>=<span style="color:#e6db74">{Journal of Symbolic Computation}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">volume</span>=<span style="color:#e6db74">{60}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">pages</span>=<span style="color:#e6db74">{94--112}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">year</span>=<span style="color:#e6db74">{2013}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">publisher</span>=<span style="color:#e6db74">{Elsevier BV}</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">doi</span>=<span style="color:#e6db74">{10.1016/j.jsc.2013.09.003}</span>
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div>]]></content:encoded></item></channel></rss>