Hunter Heidenreich | ML Research Scientist — Page 33

Natural Language Processing
Information Quality Ratio plot showing statistical dependencies decay as window size increases

Analytical Solution to Word2Vec Softmax & Bias Probing

We provide the first analytical solution to Word2Vec’s softmax skip-gram objective, introducing the Independent Frequencies Model and deriving a low-cost, training-free method for measuring semantic bias directly from corpus statistics.

Natural Language Processing
Heatmap visualization of the EigenNoise analytical co-occurrence prior matrix showing word rank relationships

EigenNoise: Data-Free Word Vector Initialization

We develop EigenNoise, a zero-data initialization method for word vectors that synthesizes representations from Zipf’s Law alone, achieving performance competitive with GloVe after fine-tuning without requiring any pre-training corpus.

Computational Social Science
Diagram of the Universal Message schema showing fields like ID, Text, Author, and Reply Sets that normalize data across platforms

Look, Don't Tweet: Unified Data Models for Social NLP

Bachelor’s thesis introducing PyConversations, an open-source library that normalizes over 308 million posts from Twitter, Reddit, Facebook, and 4chan into a unified data model for cross-platform social media research.
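As a rough illustration of what a unified data model can look like, here is a minimal sketch built only from the fields named in the schema diagram (ID, Text, Author, Reply Sets). The class name, the `platform` field, the converter functions, and the raw dictionary layouts are assumptions made for this example; this is not the PyConversations API.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class UniversalMessage:
    """Hypothetical platform-neutral post record (not the PyConversations API)."""
    uid: str        # platform-unique post ID
    text: str       # post body
    author: str     # author handle or username
    platform: str   # assumed extra field, e.g. "twitter" or "reddit"
    reply_to: Set[str] = field(default_factory=set)  # IDs this post replies to


def from_reddit(raw: dict) -> UniversalMessage:
    """Map an assumed Reddit-style comment dict onto the shared schema."""
    parent = raw.get("parent_id")
    return UniversalMessage(
        uid=raw["id"],
        text=raw.get("body", ""),
        author=raw.get("author", ""),
        platform="reddit",
        reply_to={parent} if parent else set(),
    )


def from_tweet(raw: dict) -> UniversalMessage:
    """Map an assumed Twitter-style status dict onto the shared schema."""
    parent = raw.get("in_reply_to_status_id_str")
    return UniversalMessage(
        uid=raw["id_str"],
        text=raw.get("text", ""),
        author=raw.get("user", {}).get("screen_name", ""),
        platform="twitter",
        reply_to={parent} if parent else set(),
    )
```

Once every platform is mapped into the same record type, downstream analysis code can walk reply sets without caring where a post came from.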

Computational Social Science
Diagram of the Universal Message schema showing fields like ID, Text, Author, and Reply Sets that normalize data across platforms

PyConversations: Social Media Conversational Analysis

Research project that investigated how different NLP models perform on social media data, finding that domain-specific approaches often outperform large pre-trained models. Includes PyConversations, a Python module for analyzing conversations across social media platforms.

Natural Language Processing
A nonsensical trigger sequence 'WTC theoriesclimate Flat Hubbard Principle' is fed into GPT-2, which then generates Flat Earth conspiracy text

GPT-2 Susceptibility to Universal Adversarial Triggers

We demonstrate that universal adversarial triggers can control both the topic and stance of GPT-2’s generated text, revealing security vulnerabilities in deployed language models and proposing constructive applications for bias auditing.

Machine Learning
Vintage slot machine with multiple arms representing the multi-arm bandit problem in machine learning

5 Axes of Multi-Arm Bandit Problems: A Practical Guide

Key dimensions that have helped me understand multi-arm bandit problems: action space, problem structure, external information, reward mechanism, and learner feedback.
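To make a few of these axes concrete, the sketch below fixes one simple setting: a finite action space, stochastic Bernoulli rewards, no external information, and bandit feedback (only the pulled arm's reward is observed). Epsilon-greedy is used here purely as an illustrative learner; the function name and parameters are assumptions for this example, not taken from the guide.

```python
import random


def epsilon_greedy(true_means, steps=1000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return (counts, estimates, total)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Bandit feedback: only the chosen arm's reward is revealed.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward

    return counts, estimates, total
```

Changing any one axis (for example, adding context as external information, or revealing every arm's reward each round) yields a different problem class and usually calls for a different algorithm.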

Computational Social Science
NewsTweet data collection pipeline: news outlets are crawled via Google News RSS feeds, articles are accessed to extract embedded tweets, and user timelines are downloaded from Twitter

NewsTweet Dataset: Social Media in Digital Journalism

We introduce NewsTweet, a dataset and pipeline for studying embedded tweets in digital journalism, revealing that 13% of Google News articles incorporate tweets and providing insights into how social media becomes newsworthy.

Computational Social Science
Sawtooth follower growth patterns for @elonmusk and @realDonaldTrump showing coordinated bot activity

Coordinated Social Targeting on Twitter

We developed high-frequency monitoring tools to detect coordinated manipulation on Twitter, documenting anomalous follower patterns including sub-second spikes, sawtooth waves, circulating accounts, and weaponized long-dormant accounts targeting political figures.

Natural Language Processing
Venn diagram showing semantic overlap between word senses for go, move, and proceed, illustrating our hierarchy induction algorithm

Data-Driven WordNet Construction from Wiktionary

We present an unsupervised algorithm for inducing semantic networks from Wiktionary’s crowd-sourced data, creating a WordNet-like resource an order of magnitude larger than Princeton WordNet with over 344,000 linked example sentences.

Machine Learning
NEAT genome encoding diagram showing node genes and connection genes with innovation numbers

A Guide to Neuroevolution: NEAT and HyperNEAT

Discover how NEAT and HyperNEAT changed neuroevolution by automatically designing neural network architectures and scaling them through geometric patterns.

Machine Learning
Diagram showing the three main types of machine learning: supervised, unsupervised, and reinforcement learning

Breaking Down Machine Learning for the Average Person

Understand the pattern recognition behind Netflix recommendations, email spam filters, and game-playing AI through three core machine learning approaches.

Machine Learning
Diagram illustrating knowledge-based agent architecture with knowledge base, reasoning, and action components

Foundations of AI: Knowledge-Based Agents and Logic

Explore the building blocks of classic AI reasoning, from knowledge bases and logic to how systems draw new conclusions from existing knowledge.