Machine-Learning

SELFIES and the Future of Molecular String Representations

This 2022 perspective paper reviews 250 years of evolution in chemical notation and proposes 16 concrete research projects to extend SELFIES beyond traditional organic chemistry into polymers, crystals, and reactions.

Computational Social Science

Diagram of the Universal Message schema showing fields like ID, Text, Author, and Reply Sets that normalize data across platforms

PyConversations: Social Media Conversational Analysis

This research project investigated how different NLP models perform on social media data and found that domain-specific approaches often outperform large pre-trained models. It includes PyConversations, a Python module for analyzing conversations across social media platforms.

Machine Learning Fundamentals

Vintage slot machine with multiple arms representing the multi-arm bandit problem in machine learning

5 Axes of Multi-Arm Bandit Problems: A Practical Guide

Five key dimensions that have helped me understand multi-arm bandit problems: action space, problem structure, external information, reward mechanism, and learner feedback.

Machine Learning Fundamentals

NEAT genome encoding diagram showing node genes and connection genes with innovation numbers

A Guide to Neuroevolution: NEAT and HyperNEAT

Discover how NEAT and HyperNEAT changed neuroevolution by automatically designing neural network architectures and scaling them through geometric patterns.

Machine Learning Fundamentals

Diagram showing the three main types of machine learning: supervised, unsupervised, and reinforcement learning

Breaking Down Machine Learning for the Average Person

Understand the pattern recognition behind Netflix recommendations, email spam filters, and game-playing AI through three core machine learning approaches.

Machine Learning Fundamentals

Diagram illustrating knowledge-based agent architecture with knowledge base, reasoning, and action components

Foundations of AI: Knowledge-Based Agents and Logic

Explore the building blocks of classic AI reasoning, from knowledge bases and logic to how systems draw new conclusions from existing knowledge.

Natural Language Processing

Types and distribution of coreferences in the QuAC dataset, showing dialogue complexity

QuAC: Question Answering in Context Dataset

QuAC introduces a conversational QA dataset that models student-teacher interactions, creating context-dependent questions that test systems’ ability to understand dialogue and resolve references.

Natural Language Processing

Visualization of coreference resolution in the CoQA conversational question answering dataset

CoQA Dataset: Advancing Conversational Question Answering

CoQA extends question answering beyond isolated questions to conversations that require context and reference understanding.

Generative Modeling

Illustration of GAN training process showing adversarial competition between generator and discriminator

Understanding GANs: From Fundamentals to Objective Functions

An in-depth guide to GANs: how two neural networks compete to generate realistic data, the math behind it, and the evolution of objective functions that stabilize training.

Natural Language Processing

3D visualization of word embeddings showing semantic relationships in vector space

Word Embeddings in NLP: An Introduction

Learn how computers represent words as mathematical vectors, from simple counting methods to the contextual embeddings that power modern NLP.