Computational Chemistry
Ball model of a crystal surface showing steps, kinks, adatoms, and vacancies

Platinum Adatom Diffusion on Pt(100) Surface

LAMMPS tutorial for simulating platinum adatom diffusion and generating ML training data. Learn how heavier atoms behave …

Computational Biology
Molecular visualization of a methionine dipeptide structure from MD simulation

Generating Mini-Protein Trajectories with GROMACS

Systematic GROMACS workflows for simulating mini-proteins across multiple amino acids to generate diverse MD …

Computational Biology
Molecular visualization of a methionine dipeptide structure from MD simulation

Mini-Protein Trajectory Generation

Automated GROMACS pipeline generating high-fidelity MD trajectories with atomic force extraction for Neural Network …

Computational Social Science
Top features for Social Welfare policy classification, showing the keywords social, poverty, and benefits

Congressional Knowledge Graph & Policy Classification

A 47,000+ bill knowledge graph from Congress.gov with sponsor networks and 87% policy classification accuracy.

Computational Social Science
Diagram of the Universal Message schema showing fields like ID, Text, Author, and Reply Sets that normalize data across platforms

Look, Don't Tweet: Unified Data Models for Social NLP

PyConversations library and unified data schema for normalizing 300M+ posts across Twitter, Reddit, Facebook, and 4chan.

Computational Social Science
NewsTweet data collection pipeline: news outlets are crawled via Google News RSS feeds, articles are accessed to extract embedded tweets, and user timelines are downloaded from Twitter

NewsTweet Dataset: Social Media in Digital Journalism

NewsTweet dataset for studying embedded tweets in online journalism. Analysis shows 13% of Google News stories contain …

Natural Language Processing
Venn diagram showing semantic overlap between word senses for go, move, and proceed, illustrating our hierarchy induction algorithm

Data-Driven WordNet Construction from Wiktionary

We introduce an unsupervised algorithm for inducing semantic networks from noisy, crowd-sourced data, producing a …

Natural Language Processing
Types and distribution of coreferences in the QuAC dataset, showing dialogue complexity

QuAC: Question Answering in Context Dataset

Analysis of QuAC's conversational QA through student-teacher interactions, featuring 100K+ context-dependent questions …

Natural Language Processing
Visualization of coreference resolution in the CoQA conversational question answering dataset

CoQA Dataset: Advancing Conversational Question Answering

Analysis of CoQA, a conversational QA dataset with multi-turn dialogue, coreference resolution, and natural answers for …