Computational Chemistry
Ball model representation of a crystal surface with steps, kinks, adatoms, and vacancies showing various surface features

Platinum Adatom Diffusion on Pt(100) Surface

LAMMPS tutorial for platinum adatom diffusion simulation and ML training data. Learn how heavier atoms behave …

Computational Chemistry
Molecular visualization of a methionine dipeptide structure from MD simulation

Generating Mini-Protein Trajectories with GROMACS

Systematic GROMACS workflows for simulating mini-proteins across multiple amino acids to generate diverse MD …

Computational Chemistry
Molecular visualization of a methionine dipeptide structure from MD simulation

Mini-Protein Trajectory Generation

Automated GROMACS pipeline generating high-fidelity MD trajectories with atomic force extraction for Neural Network …

Computational Social Science
Data visualization showing congressional bill analysis and legislative patterns

Congressional Data Analysis & Classification

Data science project scraping 47,000+ congressional bills, analyzing legislative patterns, and building ML models …

Computational Social Science
Diagram of the Universal Message schema showing fields like ID, Text, Author, and Reply Sets that normalize data across platforms

Look, Don't Tweet: Unified Data Models for Social NLP

A comprehensive study on cross-platform social media analysis, introducing the PyConversations library and a unified …

Computational Social Science
NewsTweet data collection pipeline: news outlets are crawled via Google News RSS feeds, articles are accessed to extract embedded tweets, and user timelines are downloaded from Twitter

NewsTweet Dataset: Social Media in Digital Journalism

NewsTweet dataset and pipeline for studying embedded tweets in online news via Google News, chosen for its significant …

Natural Language Processing
Venn diagram showing semantic overlap between word senses for go, move, and proceed, illustrating our hierarchy induction algorithm

Data-Driven WordNet Construction from Wiktionary

We introduce an unsupervised algorithm for inducing semantic networks from noisy, crowd-sourced data, producing a …

Natural Language Processing
Types and distribution of coreferences in QuAC dataset showing dialogue complexity

QuAC: Question Answering in Context Dataset

Analysis of QuAC's conversational QA through student-teacher interactions, featuring 100K+ context-dependent questions …