Document Processing

LLMs for Page Stream Segmentation

Enhanced TABME benchmark for page stream segmentation, creating TABME++, showing fine-tuned decoder-based LLMs …

Document Processing

LLMs for Insurance Document Automation

LLM applications for insurance document automation using parameter-efficient fine-tuning and analysis of calibration …

Computational Social Science
Data visualization showing congressional bill analysis and legislative patterns

Congressional Data Analysis & Classification

Data science project scraping 47,000+ congressional bills, analyzing legislative patterns, and building ML models …

Natural Language Processing

Analytical Model of Word2Vec and GloVe Statistics

Analytical model of Word2Vec and GloVe statistics. First analytical solution to Word2Vec's softmax skip-gram with bias …

Natural Language Processing

EigenNoise: Data-Free Word Vector Initialization

Investigation into EigenNoise, a data-free initialization scheme for word vectors that approaches pre-trained model …

Computational Social Science
Network visualization showing social media conversational analysis and text relationships

PyConversations: Social Media Conversational Analysis

Undergraduate thesis exploring representation learning for social media text and developing tools for cross-platform …

AI Safety

GPT-2 Susceptibility to Universal Adversarial Triggers

Investigation into whether universal adversarial triggers can control both topic and stance of GPT-2's generated text …

Natural Language Processing

Data-Driven WordNet Construction from Wiktionary

Explores a data-driven approach to construct a WordNet-like semantic network using the entirety of the noisy, …

AI Fundamentals

An Introduction to Knowledge-Based Agents

Learn about knowledge-based agents: how AI systems use knowledge bases, reasoning, and inference to build intelligent …

Natural Language Processing
Types and distribution of coreferences in QuAC dataset showing dialogue complexity

QuAC: Question Answering in Context Dataset

Analysis of QuAC's conversational QA through student-teacher interactions, featuring 100K+ context-dependent questions …

Natural Language Processing

CoQA Dataset: Advancing Conversational Question Answering

Analysis of CoQA, a conversational QA dataset with multi-turn dialogue, coreference resolution, and natural answers for …

Natural Language Processing
3D visualization of word embeddings showing semantic relationships in vector space

Word Embeddings in NLP: An Introduction

Learn about word embeddings in NLP: from basic one-hot encoding to contextual models like ELMo. Guide with examples.