Natural Language Processing

Sarcasm Detection with Transformers: A Cautionary Tale

Learn how dataset bias can lead to misleading results in NLP: a sarcasm detection model that actually learned to …

Analytical Model of Word2Vec and GloVe Statistics

Analytical model of Word2Vec and GloVe statistics. First analytical solution to Word2Vec's softmax skip-gram with bias …

EigenNoise: Data-Free Word Vector Initialization

Investigation into EigenNoise, a data-free initialization scheme for word vectors that approaches pre-trained model …

Data-Driven WordNet Construction from Wiktionary

Explores a data-driven approach to construct a WordNet-like semantic network using the entirety of the noisy, …


QuAC: Question Answering in Context Dataset

Analysis of QuAC's conversational QA through student-teacher interactions, featuring 100K+ context-dependent questions …

CoQA Dataset: Advancing Conversational Question Answering

Analysis of CoQA, a conversational QA dataset with multi-turn dialogue, coreference resolution, and natural answers for …


Count Vectorization with scikit-learn in Python

Learn count vectorization in Python: convert text to numerical vectors using scikit-learn's CountVectorizer with …


Word Embeddings in NLP: An Introduction

Learn about word embeddings in NLP: from basic one-hot encoding to contextual models like ELMo. Guide with examples.