Abstract

Undergraduate thesis exploring representation learning for social media text and developing tools for cross-platform conversational analysis. Built PyConversations, a Python module for analyzing social media conversations, and found that large pre-trained models don’t always outperform domain-specific approaches.

What I Built

PyConversations Module

  • Multi-platform support: Unified interface for analyzing data across social media platforms
  • Conversation analysis: Tools for understanding dialogue patterns, sentiment, and interaction dynamics
  • Modular design: Extensible structure for adding new platforms and analysis methods
  • Data processing: Efficient handling of large social media datasets
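
As an illustration of the "unified interface" and "modular design" ideas above, here is a minimal sketch of how platform-specific records could be normalized into one conversation structure. This is not the PyConversations API. Every name in it (`Post`, `Conversation`, `register`, `PARSERS`) and the raw field names for each platform are hypothetical, chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Post:
    """Platform-independent view of a single post (hypothetical schema)."""
    uid: str
    author: str
    text: str
    platform: str
    reply_to: Optional[str] = None  # uid of the parent post, if any


# Registry mapping a platform name to a function that parses its raw records.
PARSERS: Dict[str, Callable[[dict], Post]] = {}


def register(platform: str):
    """Decorator that adds a parser to the registry under `platform`."""
    def wrap(fn: Callable[[dict], Post]) -> Callable[[dict], Post]:
        PARSERS[platform] = fn
        return fn
    return wrap


@register("twitter")
def parse_tweet(raw: dict) -> Post:
    # Field names here are assumptions, not a real API schema.
    parent = raw.get("in_reply_to")
    return Post(uid=str(raw["id"]), author=raw["user"], text=raw["text"],
                platform="twitter",
                reply_to=str(parent) if parent is not None else None)


@register("reddit")
def parse_comment(raw: dict) -> Post:
    # Field names here are assumptions, not a real API schema.
    return Post(uid=raw["name"], author=raw["author"], text=raw["body"],
                platform="reddit", reply_to=raw.get("parent_id"))


@dataclass
class Conversation:
    """A set of posts linked by reply relationships."""
    posts: Dict[str, Post] = field(default_factory=dict)

    def add(self, platform: str, raw: dict) -> Post:
        post = PARSERS[platform](raw)
        self.posts[post.uid] = post
        return post

    def depth(self, uid: str) -> int:
        """Number of reply hops from `uid` up to the earliest known ancestor."""
        hops, seen = 0, set()
        post = self.posts[uid]
        while (post.reply_to is not None and post.reply_to in self.posts
               and post.reply_to not in seen):
            seen.add(post.uid)  # guard against malformed reply cycles
            post = self.posts[post.reply_to]
            hops += 1
        return hops

    def max_depth(self) -> int:
        return max((self.depth(uid) for uid in self.posts), default=0)
```

In this sketch, supporting another platform means writing one more parser and registering it. Analysis code such as `depth` works only on the shared `Post` fields, so it does not need to change.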

Research Contributions

  • Representation learning study: Compared domain-specific vs. general-purpose models on social media text
  • Model performance analysis: Found specialized approaches sometimes outperform large pre-trained models
  • Cross-platform analysis: Methods for understanding communication patterns across different platforms

Key Findings

  • Model performance: Standard pre-trained models didn’t always outperform smaller, domain-specific approaches
  • Context importance: Conversational context and dialogue structure proved crucial for understanding social media interactions
  • Domain adaptation: Social media text benefits from specialized handling rather than generic approaches
  • Cross-platform challenges: Different platforms require adapted approaches despite seeming similarities

Team & Recognition

  • Hunter Heidenreich - Lead Researcher and Developer
  • Jake Williams - Faculty Advisor
  • πŸ† First Place - Research Undergraduate Senior Thesis at Drexel University

Impact

This undergraduate work contributed to understanding domain-specific approaches in NLP and provided tools for social media research. The findings about model performance suggested that larger models aren’t automatically better for specialized domains, a perspective that has become more relevant as pre-trained models have continued to grow.