Abstract
Undergraduate thesis exploring representation learning for social media text and developing tools for cross-platform conversational analysis. Built PyConversations, a Python module for analyzing social media conversations, and found that large pre-trained models don’t always outperform domain-specific approaches.
What I Built
PyConversations Module
- Multi-platform support: Unified interface for analyzing data across social media platforms
- Conversation analysis: Tools for understanding dialogue patterns, sentiment, and interaction dynamics
- Modular design: Extensible structure for adding new platforms and analysis methods
- Data processing: Efficient handling of large social media datasets
Research Contributions
- Representation learning study: Compared domain-specific vs. general-purpose models on social media text
- Model performance analysis: Found specialized approaches sometimes outperform large pre-trained models
- Cross-platform analysis: Methods for understanding communication patterns across different platforms
Key Findings
- Model performance: Standard pre-trained models didn’t always outperform smaller, domain-specific approaches
- Context importance: Conversational context and dialogue structure proved crucial for understanding social media interactions
- Domain adaptation: Social media text benefits from specialized handling rather than generic approaches
- Cross-platform challenges: Different platforms require adapted approaches despite seeming similarities
Team & Recognition
- Hunter Heidenreich - Lead Researcher and Developer
- Jake Williams - Faculty Advisor
- π First Place - Research Undergraduate Senior Thesis at Drexel University
Impact
This undergraduate work contributed to understanding domain-specific approaches in NLP and provided tools for social media research. The findings about model performance suggested that larger models aren’t automatically better for specialized domains - a perspective that has become more relevant as the field continues to evolve.