Abstract

This paper explores the data-driven construction of wordnets using Wiktionary as a source. We demonstrate that semantic networks can be induced effectively from noisy, user-annotated lexical resources, which offers a scalable approach to building linguistic knowledge bases.

Key Contributions

  • Data-driven wordnet construction: Novel approach using Wiktionary as source material
  • Noise handling: Methods for dealing with inconsistent user-generated content
  • Semantic network induction: Techniques for extracting meaningful semantic relationships
  • Evaluation framework: Metrics for assessing the quality of induced semantic networks

Technical Approach

Our method uses the links between Wiktionary entries to identify semantic relationships between words. We developed techniques that handle the noise inherent in user-generated content while preserving the semantic information it carries.
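To make the idea concrete, here is a minimal sketch of inducing a word graph from entry links while filtering some obvious noise. It is an illustration under stated assumptions, not the algorithm from the paper: the input format (a dict mapping each headword to lists of linked words), the function name `induce_network`, the `min_support` threshold, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def induce_network(entries, min_support=2):
    """Build an undirected word graph from entry links.

    entries: dict mapping a headword to a list of link lists, one list
             per sense or example (a hypothetical, simplified format).
    min_support: keep an edge only if the pair is linked at least this
                 many times (an assumed, simple noise filter).
    """
    pair_counts = Counter()
    for head, link_lists in entries.items():
        for links in link_lists:
            # Count each target at most once per link list.
            for target in set(links):
                # Noise filters: skip self-links and links to words
                # that have no entry of their own (e.g. typos).
                if target == head or target not in entries:
                    continue
                pair_counts[frozenset((head, target))] += 1

    graph = defaultdict(set)
    for pair, count in pair_counts.items():
        if count >= min_support:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Toy data, invented for illustration; "dgo" stands in for a noisy link.
toy_entries = {
    "dog": [["canine", "pet"], ["canine", "dgo"]],
    "canine": [["dog"]],
    "pet": [["animal"]],
    "animal": [[]],
}

network = induce_network(toy_entries)
```

With the default threshold, only the pair linked repeatedly ("dog" and "canine") survives; lowering `min_support` to 1 keeps every link whose target has an entry, trading precision for coverage.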

Impact

This work contributed to research on the automatic construction of lexical resources and demonstrated the potential of user-generated content for NLP applications, despite its noise.

Citation

@inproceedings{heidenreich2019latent,
  title={Latent semantic network induction in the context of linked example senses},
  author={Heidenreich, Hunter and Williams, Jake},
  booktitle={Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)},
  pages={170--180},
  year={2019}
}