Research notes on natural language processing and language models.
- Language Models: architectures, pretraining strategies, data scaling, and training optimization
Notes cover language model architectures (T5, RWKV, Block-Recurrent Transformers), pretraining data strategies, and scaling laws for LLM training.