
Optimizing Sequence Models for Dynamical Systems
Ablation study deconstructing sequence models. Attention-augmented Recurrent Highway Networks outperform Transformers on …

Ablation study deconstructing sequence models. Attention-augmented Recurrent Highway Networks outperform Transformers on …