
Paradoxes in Transformer Language Models: Masking, Positional Encodings, and Routing

April 10 @ 2:00 pm - 3:00 pm

Join Siva Reddy for "Paradoxes in Transformer Language Models: Masking, Positional Encodings, and Routing."

Abstract: The defining features of Transformer language models, such as causal masking, positional encodings, and their monolithic architecture (i.e., the absence of an explicit routing mechanism), are paradoxically the same features that hinder their generalization, and removing them improves it. I will present evidence of these paradoxes across several generalization settings, including length generalization, instruction following, and multi-task learning.
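As a rough illustration of two of the ingredients the abstract names, the toy sketch below implements single-head self-attention with an optional causal mask, plus standard sinusoidal positional encodings. This is not the speaker's method, just a minimal NumPy reference point for what "removing" these features means: pass `causal=False` to drop the mask, or skip adding the positional encodings to get a position-free ("NoPE"-style) input. All function names here are illustrative.

```python
import numpy as np

def self_attention(x, causal=True):
    """Toy single-head self-attention over a sequence x of shape (T, d).

    With causal=True, position t can only attend to positions <= t
    (the causal mask); with causal=False, attention is bidirectional.
    """
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)          # (T, T) attention logits
    if causal:
        # Mask out the strictly upper triangle: no attending to the future.
        mask = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def sinusoidal_positions(T, d):
    """Sinusoidal positional encodings as in the original Transformer."""
    pos = np.arange(T)[:, None]            # (T, 1) positions
    i = np.arange(d // 2)[None, :]         # (1, d/2) frequency indices
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))
# The usual recipe: add positional encodings, attend causally.
with_pe = self_attention(tokens + sinusoidal_positions(5, 8))
# A position-free variant: same attention, no positional signal.
no_pe = self_attention(tokens)
```

Under the causal mask, position 0 can attend only to itself, so its output is its own input vector unchanged; removing the mask lets every position mix information from the whole sequence.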
Bio: Siva Reddy is an Assistant Professor in the School of Computer Science and Linguistics at McGill University. He is also a Facebook CIFAR AI Chair, a core faculty member of Mila Quebec AI Institute and a research scientist at ServiceNow Research. His research focuses on representation learning for language that facilitates reasoning, conversational modeling and safety. He received the 2020 VentureBeat AI Innovation Award in NLP, and the best paper award at EMNLP 2021. Before McGill, he was a postdoctoral researcher at Stanford University and a Google PhD fellow at the University of Edinburgh.