
Paradoxes in Transformer Language Models: Masking, Positional Encodings, and Routing

April 10 @ 2:00 pm - 3:00 pm

Join Siva Reddy for "Paradoxes in Transformer Language Models: Masking, Positional Encodings, and Routing."

Abstract: The defining features of Transformer language models, such as causal masking, positional encodings, and a monolithic architecture (i.e., the absence of an explicit routing mechanism), are paradoxically the same features that hinder their ability to generalize, and removing them improves generalization. I will present evidence of these paradoxes across several generalization settings, including length generalization, instruction following, and multi-task learning.
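To ground the abstract's terminology, the sketch below is a minimal single-head attention layer with explicit toggles for the first two features: causal masking and positional encodings. It illustrates the general mechanisms only and is not code from the talk; the TinyAttention class, its causal and use_pos flags, and the choice of learned absolute positions are all assumptions made for this example.

```python
# Minimal sketch (illustrative, not from the talk): single-head attention
# with toggles for causal masking and positional encodings.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAttention(nn.Module):
    def __init__(self, d_model: int, max_len: int = 512,
                 causal: bool = True, use_pos: bool = True):
        super().__init__()
        self.causal = causal    # apply the causal (lower-triangular) mask?
        self.use_pos = use_pos  # add positional encodings? (False = no positions)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.pos = nn.Embedding(max_len, d_model)  # learned absolute positions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        if self.use_pos:
            x = x + self.pos(torch.arange(T, device=x.device))
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(D)  # (B, T, T)
        if self.causal:
            # Each position may attend only to itself and earlier positions.
            mask = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                         device=x.device), diagonal=1)
            scores = scores.masked_fill(mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

# Toggling the flags yields the variants the abstract contrasts:
# TinyAttention(64, causal=False)  removes causal masking;
# TinyAttention(64, use_pos=False) removes positional encodings.
x = torch.randn(2, 10, 64)
print(TinyAttention(64)(x).shape)  # torch.Size([2, 10, 64])
```

The third feature, routing, refers to conditional computation (e.g., mixture-of-experts layers) that a monolithic Transformer lacks; the sketch omits it for brevity.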
Bio: Siva Reddy is an Assistant Professor in the School of Computer Science and the Department of Linguistics at McGill University. He is also a Facebook CIFAR AI Chair, a core faculty member of the Mila Quebec AI Institute, and a research scientist at ServiceNow Research. His research focuses on representation learning for language that facilitates reasoning, conversational modeling, and safety. He received the 2020 VentureBeat AI Innovation Award in NLP and the best paper award at EMNLP 2021. Before joining McGill, he was a postdoctoral researcher at Stanford University and a Google PhD Fellow at the University of Edinburgh.
