UNC-NLP faculty members Mohit Bansal, Snigdha Chaturvedi, Colin Raffel, and Shashank Srivastava
July 30, 2023

The UNC Natural Language Processing and Machine Learning Group (UNC-NLP) presented research advancing the state of the art in natural language processing at the annual meeting of the Association for Computational Linguistics (ACL). ACL, one of the world’s top conferences in natural language processing, was held July 9-14.

The UNC-NLP group consists of the combined research groups of four primary faculty members: Professor Mohit Bansal, Assistant Professor Snigdha Chaturvedi, Assistant Professor Colin Raffel, and Assistant Professor Shashank Srivastava. All four groups had papers accepted by ACL 2023, either to the main conference or to Findings of ACL.

Bansal, a John R. and Louise S. Parker professor of computer science and director of the Multimodal Understanding, Reasoning, and Generation for Language Lab (stylized as MURGe-Lab), was especially busy at the conference, as he and his students presented six papers on a wide range of topics, including faithful extractive summarization of text, question answering for online meetings, single-frame bias in video-and-language models, mixed forward and reverse cross-entropy for language models, multimodal graph script induction, and continual learning for code generation. Bansal also serves on the executive committee for the conference.


Additionally, doctoral student Peter Hase was recognized with an Outstanding Area Chair Award. The award recognizes the top 1.5 percent or fewer of reviewers and area chairs, highlighting those who wrote especially helpful reviews, were particularly active in the discussion phase, and demonstrated exceptional open-mindedness or expertise. Hase was one of only 13 recipients of the award, and one of only six in the United States.

Main Conference Accepted Papers

The publications accepted by the conference featured collaborations with industry leaders like Amazon, Adobe, and Bloomberg, as well as with other top research universities. Below are summaries of each paper presented by the UNC-NLP group at ACL 2023.

ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim and Leshem Choshen

Massively multitask learning has traditionally been available only to well-resourced research teams because it requires substantial computing resources and simultaneous access to all of the training data. Working with researchers from IBM, Raffel helped develop ColD Fusion, a method that achieves the benefits of multitask learning through distributed computation that requires limited communication and no sharing of data. This work makes machine learning more accessible while also enabling pretrained models to be improved over time.
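The collaborative loop described above can be illustrated with a toy Python sketch. It rests on our own simplifying assumptions: each "model" is a single weight of a one-parameter linear regression, `finetune` is a hypothetical stand-in for a contributor's local training, and contributions are fused by simple parameter averaging. It is meant only to show the shape of the idea (share weights, never data), not to reproduce the authors' code.

```python
def finetune(weight, data, lr=0.1, steps=50):
    """Hypothetical stand-in for one contributor's local finetuning:
    gradient descent on a one-parameter linear model y = weight * x,
    minimizing mean squared error on this contributor's private data."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def fuse(weights):
    """Fuse contributed models by averaging their parameters (an assumed fusion rule)."""
    return sum(weights) / len(weights)


def collaborative_descent(shared, contributors, rounds=5):
    """Repeatedly send the shared model out, finetune it locally, and fuse the results."""
    for _ in range(rounds):
        # Each contributor starts from the current shared model and trains on its own data.
        local_models = [finetune(shared, data) for data in contributors]
        # Only model weights travel back; the datasets never leave the contributors.
        shared = fuse(local_models)
    return shared


# Two contributors whose private data follow y = 2x and y = 4x.
contributors = [[(1.0, 2.0), (2.0, 4.0)], [(1.0, 4.0), (2.0, 8.0)]]
print(collaborative_descent(0.0, contributors))  # close to 3.0
```

In a real network each parameter tensor would be averaged in the same element-wise way; the scalar weight keeps the example small.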

Crosslingual Generalization through Multitask Finetuning
Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff and Colin Raffel

Zero-shot learning refers to a model performing a task it was never explicitly trained on, typically by drawing on related knowledge it has already acquired. As an example, imagine someone who has never heard of a pony being told to look for a horse, but smaller. Multitask prompted fine-tuning (MTF), in which a model is fine-tuned on many tasks phrased as natural-language prompts, has been shown to help large language models perform new tasks in a zero-shot setting. However, MTF had mostly been studied with English-language models and data. Raffel and a team of researchers from companies and universities around the world applied MTF to the multilingual BLOOM and mT5 model families, producing new fine-tuned variants. They found that MTF with English prompts improves performance on both English and non-English tasks, and that fine-tuning on multilingual tasks with prompts machine-translated from English into each dataset's language improves performance on human-written prompts in those languages. Surprisingly, they also found that the models could generalize zero-shot to tasks in languages they had never intentionally seen.
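To make "prompted" fine-tuning more concrete, here is a minimal Python sketch of how labeled examples from several tasks might be rewritten as natural-language prompt/target pairs and pooled into one training mixture. The templates, task names, and helper functions are hypothetical illustrations, not the prompts used in the paper, and the fine-tuning step itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    label: str


# Hypothetical English prompt templates, one per task (not taken from the paper).
TEMPLATES = {
    "sentiment": "Review: {text}\nIs this review positive or negative?",
    "topic": "{text}\nWhat is the topic of this article?",
}


def to_prompted(task, example):
    """Rewrite a labeled example as an (input prompt, target text) pair."""
    return TEMPLATES[task].format(text=example.text), example.label


def build_mixture(datasets):
    """Pool prompted examples from every task into a single fine-tuning set."""
    return [to_prompted(task, ex) for task, examples in datasets.items() for ex in examples]


datasets = {
    # An English prompt wrapped around French input, as in the English-prompt setting.
    "sentiment": [Example("Ce film était génial.", "positive")],
    "topic": [Example("Stocks rose sharply today.", "business")],
}
for prompt, target in build_mixture(datasets):
    print(repr(prompt), "->", target)
```

Swapping the English templates for translated ones would correspond, loosely, to the machine-translated-prompt setting described above.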

Exploring Continual Learning for Code Generation Models
Prateek Yadav, Qing Sun, Hantian Ding, Xiaopeng Li, Dejiao Zhang, Ming Tan, Parminder Bhatia, Xiaofei Ma, Ramesh Nallapati, Murali Krishna Ramanathan, Mohit Bansal and Bing Xiang

Advances in AI have made it possible to automatically generate executable programming code. Unfortunately, programming libraries are updated frequently, and existing functions and usage patterns are deprecated in favor of new ones. Because retraining a code generation model is computationally expensive, these models need to be able to learn continually. Working alongside researchers from Amazon, Bansal and his team compared continual learning methods drawn from natural language processing and computer vision, and developed a method they call Prompt Pooling with Teacher Forcing, which stabilizes the existing Prompt Pooling method and improves its performance when training code generation models.
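Prompt pooling methods generally keep a pool of learnable prompts, each paired with a key, and pick the prompts whose keys best match a query vector computed from the input. The following Python sketch (toy 2-D vectors; all function names are ours) shows that key-matching step, plus a hypothetical teacher-forced alternative that reserves a fixed block of prompts for each task during training. The second function is our assumption about one way selection could be constrained, not the paper's exact procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def select_by_key(query, keys, top_n=2):
    """Key-matching selection: choose the prompts whose keys best match the query."""
    ranked = sorted(range(len(keys)), key=lambda i: cosine(query, keys[i]), reverse=True)
    return ranked[:top_n]


def select_teacher_forced(task_id, prompts_per_task=2):
    """Hypothetical teacher-forced selection for training: each task gets a fixed,
    reserved block of prompts, so selection cannot drift between tasks."""
    start = task_id * prompts_per_task
    return list(range(start, start + prompts_per_task))


# Toy 2-D keys for a pool of four prompts, and a query feature for one input.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(select_by_key([1.0, 0.1], keys))  # [0, 2]
print(select_teacher_forced(task_id=1))  # [2, 3]
```

The selected prompt indices would then determine which learned prompt vectors are prepended to the model input; that part is not shown.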

Extractive is not Faithful: An Investigation of Broad Unfaithfulness Problems in Extractive Summarization
Shiyue Zhang, David Wan and Mohit Bansal

A common application of natural language processing is summarizing large bodies of text. Extractive summarization selects individual sentences from the original text and combines them into a shorter summary, while abstractive summarization generates new text that captures the overall essence of the original. Because extractive summaries copy sentences verbatim, they are generally assumed to be faithful to the original text. UNC researchers tested that assumption by having human readers evaluate 1,600 English summaries produced by 16 different extractive systems, and found that 30 percent of the summaries had at least one faithfulness problem. The team proposed a new metric, ExtEval, designed to detect unfaithful extractive summaries. ExtEval outperforms existing faithfulness metrics and can help future summarization models stay more faithful to the original text.
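A toy extractive summarizer makes the distinction concrete. The sketch below is a generic word-frequency heuristic written for illustration; it is not one of the 16 systems the team evaluated, and it is not ExtEval. It only ever copies whole sentences, yet it can still produce a misleading summary.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def tokenize(sentence):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def extractive_summary(text, k=1):
    """Copy the k highest-scoring sentences verbatim, in their original order.
    A sentence's score is the average document frequency of its words."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(s):
        tokens = tokenize(s)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


doc = "Mayor Lee proposed a new budget. It cuts the budget for the parks and the library."
print(extractive_summary(doc, k=1))
```

With `k=1` the summary is the second sentence alone. Every word is copied from the source, but "It" has lost its antecedent, so a reader cannot tell that it refers to Mayor Lee's proposed budget: the kind of faithfulness problem that can arise even without generating new text.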

MeetingQA: Extractive Question-Answering on Meeting Transcripts
Archiki Prasad, Trung Bui, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt and Mohit Bansal

If you have joined an online meeting or webinar in recent years, there is a good chance you have seen an automatically generated transcript of the discussion. Most research on meeting transcripts focuses on summarization and extracting action items, but what if transcripts could power an interactive interface that lets users ask questions about what was said in the meeting? Working with Adobe Research, UNC researchers developed MeetingQA, a dataset of questions and answers drawn from meeting transcripts that can be used to test extractive question-answering models, guiding future research into this challenge.

MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies
Shiyue Zhang, Shijie Wu, Ozan Irsoy, Steven Lu, Mohit Bansal, Mark Dredze and David Rosenberg

In natural language processing, autoregressive language models power predictive text: the model reads text that has already been written and predicts what comes next, as in text message and email applications. These models tend to work well when asked to predict a short continuation of a longer text, but they can struggle when asked to generate a long passage from a short prompt, often producing nonsensical gibberish or getting caught in repetitive loops. This is partly because forward cross-entropy, the standard training objective, does not sufficiently penalize the model for assigning probability to words or phrases that a human would be very unlikely to write next, which allows continuations that read as nonsense to a human. Bansal and his team, along with researchers from Bloomberg and Johns Hopkins University, presented MixCE, a training objective that mixes forward and reverse cross-entropies to help the model narrow or broaden its range of potential predictions so that its output is more in line with the text a human would write.
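The difference between the two objectives can be seen on toy next-word distributions. This Python sketch assumes the mixed objective is a weighted sum of forward and reverse cross-entropy with a mixing weight we call `eta`; the distributions are made up. In real training the human distribution is unknown, so the reverse term has to be approximated; here it is given explicitly purely for illustration.

```python
import math


def forward_ce(p, q):
    """Forward cross-entropy: average of -log q(x) over words drawn from the human distribution p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def reverse_ce(p, q):
    """Reverse cross-entropy: average of -log p(x) over words drawn from the model distribution q.
    It is large when the model puts probability on words humans rarely use.
    Assumes p > 0 wherever q > 0."""
    return -sum(qi * math.log(pi) for pi, qi in zip(p, q) if qi > 0)


def mixce(p, q, eta):
    """Weighted mix of the two objectives; eta = 1 recovers plain forward cross-entropy."""
    return eta * forward_ce(p, q) + (1 - eta) * reverse_ce(p, q)


# Toy next-word distributions over three candidate words.
human = [0.6, 0.39, 0.01]        # the third word is almost never used by humans
spread_model = [0.5, 0.3, 0.2]   # gives the unlikely word 20% probability
sharp_model = [0.7, 0.29, 0.01]  # concentrates on the words humans actually use
for name, q in [("spread", spread_model), ("sharp", sharp_model)]:
    print(name, round(forward_ce(human, q), 3), round(reverse_ce(human, q), 3), round(mixce(human, q, 0.5), 3))
```

Both objectives prefer the sharper model, but the gap is much larger under reverse cross-entropy, because it directly charges the model for the probability it places on the word humans almost never choose. Mixing in that term is what pushes a model away from such implausible continuations.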

Non-Sequential Graph Script Induction via Multimedia Grounding
Yu Zhou, Sha Li, Manling Li, Xudong Lin, Shih-Fu Chang, Mohit Bansal and Heng Ji

Websites like WikiHow provide instructions for all sorts of everyday tasks, sometimes accompanied by videos demonstrating the steps. These instructions are presented linearly, but in practice, people following them may want to revisit steps in reverse order to undo something, or to skip steps that are only conditionally necessary. In collaboration with researchers from the University of California, Los Angeles; the University of Illinois Urbana-Champaign; and Columbia University, UNC researchers tackled non-sequential graph script induction, which aims to capture optional and interchangeable steps in procedural planning. To automate this process, the team took advantage of videos of people performing the tasks, which often do not follow the exact order given in the accompanying instructions. Their model takes both the linear instruction script and the non-linear videos as input and generates a new non-linear script. Given a partial sequence of steps, the model is substantially better than baseline approaches at predicting which step a person is likely to take next.
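To illustrate what a non-sequential graph script represents, the following Python sketch builds a step graph by counting which step follows which across several demonstrations, then predicts the next step from a partial sequence. This simple transition-count approach is our own illustration; the paper's model instead learns from instruction scripts paired with videos. The tea-making steps are hypothetical.

```python
from collections import Counter, defaultdict


def build_step_graph(demonstrations):
    """Count observed transitions between steps across many demonstrations.
    Each step becomes a node; each observed 'A then B' adds weight to edge A -> B."""
    graph = defaultdict(Counter)
    for steps in demonstrations:
        for prev, nxt in zip(steps, steps[1:]):
            graph[prev][nxt] += 1
    return graph


def predict_next(graph, partial):
    """Predict the most frequently observed successor of the last completed step."""
    successors = graph.get(partial[-1])
    if not successors:
        return None
    return successors.most_common(1)[0][0]


# Hypothetical step orders observed in three videos of making tea.
videos = [
    ["boil water", "add tea bag", "pour water", "add sugar"],
    ["boil water", "add tea bag", "pour water"],
    ["add tea bag", "boil water", "pour water", "add milk"],
]
graph = build_step_graph(videos)
print(dict(graph["pour water"]))                           # {'add sugar': 1, 'add milk': 1}
print(predict_next(graph, ["boil water", "add tea bag"]))  # pour water
```

Unlike a single linear list, the graph records that "boil water" and "add tea bag" can appear in either order and that sugar and milk are alternative, optional final steps.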

Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei, Tamara Berg and Mohit Bansal

Video-and-language models are a key component of multimodal natural language processing: they combine video and text to retrieve video segments in response to a text query or to answer questions about a video’s content. Intuitively, providing more frames as input should improve performance, even after accounting for the increased computation and memory costs. UNC researchers challenged that assumption, showing that with large-scale pre-training and a suitable strategy for combining frame-level predictions at inference time, a model trained on a single frame can perform better than existing methods that use multiple frames. This result revealed that current benchmarks are biased toward static objects and scenes, placing less emphasis on how those scenes change over time across multiple frames. In response, the team proposed new tasks designed to better test a model’s temporal modeling ability.
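A toy example, using a made-up one-number "feature" per frame, shows why static information cannot answer temporal questions. A clip of a door opening and the same clip played backward (a door closing) look identical to a model that sees only one frame or an order-free average of frames, while frame-to-frame differences tell them apart. This is an illustration of the benchmark concern, not the paper's model or its proposed tasks.

```python
def single_frame(frames):
    """Keep only the middle frame, as a single-frame model would."""
    return frames[len(frames) // 2]


def mean_pool(frames):
    """Average per-frame feature vectors; the result is the same in any frame order."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def frame_differences(frames):
    """Frame-to-frame changes, which flip sign when the video is played backward."""
    return [b - a for prev, cur in zip(frames, frames[1:]) for a, b in zip(prev, cur)]


# Hypothetical 1-D feature: how far open a door is in each frame.
opening = [[0.0], [0.5], [1.0]]
closing = list(reversed(opening))

print(single_frame(opening) == single_frame(closing))          # True
print(mean_pool(opening) == mean_pool(closing))                # True
print(frame_differences(opening), frame_differences(closing))  # [0.5, 0.5] [-0.5, -0.5]
```

A benchmark question such as "what color is the door?" can be answered from either clip's single frame, while "is the door opening or closing?" requires the order-sensitive signal.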


The group also had several papers included as Findings of ACL, which is a designation given to additional papers that have been assessed by the conference program committee as solid work with sufficient substance, quality and novelty to warrant publication. The following papers were accepted to Findings:

Aspect-aware Unsupervised Extractive Opinion Summarization
Haoyuan Li, Somnath Basu Roy Chowdhury and Snigdha Chaturvedi

Evaluating the Factual Consistency of Large Language Models Through News Summarization
Derek Tam, Anisha Mascarenhas, Shiyue Zhang, Sarah Kwan, Mohit Bansal and Colin Raffel

Improving Classroom Dialogue Act Recognition from Limited Labeled Data with Self-Supervised Contrastive Learning Classifiers
Vikram Kumaran, Jonathan Rowe, Bradford Mott, Snigdha Chaturvedi and James Lester

LaSQuE: Improved Zero-Shot Classification from Explanations Through Quantifier Modeling and Curriculum Learning
Sayan Ghosh, Rakesh R. Menon and Shashank Srivastava