January 10, 2025
Photo: Ten faculty members from the Department of Computer Science and School of Data Science and Society presented research at NeurIPS 2024. Top row: Mohit Bansal, Snigdha Chaturvedi, Tianlong Chen, Neil Gaikwad, and Yun Li; Bottom row: Marc Niethammer, Guorong Wu, Huaxiu Yao, Weitong Zhang, and Hongtu Zhu
Researchers from the Department of Computer Science and School of Data Science and Society shared research at the Conference on Neural Information Processing Systems (NeurIPS). The group had a total of 16 papers accepted by the conference, one of which was selected for a spotlight presentation.
NeurIPS is an annual conference primarily featuring machine learning and computational neuroscience research. It is considered one of the three primary conferences in the areas of machine learning and artificial intelligence.
Papers from the Department of Computer Science were authored by personnel from the research groups of Distinguished Professor Mohit Bansal, Associate Professor Snigdha Chaturvedi, Assistant Professor Tianlong Chen, Adjunct Assistant Professor Yun Li, Professor Marc Niethammer, Adjunct Associate Professor Guorong Wu, Assistant Professor Huaxiu Yao (joint appointment with the School of Data Science and Society), and Adjunct Professor Hongtu Zhu. School of Data Science and Society Assistant Professor Weitong Zhang also contributed two papers.
Li, Wu, and Zhu have adjunct appointments in the Department of Computer Science and primary appointments in the School of Medicine, with Li’s in the Department of Genetics and the Department of Biostatistics, Wu’s in Psychiatry, and Zhu’s in Biostatistics.
In addition to publishing research, faculty members from both groups participated in workshops on a variety of topics. During the workshop “GenAI for Health: Potential, Trust, and Policy Compliance,” Chen was part of a team that earned a Best Demo Award for their demo, “An Exploration of LLM-Guided Conversation in Reminiscence Therapy.” School of Data Science and Society Assistant Professor Neil Gaikwad delivered a keynote address, “Whose Evidence? Equity-Centered Design of AI and Policy for Societal Alignment,” during the workshop and participated in a panel discussion, addressing the critical challenges of developing equitable and policy-compliant generative AI systems in health care.
The paper selected for spotlight by the conference, “Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts,” was co-authored by computer science doctoral student Sukwon Yun and his advisor Chen, as well as researchers from the University of Pennsylvania and the University of Science and Technology of China. The paper presented Flex-MoE, a new framework designed to strengthen multimodal learning on medical data when one or more modalities (such as images, text, or personal records) are missing. Flex-MoE flexibly incorporates arbitrary modality combinations while maintaining robustness to missing data.
Brief summaries of all 16 accepted publications can be found below, and each paper can be read in full at the page linked from its title. Bolded researchers have faculty appointments or are postdoctoral researchers or graduate students in the Department of Computer Science or School of Data Science and Society.
Achieving Constant Regret in Linear Markov Decision Processes
Weitong Zhang, Zhiyuan Fan, Jiafan He, Quanquan Gu
In this paper, researchers focused on constant regret guarantees in reinforcement learning (RL), setting out to design an algorithm that incurs only finite regret over infinite episodes with high probability. The paper presents Cert-LSVI-UCB, the first algorithm to achieve a constant, instance-dependent, high-probability regret bound in RL with linear function approximation without relying on prior distribution assumptions.
Calibrated Self-Rewarding Vision Language Models
Yiyang Zhou*, Zhiyuan Fan*, Dongjie Cheng*, Sihan Yang, Zhaorun Chen, Chenhang Cui, Xiyao Wang, Yun Li, Linjun Zhang, Huaxiu Yao
*equal contribution
Large Vision-Language Models (LVLMs) often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image, indicating a misalignment between image and text pairs. This typically arises when the model prioritizes textual information over visual input. This paper proposes the Calibrated Self-Rewarding approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning.
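The iterative loop the summary describes can be sketched in a few lines. This is an illustrative outline only, not the paper's implementation: `generate` and `reward` are hypothetical stand-ins for the model's sampling step and its calibrated reward.

```python
def curate_preference_pairs(prompts, generate, reward):
    """One curation round in the spirit of calibrated self-rewarding:
    for each prompt, sample candidate responses, score each with a
    reward function, and keep the best/worst pair for preference
    fine-tuning. `generate` and `reward` are hypothetical stand-ins."""
    pairs = []
    for prompt in prompts:
        candidates = generate(prompt)            # list of candidate responses
        ranked = sorted(candidates, key=reward)  # low reward -> high reward
        chosen, rejected = ranked[-1], ranked[0]
        pairs.append((prompt, chosen, rejected))
    return pairs
```

In the paper's setting the reward would be grounded in the input image, which is what counteracts the text-over-vision bias described above.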
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao
This work was partly undertaken while Xia was at Monash University in Australia.
This paper aimed to comprehensively evaluate the trustworthiness of Medical Large Vision Language Models (Med-LVLMs) across the medical domain. It assessed the trustworthiness of Med-LVLMs across five dimensions: trustfulness, fairness, safety, privacy, and robustness. The analysis revealed that the models often display factual inaccuracies and fail to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness.
Fast Tree-Field Integrators: From Low Displacement Rank to Topological Transformers
Krzysztof Choromanski*, Arijit Sehanobish*, Somnath Basu Roy Chowdhury*, Han Lin*, Kumar Avinava Dubey*, Tamas Sarlos, Snigdha Chaturvedi
*equal contribution
This paper presents a new class of fast polylog-linear algorithms based on the theory of structured matrices (in particular low displacement rank) for integrating tensor fields defined on weighted trees. Several applications of the resulting fast tree-field integrators are presented, including approximation of graph metrics with tree metrics, graph classification, modeling on meshes, and Topological Transformers for images.
[SPOTLIGHT] Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts
Sukwon Yun, Inyoung Choi, Jie Peng, Yangfan Wu, Jingxuan Bao, Qiyiwen Zhang, Jiayi Xin, Qi Long, Tianlong Chen
This paper presents Flex-MoE, a new framework designed to strengthen multimodal learning on medical data when one or more modalities (such as images, text, or personal records) are missing. Flex-MoE flexibly incorporates arbitrary modality combinations while maintaining robustness to missing data.
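One way to picture "robustness to missing modalities" in a mixture-of-experts is to renormalize the gate over whichever modalities are present, so absent ones contribute exactly zero. The sketch below is a toy illustration under that assumption; the actual Flex-MoE architecture is considerably more elaborate, and the single-matrix "experts" here are hypothetical.

```python
import numpy as np

def moe_forward(features, experts, gate_logits):
    """Route an input through only the experts whose modality is present.

    features: modality name -> feature vector, or None when missing
    experts:  modality name -> linear expert (a weight matrix here)
    gate_logits: modality name -> scalar gating logit
    """
    present = [m for m, x in features.items() if x is not None]
    # Softmax over present modalities only: missing ones get zero weight.
    logits = np.array([gate_logits[m] for m in present])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return sum(w * (experts[m] @ features[m]) for w, m in zip(weights, present))
```

With a single modality present, the gate collapses to that modality's expert alone, which is the degenerate case a fixed-gate model would mishandle.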
GDeR: Safeguarding Efficiency, Balancing, and Robustness via Prototypical Graph Pruning
Guibin Zhang*, Haonan Dong*, Yuchen Zhang, Zhixun Li, Dingshuo Chen, Kai Wang, Tianlong Chen, Yuxuan Liang, Dawei Cheng, Kun Wang
*equal contribution
Data pruning can reduce the data volume needed to train deep learning models, but it often suffers significant performance degradation on imbalanced or biased data. Unlike the fields of computer vision and natural language processing, where mature solutions have been developed to address these issues, graph neural networks (GNNs) continue to struggle with increasingly large-scale, imbalanced, and noisy datasets and lack a unified dataset pruning solution. This paper introduces GDeR, a novel dynamic soft-pruning method that updates the training “basket” during training using trainable prototypes.
GTBench: Uncovering the Strategic Reasoning Capabilities of LLMs via Game-Theoretic Evaluations
Jinhao Duan*, Renming Zhang*, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Elias Stengel-Eskin, Mohit Bansal, Tianlong Chen, Kaidi Xu
*equal contribution
As Large Language Models (LLMs) are integrated into critical real-world applications, their strategic and logical reasoning abilities are increasingly crucial. This project evaluated LLMs in competitive environments through game-theoretic tasks, including board and card games that require pure logic and strategic reasoning to compete with opponents. The paper proposed GTBench, a language-driven environment comprising 10 widely recognized tasks across a comprehensive game taxonomy, then characterized the game-theoretic reasoning of various LLMs and ran adversarial competitions as a reasoning evaluation.
LACIE: Listener-Aware Finetuning for Calibration in Large Language Models
Elias Stengel-Eskin, Peter Hase, Mohit Bansal
When answering questions, LLMs can convey not only an answer, but a level of confidence in the answer’s accuracy. This includes explicit confidence markers, like a numeric score, as well as implicit markers, like an authoritative tone or elaboration. Most current models, however, tend toward overconfidence. To calibrate both implicit and explicit confidence markers, this paper introduces a pragmatic, listener-aware finetuning method (LACIE) that models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener.
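The listener-aware criterion can be boiled down to a simple labeling rule: an answer is preferred when the simulated listener's acceptance agrees with the answer's actual correctness, and dispreferred when a wrong answer is confident enough to be accepted, or a right one hedged enough to be rejected. This is a deliberately simplified sketch of the idea, not LACIE's training pipeline.

```python
def listener_label(answer_correct, listener_accepts):
    """Preference label in the listener-aware spirit of LACIE.

    answer_correct:   whether the answer is actually right
    listener_accepts: whether a simulated listener, seeing the answer's
                      confidence markers, would accept it

    Calibrated behavior = acceptance matches correctness.
    """
    return "chosen" if answer_correct == listener_accepts else "rejected"
```

Training on pairs labeled this way penalizes overconfident wrong answers specifically, rather than wrong answers in general.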
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild
Xinyu Zhao*, Guoheng Sun*, Ruisi Cai*, Yukun Zhou*, Pingzhi Li*, Peihao Wang*, Bowen Tan, Yexiao He, Li Chen, Yi Liang, Beidi Chen, Binhang Yuan, Hongyi Wang†, Ang Li†, Zhangyang “Atlas” Wang†, Tianlong Chen†
*equal contribution, †equal supervision
This paper introduces Model-GLUE, a holistic guideline for scaling Large Language Models (LLMs) by combining disparate models. Various techniques have been proposed for aggregating pre-trained LLMs, but a comprehensive comparison of these techniques, and their synergistic application to a diverse model zoo, had not yet been adequately addressed. The work begins by benchmarking existing LLM scaling techniques and then formulates an optimal strategy for selecting and aggregating a heterogeneous model zoo spanning different architectures.
NeuroPath: A Neural Pathway Transformer for Joining the Dots of Human Connectomes
Ziquan Wei, Tingting Dan, Jiaqi Ding, Guorong Wu
Modern imaging technologies allow for the study of connectivity between distinct brain regions, but exactly how anatomical structure supports brain function remains elusive, in part because current approaches lack neuroscience insight. In this project, researchers proposed a biologically inspired deep model, NeuroPath, to discover putative connectomic feature representations from large collections of neuroimages, which can then be used for downstream applications like task recognition and disease diagnosis.
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
Jialu Li*, Jaemin Cho*, Yi-Lin Sung, Jaehong Yoon, Mohit Bansal
*equal contribution
Text-to-image (T2I) generation models can create images from text descriptions, but these T2I generation models often fail to generate images that precisely match the details of the inputs, exhibiting errors like incorrect spatial relationships or missing objects. This paper introduces Skill-Specific Expert Learning and Merging with Auto-Generated Data (SELMA), a novel paradigm to improve the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets, with skill-specific expert learning and merging.
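The "merging" half of SELMA can be illustrated with the simplest model-merging baseline: averaging the parameters of the skill-specific experts. This sketch assumes plain parameter averaging for illustration; SELMA's actual merging of LoRA experts may differ in detail.

```python
import numpy as np

def merge_experts(expert_params):
    """Merge skill-specific expert parameter sets into one model by
    parameter averaging -- a common model-merging baseline.

    expert_params: list of dicts mapping parameter name -> array,
    one dict per fine-tuned expert (hypothetical layout).
    """
    keys = expert_params[0].keys()
    return {k: np.mean([p[k] for p in expert_params], axis=0) for k in keys}
```

The appeal of merging over joint training is that each expert is fine-tuned on its own auto-generated, skill-specific dataset, and the skills are combined afterward without retraining.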
Stability and Generalizability in SDE Diffusion Models with Measure-Preserving Dynamics
Weitong Zhang, Chengqi Zang, Liu Li, Sarah Cechnicka, Cheng Ouyang, Bernhard Kainz
Inverse problems involve determining the causal factors or sources that produce observed data. Diffusion models are a promising tool for solving inverse problems, but most existing approaches are too limited to address the challenging nature of many real-world problems. The researchers in this project developed several strategies to enhance the stability and generalizability of diffusion models for inverse problems and applied them to a new framework, the Dynamics-aware SDE Diffusion Generative Model (D3GM), which shows promising applications for magnetic resonance imaging (MRI).
Structured Unrestricted-Rank Matrices for Parameter Efficient Finetuning
Arijit Sehanobish*, Kumar Avinava Dubey*, Krzysztof Choromanski*, Somnath Basu Roy Chowdhury*, Deepali Jain, Vikas Sindhwani, Snigdha Chaturvedi
*equal contribution
Parameter-efficient fine-tuning (PEFT) approaches have emerged as an option to improve the performance of large-scale transformer models by updating only a small number of the many parameters during the tuning process. This paper proposes a PEFT framework based on structured unrestricted-rank matrices which can serve as a drop-in replacement for popular modern approaches. This approach provides more flexibility in balancing compactness and expressiveness, which is achieved through the novel use of low displacement rank matrices.
S2FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity
Xinyu Yang, Jixuan Leng, Geyang Guo, Jiawei Zhao, Ryumei Nakada, Linjun Zhang, Huaxiu Yao, Beidi Chen
Current parameter-efficient fine-tuning methods for Large Language Models (LLMs) can achieve either high quality, efficient training, or scalable serving, but not all three simultaneously. To address this limitation, the researchers investigated sparse fine-tuning and observed a remarkable improvement in generalization ability. Utilizing this key insight, this paper proposes a family of Structured Sparse Fine-Tuning (S2FT) methods for LLMs, which concurrently achieve state-of-the-art fine-tuning performance, training efficiency, and inference scalability.
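"Structured" sparsity here means the trainable parameters form regular units (such as whole rows or channels) rather than scattered entries, which keeps training and serving efficient. The sketch below shows one such pattern, updating only selected rows of a weight matrix; the row-selection rule and exact structure used by S2FT may differ.

```python
import numpy as np

def structured_sparse_step(weight, grad, rows, lr=0.1):
    """Gradient step on only a selected subset of rows (channels),
    leaving the rest frozen -- one simple structured-sparsity pattern.

    weight, grad: 2-D parameter and gradient arrays
    rows: indices of the rows chosen for fine-tuning (illustrative)
    """
    updated = weight.copy()
    updated[rows] -= lr * grad[rows]
    return updated
```

Because the frozen rows never change, their activations can be served from the pre-trained model unchanged, which is the kind of property that makes such schemes scalable at inference time.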
Test-time Adaptation in Non-stationary Environments via Adaptive Representation Alignment
Zhen-Yu Zhang, Zhiyu Xie, Huaxiu Yao, Masashi Sugiyama
This paper tackles the challenge of distribution shifts in machine learning by leveraging non-stationary representation learning to adaptively align an unlabeled data stream to the source data representation using a sketch of the source data. To alleviate data scarcity in non-stationary representation learning, the paper proposes a novel adaptive representation alignment algorithm called Ada-ReAlign.
VHELM: A Holistic Evaluation of Vision Language Models
Tony Lee*, Haoqin Tu*, Chi Heem Wong*, Wenhao Zheng, Yiyang Zhou, Yifan Mai, Josselin Roberts, Michihiro Yasunaga, Huaxiu Yao, Cihang Xie, Percy Liang
*equal contribution
Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects. The researchers in this project extended an existing framework to VLMs to present the Holistic Evaluation of Vision Language Models (VHELM), which aggregates various datasets to cover visual perception, knowledge, reasoning, bias, fairness, multilinguality, robustness, toxicity, and safety, in order to produce a comprehensive, multi-dimensional view of the capabilities of VLMs across these important factors.