Navigating the Future: Ensuring Trustworthiness in Multi-Modal Open-World Intelligence

June 11, 2025, Nashville, TN, USA

Held in conjunction with IEEE/CVF CVPR 2025

About

Today’s interconnected world presents unique challenges for intelligent systems that must process and integrate diverse data modalities, including text, audio, and visual data. Traditional closed-world paradigms fall short when faced with unseen classes and novel scenarios, which frequently emerge in complex real-world environments. We propose open-world learning as a way to build intelligent systems that are highly adaptable while remaining robust and trustworthy, capable of tackling highly dynamic and creative tasks. The integration of privacy-preserving techniques is also crucial as data sources expand, particularly in high-stakes applications such as autonomous navigation systems for public safety. Such systems must discern and adapt to evolving traffic patterns, weather conditions, and user behaviors in real time, underscoring the need for continuous learning and resilience under adverse conditions. By exploring these critical challenges, this workshop aims to foster discussions that advance the development of trustworthy, multi-modal systems capable of thriving in open-world contexts.





Invited speakers

Rita Cucchiara, Università di Modena e Reggio Emilia
Ming-Hsuan Yang, University of California at Merced
Tongliang Liu, University of Sydney
Liang Zheng, Australian National University
Yuki M. Asano, University of Technology Nuremberg
Olga Russakovsky, Princeton University

Call for papers

Submission Site: TMM-OpenWorld@CVPR 2025.
Author Kit: CVPR Author Kit.
We invite submissions of both short and long papers (4 and 8 pages, respectively, excluding references). Potential topics include but are not limited to:
  • Open-World Multi-Modal Learning: Strategies to train systems on both labeled and unlabeled data while distinguishing known from unknown classes.

  • Dynamic Multi-Modal Vocabulary Expansion: Approaches to enable models to recognize and adapt to an expanding range of concepts using diverse inputs.

  • Robustness in Multi-Modal Systems: Methods to improve resilience against distribution shifts, adversarial attacks, and noise across modalities.

  • Multi-Modal Class Discovery: Techniques for identifying new classes across different modalities (e.g., visual, text, audio, touch).

  • Continual and Federated Learning for Multi-Modal Data: Innovative techniques that support ongoing learning and adaptation in decentralized environments.

  • Application-Focused Contributions: Research that showcases specific applications, such as autonomous navigation systems in urban environments that leverage multi-modal data for enhanced decision-making.

Q1: Will accepted submissions be published in the CVPR proceedings or remain non-archival?
A: Accepted submissions will be published in the CVPR proceedings.

Q2: What's the policy on dual submissions?
A: Dual submissions are not permitted.

We will select six oral papers and present one best paper award based on the review results and the quality of the paper presentations.

Dates and Deadlines
Workshop paper submission: March 24, 2025
Workshop paper notification: April 1, 2025
Workshop paper camera-ready: April 7, 2025
Workshop: June 11, 2025

Schedule

Date: Wednesday, June 11, 2025, 08:15 AM CDT
Venue: Room 101 E, Music City Center, Nashville, TN


8:00 - 8:15 Opening
8:15 - 8:25 Oral 1: Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness
8:25 - 8:35 Oral 2: IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
8:35 - 8:45 Oral 3: A Survey of State of the Art Large Vision Language Models: Benchmark Evaluations and Challenges
8:45 - 9:15 Invited Talk 1: Olga Russakovsky (Princeton University)
Title: Defining, Identifying and Generating Out-of-Distribution
9:15 - 9:45 Invited Talk 2: Liang Zheng (Australian National University)
Title: Features for Trustworthy Generative Models
9:45 - 10:00 Coffee Break
10:00 - 10:30 Invited Talk 3: Lorenzo Baraldi (University of Modena and Reggio Emilia), on behalf of Rita Cucchiara
Title: TBD
10:30 - 11:00 Invited Talk 4: Ming-Hsuan Yang (University of California at Merced)
Title: Toward Grounding Anything in Images and Videos
11:00 - 12:00 Poster Session

14:30 - 14:40 Oral 4: NuanDPO: Nuanced Cross-Modal Preference Optimization for Reducing Hallucinations in Multimodal LLMs
14:40 - 14:50 Oral 5: Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU
14:50 - 15:00 Oral 6: Attention-Guided Hierarchical Defense for Multimodal Attacks in Vision-Language Models
15:00 - 15:30 Invited Talk 5: Yuki M. Asano (University of Technology Nuremberg)
Title: Learning from Moving in the World and Generalising from Language Models
15:30 - 16:00 Invited Talk 6: Tongliang Liu (University of Sydney)
Title: TBD
16:00 - 16:10 Conclusion
16:10 - 17:00 Poster Session

Accepted Papers

Paper No.  Paper Title
1   Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness
2   A Survey of State of the Art Large Vision Language Models: Benchmark Evaluations and Challenges
4   Attention-Guided Hierarchical Defense for Multimodal Attacks in Vision-Language Models
5   On the Robustness of GUI Grounding Models Against Image Attacks
6   IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
7   Multimodal Generalized Category Discovery
8   Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU
13  PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model
14  NuanDPO: Nuanced Cross-Modal Preference Optimization for Reducing Hallucinations in Multimodal LLMs
15  HARMONY: Hidden Activation Representations and Model Output-Aware Uncertainty Estimation for Vision-Language Models
17  Vision Language Models for Massive MIMO Semantic Communication

Organizers

Elisa Ricci, University of Trento
Andrew G. Wilson, New York University
Shin’ichi Satoh, National Institute of Informatics
Nicu Sebe, University of Trento
Wei Ji, Nanjing University
Hong Liu, Osaka University
Zhun Zhong, Hefei University of Technology
Zhe Zeng, New York University
Zhenglin Zhou, Zhejiang University