Navigating the Future: Ensuring Trustworthiness in Multi-Modal Open-World Intelligence

June 11, 2025, Nashville, TN, USA

Held in conjunction with IEEE/CVF CVPR 2025

About

Today’s interconnected world presents unique challenges for intelligent systems that must process and integrate diverse data modalities, including text, audio, and visual data. Traditional closed-world paradigms fall short when faced with unseen classes and novel scenarios, which frequently emerge in complex real-world environments. We propose open-world learning as a way to build intelligent systems that are highly adaptable while remaining robust and trustworthy, capable of tackling highly dynamic and creative tasks. The integration of privacy-preserving techniques is also crucial as data sources expand, particularly in high-stakes applications such as autonomous navigation systems for public safety. Such systems must discern and adapt to evolving traffic patterns, weather conditions, and user behaviors in real time, underscoring the need for continuous learning and resilience under adverse conditions. By exploring these critical challenges, this workshop aims to foster discussions that advance the development of trustworthy, multi-modal systems capable of thriving in open-world contexts.





Invited speakers

Rita Cucchiara, Università di Modena e Reggio Emilia
Ming-Hsuan Yang, University of California at Merced
Tongliang Liu, University of Sydney
Liang Zheng, Australian National University
Yuki M. Asano, University of Technology Nuremberg
Olga Russakovsky, Princeton University

Call for papers

Submission Site: TMM-OpenWorld@CVPR 2025.
Author Kit: CVPR Author Kit.
We invite submissions of both short and long papers (4 and 8 pages, respectively, excluding references). Potential topics include but are not limited to:
  • Open-World Multi-Modal Learning: Strategies to train systems on both labeled and unlabeled data while distinguishing known from unknown classes.

  • Dynamic Multi-Modal Vocabulary Expansion: Approaches to enable models to recognize and adapt to an expanding range of concepts using diverse inputs.

  • Robustness in Multi-Modal Systems: Methods to improve resilience against distribution shifts, adversarial attacks, and noise across modalities.

  • Multi-Modal Class Discovery: Techniques for identifying new classes across different modalities (e.g., visual, text, audio, touch).

  • Continual and Federated Learning for Multi-Modal Data: Innovative techniques that support ongoing learning and adaptation in decentralized environments.

  • Application-Focused Contributions: Research that showcases specific applications, such as autonomous navigation systems in urban environments that leverage multi-modal data for enhanced decision-making.

Q1: Will accepted submissions be published in the CVPR proceedings or remain non-archival?
A: Accepted submissions will be published in the CVPR proceedings.

Q2: What's the policy on dual submissions?
A: Dual submissions are not permitted.

We will select six oral papers and present one best paper award based on the review results and the quality of the paper presentations.

Dates and Deadlines
Workshop paper submission: March 24, 2025
Workshop paper notification: April 1, 2025
Workshop paper camera-ready: April 7, 2025
Workshop: June 11, 2025

Schedule

Date: Wednesday, June 11, 2025, 08:15 AM CDT
Venue: Room 101 E, Music City Center, Nashville, TN


8:00 - 8:15 Opening
8:15 - 8:25 Oral 1: Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness
8:25 - 8:35 Oral 2: IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
8:35 - 8:45 Oral 3: A Survey of State of the Art Large Vision Language Models: Benchmark Evaluations and Challenges
8:45 - 9:15 Invited Talk 1: Olga Russakovsky (Princeton University)
Title: Defining, Identifying and Generating Out-of-Distribution
9:15 - 9:45 Invited Talk 2: Liang Zheng (Australian National University)
Title: Features for Trustworthy Generative Models
9:45 - 10:00 Coffee Break
10:00 - 10:30 Invited Talk 3: Lorenzo Baraldi (University of Modena and Reggio Emilia), on behalf of Rita Cucchiara
Title: TBD
10:30 - 11:00 Invited Talk 4: Ming-Hsuan Yang (University of California at Merced)
Title: Toward Grounding Anything in Images and Videos
11:00 - 12:00 Poster Session

14:30 - 14:40 Oral 4: NuanDPO: Nuanced Cross-Modal Preference Optimization for Reducing Hallucinations in Multimodal LLMs
14:40 - 14:50 Oral 5: Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU
14:50 - 15:00 Oral 6: Attention-Guided Hierarchical Defense for Multimodal Attacks in Vision-Language Models
15:00 - 15:30 Invited Talk 5: Yuki M. Asano (University of Technology Nuremberg)
Title: Learning from Moving in the World and Generalising from Language Models
15:30 - 16:00 Invited Talk 6: Tongliang Liu (University of Sydney)
Title: TBD
16:00 - 16:10 Conclusion
16:10 - 17:00 Poster Session

Accepted Papers

Paper No.  Paper Title
1   Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness
2   A Survey of State of the Art Large Vision Language Models: Benchmark Evaluations and Challenges
4   Attention-Guided Hierarchical Defense for Multimodal Attacks in Vision-Language Models
5   On the Robustness of GUI Grounding Models Against Image Attacks
6   IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
7   Multimodal Generalized Category Discovery
8   Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU
13  PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model
14  NuanDPO: Nuanced Cross-Modal Preference Optimization for Reducing Hallucinations in Multimodal LLMs
15  HARMONY: Hidden Activation Representations and Model Output-Aware Uncertainty Estimation for Vision-Language Models
17  Vision Language Models for Massive MIMO Semantic Communication

Organizers

Elisa Ricci, University of Trento
Andrew G. Wilson, New York University
Shin’ichi Satoh, National Institute of Informatics
Nicu Sebe, University of Trento
Wei Ji, Nanjing University
Hong Liu, Osaka University
Zhun Zhong, Hefei University of Technology
Zhe Zeng, New York University
Zhenglin Zhou, Zhejiang University