ISARLab

The Intelligent Systems, Automation and Robotics Laboratory

The More the Better: Multimodal Perception for Safer Autonomous Navigation

Workshop at the IEEE 21st International Conference on Automation Science and Engineering (CASE 2025)

Millennium Biltmore, Downtown Los Angeles, California, USA

August 21, 2025

 

Workshop Committee

Letizia Marchegiani (letizia.marchegiani@unipr.it) 1

Gabriele Costante (gabriele.costante@unipg.it) 2

Alberto Dionigi (alberto.dionigi@unipg.it) 2

Katerina Vinciguerra (katerina.vinciguerra@unipr.it) 1

Daniele De Martini (daniele@robots.ox.ac.uk) 3

Dimitri Ognibene (dimitri.ognibene@unimib.it) 4

Domenico G. Sorrenti (domenico.sorrenti@unimib.it) 4

Yan Wu (wuy@i2r.a-star.edu.sg) 5

Matthew Gadd (matthew.gadd@st-hildas.ox.ac.uk) 6

 

1 University of Parma, Italy

2 Department of Engineering, Università degli Studi di Perugia, Italy

3 Oxford Robotics Institute, UK

4 University of Milano-Bicocca, Italy

5 A*STAR Institute for Infocomm Research, Singapore

6 University of Oxford, UK

Key dates

Workshop paper submission OPEN: June 15th, 2025

Workshop paper submission DEADLINE: August 01, 2025, 11:59 PM (Pacific Time) (extended from July 15th, 2025, 11:59 PM Pacific Time)

Notification of acceptance: August 07, 2025 (extended from July 22nd, 2025)

Final paper submission deadline: August 10th, 2025 (extended from July 31st, 2025)

Workshop date: August 21st, 2025

Abstract

 

Reliable and robust perception is fundamental to the safety of autonomous systems, especially in the context of driverless vehicles, where faults and mistakes might lead to disastrous consequences. Yet, no perception system is infallible, and no sensor can guarantee consistent performance in all possible circumstances. RGB cameras, for instance, naturally struggle when illumination is scarce, while LiDAR's behaviour is affected by harsh weather conditions (e.g., heavy rain or fog). Combining sensing modalities in a mutually beneficial way, leveraging their strengths and mitigating their weaknesses, has been shown to lead to significant improvements in the accuracy and robustness of perception systems. This workshop aims to explore novel multimodal perception strategies in three ways. Firstly, it will investigate multimodal systems featuring the integration of sensors not typically applied to specific tasks. Secondly, it will explore some of the challenges still holding back the rapid development of multi-sensor suites, such as continuous calibration. Lastly, it will investigate innovative multimodal paradigms, such as end-to-end navigation schemes. Via these three themes, this workshop aims to stimulate discussion and research into multimodal perception, to improve the reliability and accuracy of transportation systems.

 

 

Keywords: Autonomous Driving, Intelligent Transportation Systems, Multimodal Perception, Sensing, Localization, Visual Learning, Sensor fusion, Multimodal Navigation, Machine Learning, Deep Learning, Signal and Image Processing, Multi-Sensor Calibration, End-to-end Multimodal Navigation, Deep Reinforcement Learning for Perception-to-Action, Perception-Action Coupling, Cooperative Perception, Distributed Perception.

Call for Papers


Author Guidelines

The use of artificial intelligence (AI)–generated text in an article shall be disclosed in the acknowledgements section of any paper submitted. The sections of the paper that use AI-generated text shall have a citation to the AI system used to generate the text.


Paper Submission Guidelines

Please read the following paper submission guidelines before submitting your papers:
• We invite participants to submit extended abstracts of 2 pages plus references.
• Contributions can be original works, previously published works, or concise reports on research conducted in recent years on the topics of interest mentioned above. Submissions may also summarize the results of recently concluded or ongoing research projects or illustrate specific experimental applications of theoretical results.

• The allowed paper length is 2+N, meaning 2 pages for the paper content, plus any number of pages for the references. Be aware that longer papers will be automatically REJECTED.
• All papers must be submitted through the Microsoft CMT online submission system.
• The accepted abstracts will be made available on the workshop website but will not appear in the official IEEE conference proceedings.
• New authors cannot be added after your paper has been accepted. Please ensure that you are following this guideline to avoid any issues with publication.

If you encounter any problems with the submission of your papers, please contact the conference Technical Program Chairs.

Paper Submission System

We use Microsoft CMT as the submission system, available at the following link: https://cmt3.research.microsoft.com/TMTB2025/

You can find detailed instructions on how to submit your paper here.


LaTeX and Word Templates

To help ensure correct formatting, please use the IEEE style files for conference proceedings as a template for your submission:
https://www.ieee.org/conferences/publishing/templates.html. These include LaTeX and Word style files.
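For reference, a minimal LaTeX skeleton based on the standard IEEEtran conference class is sketched below. It is only an illustration of the 2+N extended-abstract format (A4 paper, double column, 10-point font, no page numbers); the official style files downloaded from the IEEE templates page take precedence, and the title, author, and bibliography file names are placeholders.

\documentclass[conference,a4paper]{IEEEtran}
% Minimal sketch of a 2+N extended abstract (placeholder content throughout).
\usepackage{cite}
\usepackage{graphicx}

\title{Your Extended Abstract Title}
\author{\IEEEauthorblockN{First Author}
\IEEEauthorblockA{Affiliation \\ first.author@example.com}}

\begin{document}
\maketitle

\begin{abstract}
Up to 2 pages of content; references do not count towards the limit.
\end{abstract}

\section{Introduction}
% IEEEtran in conference mode produces double-column, 10-point text and
% omits page numbers by default, matching the manuscript style below.
Body text goes here.

\bibliographystyle{IEEEtran}
\bibliography{references} % placeholder .bib file

\end{document}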

Violations of any paper specification may result in rejection of your paper.


Manuscript Style Information

• Only papers in PDF format will be accepted.
• Paper Size: A4 (210mm x 297mm).
• Paper Length: Each paper should have 2+N pages, meaning 2 pages for the paper content, plus any number of pages for the references.
• Paper Formatting: double column, single spaced, 10-point Times Roman font. Please make sure to use the official IEEE style files provided above.
• No page numbers.


Note: Violations of any of the above specifications may result in rejection of your paper.

Acknowledgments

 

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.

Schedule and list of Speakers (TBD)

Michael Biehler (University of Wisconsin-Madison)

Title
Dynamic 4D Modeling: A New Frontier for Multimodal Perception

Abstract
We live in a 4-dimensional world. As we perceive our world, we observe a stream of inputs from a 3D world, which, taken together in time, evolves in 4D. Autonomous systems are increasingly capable of capturing this complexity through a suite of onboard sensors – LiDAR, RGB cameras, radar, and more. Yet their ability to integrate and interpret these dynamic multimodal data streams remains limited. Dynamic 4D point cloud modeling has emerged as a powerful approach for representing the evolving structure of real-world environments. By treating perception as a spatiotemporally grounded, multimodal task, this paradigm moves beyond static scene understanding toward continuous, 4D predictive models of the world. This modeling paradigm will unlock powerful new capabilities, such as capturing fine-grained pedestrian motion or anticipating vehicle trajectories under occlusion. This talk will highlight how dynamic 4D modeling combined with multimodal fusion can enable autonomous systems that truly understand, anticipate, and adapt to the 4-dimensional world around them.

Bio
Dr. Michael Biehler is an Assistant Professor in the Department of Industrial and Systems Engineering at the University of Wisconsin-Madison. He earned his Ph.D. from the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology, following a B.S. and M.S. from the Karlsruhe Institute of Technology in 2017 and 2020, respectively. Dr. Biehler’s research lies at the intersection of multi-modal data fusion, 3D machine learning, and advanced 3D/4D printing. He is dedicated to developing foundational methodologies, computational tools, and cutting-edge experimental platforms that advance these fields. His work has resulted in 17 publications in leading journals, earned seven best paper awards, and has been supported by 16 scholarships and fellowships from various professional organizations. His research has also been sponsored by General Motors.

Ross Greer (University of California Merced)

Title
Vision and Language in Safe Autonomous Driving: Data Representations for Efficient Learning and Planning

Abstract
Modern intelligent vehicles provide some level of automation but require a human driver to be able to take over at a moment’s notice. In this way, the human is still very much “in the loop” of the realized system. Yet, there are other human influences on the performance of these autonomous systems besides interactivity while driving. Humans provide the annotations which enable data-driven learning methods to train models for operation in the real world, and looking beyond the horizon of supervised, class-index annotation approaches, we introduce the idea of natural language and latent representations as intermediaries for detecting and navigating unusual, unexpected, and novel scenes. Finally, we will contextualize the use of this novelty detection for not only safe control transitions, but also towards data curation and active learning paradigms for continual learning and decision-making in ever-changing driving environments.

Bio
Dr. Ross Greer is an Assistant Professor in the Computer Science & Engineering Department at the University of California, Merced, where they lead the Mi³ Lab focused on machine intelligence, interaction, and imagination. Their research combines computer vision and AI to develop systems that enhance human-machine interaction and safety, especially in the context of autonomous driving. Their work addresses challenges such as open-world adaptability, long-tail distributions, and safe human-robot coexistence by developing new learning architectures, uncertainty estimation methods, and data curation techniques. In autonomous systems, they study planning, driver behavior, object salience, and trajectory prediction. Beyond driving, their research extends to AI-driven musical interaction, exploring creativity and expression in human-machine collaboration. They earned their Ph.D. from UC San Diego under Mohan Trivedi, with prior degrees from UCSD and UC Berkeley. Their work has been supported by the Qualcomm Innovation Fellowship, IRCAM, Toyota CSRC, AWS, and UCOP ILTI.