The Intelligent Systems, Automation and Robotics Laboratory

Artificial Perception: state of the art and beyond

Workshop at the 5th Italian Conference for Robotics and Intelligent Machines (I-RIM)

Fiera di Roma – Rome, Italy

October 21, 2023 (11:45 – 13:45)



Workshop Organizers


Gabriele Costante (1)

Matteo Matteucci (2)

Ettore Stella (3)


1 Department of Engineering, University of Perugia

2 Department of Information, Electronics and Bioengineering – Politecnico di Milano




The capability to extract information from raw data collected by contactless sensors is one of the cornerstones of developing AI-based solutions. This workshop aims to gather knowledge and expertise from the research community on methodologies and technologies for interpreting sensor data and extracting the information relevant to the application context of interest. The data sources can be heterogeneous, and the methodologies may include model-based, data-driven and deep learning-based strategies, also considering multi-sensor setups.

List of Speakers (Title and Abstract)

11:45 - 12:08: Chiara Plizzari (Department of Control and Computer Engineering, Politecnico di Torino, Italy)


Cross-Domain Egocentric Action Recognition

Despite the numerous publications in the field, egocentric action recognition still has one major unsolved flaw, known as “environmental bias”. It arises from the network’s heavy reliance on the environment in which the activities are recorded, which inhibits its ability to recognize actions performed in unfamiliar (unseen) surroundings. This problem is known in the literature as domain shift: a model trained on a labelled source dataset does not generalize well to an unseen dataset, called the target. Most researchers in the field have addressed this issue by reducing it to an unsupervised domain adaptation (UDA) setting, where an unlabelled set of samples from the target is available during training. However, the UDA scenario is not always realistic, because the target domain might not be known a priori or because accessing target data at training time might be costly (or plainly impossible). To this end, she will present a couple of works that address the so-called Domain Generalization setting, which consists of learning a representation able to generalize to any unseen domain, regardless of whether target data can be accessed at training time. Taking inspiration from recent works on video-text and audio-visual unsupervised learning, she will show how to solve auxiliary tasks across the various information channels of a video in a way that keeps the solutions to these tasks consistent across channels, gaining robustness as a result.
Additionally, she will discuss using event data to reduce the impact of domain shift by relying on different hardware than standard RGB cameras. Event cameras are novel bio-inspired sensors that asynchronously capture pixel-level intensity changes in the form of events. Due to their sensing mechanism, event cameras have little to no motion blur and a very high temporal resolution, and they require significantly less power and memory than traditional frame-based cameras. These characteristics make them a good fit for several real-world applications, such as egocentric action recognition on wearable devices, where fast camera motion and limited power challenge traditional vision sensors.

12:08 - 12:31: Letizia Marchegiani (Department of Engineering and Architecture, University of Parma, Italy)


Let There Be Dark: Beyond Traditional Sensing for Robust Perception

On their path towards full autonomy, robots need to operate safely in any environmental condition; whether it’s dark, raining, or the sun is shining, they will have to perceive their surroundings accurately and robustly in order to act sensibly in response. Recent years have seen enormous progress in the development of perception models and algorithms, relying mostly on visible imaging sensors and lasers, with a major impact on the performance and reliability of autonomous systems. Yet those sensors still naturally struggle in particular situations, such as lack of illumination or harsh weather conditions. Is this, though, an unbeatable limitation? Are we actually taking advantage of all the technology available? This talk will discuss how less explored sensing modalities can be exploited, alone and in combination, to enhance a vehicle’s awareness of the environment. Through the analysis of a set of use cases, we will discuss the challenges and opportunities that “untraditional” sensing brings to mobile robotics, and its potential to shed some light into the darkness.

12:31 - 12:54: Tiziana D’Orazio (CNR-STIIMA, Bari, Italy)


Intelligent Perception Systems for Workers’ Wellbeing in collaborative task contexts

Recent demographic and social changes, as well as the new production paradigm of Industry 4.0, have brought new challenges to the occupational safety, satisfaction and well-being of workers. Different categories of workers, such as highly skilled experts, young people in training, the elderly, and the disabled, have different needs. In this context, observing people becomes crucial for adapting work rhythms to disparate cognitive and physical abilities, analysing physical and mental stress, preventing work-related accidents and illnesses, and making the working environment safe and comfortable. The remarkable development of technologies and data-analysis methodologies in recent years provides advanced tools to address these challenging issues, considering the person as the central element. In particular, developing intelligent perception systems for observing people, analysing their behaviour and assessing their performance can help to detect patterns, trends, anomalies or other valuable insights that enable appropriate actions, minimize risks and improve workers’ quality of life. In this presentation, an example of an intelligent perception system for monitoring workers in an assembly task with a collaborative robot will be described. The discussion will cover the problems related to the variability of behaviours, the construction of robust methodologies for action recognition, and the real-time segmentation of different actions. Some examples of real experiments in a working cell with a cobot and some workers during an assembly task will be presented.

12:54 - 13:17: Simonetta Grilli (ISASI-CNR, Pozzuoli, Italy)


Pyro-electrohydrodynamic jetting: a new frontier for biosensing and related applications of artificial intelligence

The detection of low-abundance biomarkers in peripheral body fluids (e.g. capillary blood, urine, saliva) is of vital importance for early diagnosis as well as for follow-up therapies, since it avoids invasive withdrawal techniques for patients. Current clinical tests rely on ELISA-based procedures, whose limit of detection is around 50 pg/mL on average. Unfortunately, the concentration of biomarkers in peripheral body fluids falls well below this limit. Pyro-electrohydrodynamic jetting (p-jet) can accumulate tiny droplets of sample, making the biomarker molecules highly concentrated. Moreover, the small sample volumes and the contact-free modality make the p-jet a good candidate for the development of an innovative rheological tool for liquid polymers in combination with artificial intelligence, thus opening novel applications in the biomedical field as well as in industries that process materials in the liquid phase.

13:17 - 13:40: Elisa Ricci (Department of Information Engineering and Computer Science, University of Trento, Italy - Fondazione Bruno Kessler (FBK), Trento, Italy)


Multi-modal human behaviour analysis for social robotics

Automated analysis of social interactions is critical for a number of applications such as surveillance, robotics and social signal processing. In particular, the automatic analysis of conversational groups is of fundamental importance for developing technologies for Human-Robot Interaction, and it requires addressing different tasks simultaneously (e.g. people tracking, voice recognition, head and body pose estimation, F-formation detection, emotion estimation) while processing multimodal data gathered from different sensors. In this talk I will present some of our recent works in this area, focusing on the methods and technologies developed during the EU project SPRING.