This project is our attempt to enable effective Human-Robot Interaction (HRI) by joining Computer Vision and Natural Language Processing.
This topic poses questions that are as interesting as they are challenging. The first concerns the form of intelligent behaviour to be investigated, i.e., on what basis one can assess that a robot understands what is happening in its environment. To us, a reasonable way is to test the ability to produce a natural language description of generic visual sequences. Such a description can be seen as a manifestation of what the agent learned from the visual and textual data it processed during training, and of what it learned to be important to describe. In addition, a natural language description is a good basis for natural language question answering about the events the agent observed. Hence, this also offers a friendly interface for non-expert users, who would then be able to effectively interact with their home robot in the near future.
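To make the description pipeline concrete, the sketch below shows a bare-bones GRU encoder-decoder video captioner in PyTorch. Since the paper's code is not yet released (see the Download section), the class name, feature dimensions, and overall structure here are illustrative assumptions, not the paper's implementation; the only thing taken from the paper is its use of GRU units in both the visual encoder and the language decoder.

import torch
import torch.nn as nn

class VideoCaptioner(nn.Module):
    """Minimal GRU encoder-decoder for video description (illustrative sketch)."""

    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=10000, embed_dim=256):
        super().__init__()
        # Encoder GRU reads one pre-extracted visual feature vector per frame.
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Decoder GRU emits the description one word at a time.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, caption_tokens):
        # frame_feats: (batch, n_frames, feat_dim) CNN features of the clip.
        _, h = self.encoder(frame_feats)      # final hidden state summarizes the clip
        emb = self.embed(caption_tokens)      # (batch, seq_len, embed_dim)
        dec_out, _ = self.decoder(emb, h)     # decoding conditioned on the video summary
        return self.out(dec_out)              # per-step vocabulary logits

# Example call with dummy data (dimensions are placeholders):
feats = torch.randn(2, 30, 2048)              # 2 clips, 30 frames each
tokens = torch.randint(0, 10000, (2, 12))     # 12-token caption prefixes
logits = VideoCaptioner()(feats, tokens)      # shape: (2, 12, 10000)

At inference time the decoder would instead be run step by step, feeding each predicted word back in until an end-of-sentence token is produced; the resulting sentence is then the text over which question answering can operate.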
Download
Our collected ISARLab-VD dataset
The complete results corpus for the RA-L paper "Full-GRU Natural Language Video Description for Service Robotics Applications"
The code for the RA-L paper "Full-GRU Natural Language Video Description for Service Robotics Applications": coming soon
Citation
The BibTeX entry for the preprint version is the following:
@ARTICLE{cascianelli2018natural,
  author={S. Cascianelli and G. Costante and T. A. Ciarfuglia and P. Valigi and M. Fravolini},
  journal={IEEE Robotics and Automation Letters},
  title={Full-GRU Natural Language Video Description for Service Robotics Applications},
  year={2018},
  volume={PP},
  number={99},
  pages={1-1},
  keywords={Computer architecture;Encoding;Feature extraction;Logic gates;Natural languages;Robots;Video sequences;Cognitive Human-Robot Interaction;Visual Learning},
  doi={10.1109/LRA.2018.2793345}}