Our attempt to enable effective Human-Robot Interaction (HRI) by combining Computer Vision and Natural Language Processing.
This topic poses questions as interesting as they are challenging. The first concerns which form of intelligent behaviour to investigate, i.e., on what basis one can assess whether a robot understands what is happening in its environment. To us, a reasonable approach is to test the robot's ability to produce a natural language description of generic visual sequences. Such a description reflects both what the agent learned from the visual and textual data it processed during training and what it learned to consider worth describing. Moreover, a natural language description is a good basis for answering natural language questions about the events the agent observed. This also offers a friendly interface for non-experts, who could then interact effectively with their home robots in the near future.
The ISARLab-VD dataset we collected
The complete set of results for the paper "Towards effective Human-Robot Interaction via Full-GRU Natural Language Video Description", submitted to RA-L
The code for the paper "Towards effective Human-Robot Interaction via Full-GRU Natural Language Video Description", submitted to RA-L: coming soon
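
While the paper's code is not yet released, the title names a GRU-based description model. As a point of reference only, here is a minimal NumPy sketch of the standard GRU update that a video encoder might apply to a sequence of per-frame feature vectors; all dimensions, weight initializations, and names are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal GRU cell in NumPy -- an illustrative sketch only, not the
# "Full-GRU" model from the paper. It implements the standard GRU update.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = hidden_size
        # One (W, U, b) triple per gate: update (z), reset (r), candidate (h~).
        self.Wz, self.Uz, self.bz = rng.normal(0, 0.1, (s, input_size)), rng.normal(0, 0.1, (s, s)), np.zeros(s)
        self.Wr, self.Ur, self.br = rng.normal(0, 0.1, (s, input_size)), rng.normal(0, 0.1, (s, s)), np.zeros(s)
        self.Wh, self.Uh, self.bh = rng.normal(0, 0.1, (s, input_size)), rng.normal(0, 0.1, (s, s)), np.zeros(s)

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h + self.bz)              # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h + self.br)              # reset gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h) + self.bh)  # candidate state
        return (1.0 - z) * h + z * h_tilde                            # new hidden state

# Example: fold a short "video" of 5 frame-feature vectors into one hidden state,
# which a decoder could then condition on to emit the description word by word.
cell = GRUCell(input_size=8, hidden_size=4)
h = np.zeros(4)
for frame_feat in np.random.default_rng(1).normal(size=(5, 8)):
    h = cell.step(frame_feat, h)
```

In an encoder-decoder captioning setup, a second GRU would typically decode such a state into a word sequence; the exact wiring used in the paper will be clear once the code is released.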