ESR 5

Scene understanding and interaction anticipation from first person vision

Hiring Institution

University of Catania

PhD enrollment

Doctoral School in Computer Science of the University of Catania

YOUR TEAM: You will work at the Department of Mathematics and Computer Science of the University of Catania.

GOAL: The objective of this project is to design and develop algorithms that allow a First Person Vision system to observe the scene from the user's point of view in order to understand and anticipate the location or context in which the user is operating, the object the user is interacting with, and the action the user performs. These algorithms can be exploited to support people with cognitive decline or disabilities (e.g., people with Alzheimer’s disease) for memory augmentation purposes (e.g., showing a video on how to use an object once the object has been recognized) or to make summaries (e.g., albums) of the acquired first person videos for personalised health (e.g., building a short video summary of a mother at different points in time during pregnancy, or building mother and baby journey albums). Moreover, the inferred information can be exploited by an external agent, e.g., a robot, to make decisions and assist the user during daily activities.

ESR activities

  • localization and context identification, i.e., recognizing the location in which the user is operating;
  • object recognition and object-interaction anticipation, i.e., recognizing the object the user is interacting with and anticipating user-object interactions;
  • recognition and prediction of actions, i.e., recognizing the actions performed by the user and anticipating future actions.

Expected results

  • To allow for the construction of innovative techniques for scene understanding and interaction anticipation from First Person Vision data, the PhD fellow will first research the state-of-the-art techniques related to the objectives of the project. This process will also require the acquisition of basic knowledge of Computer Vision and Machine Learning technologies, such as 3D reconstruction and Deep Learning-based techniques. The production of a survey of the related state of the art is expected within the first year of the PhD programme;
  • The ESR fellow will research and study the publicly available datasets which could be useful for the intended research and highlight whether the available material is sufficient to carry out each aspect of the planned research. If the available data is deemed to be insufficient to cover some aspects of the research, the acquisition of new datasets will be planned and carried out. The expected output of this process is the construction of a repository of existing or new data. All collected data will be documented to facilitate future research;
  • The ESR fellow will investigate and develop algorithms to localize the user and recognize the context in which the user is operating. The level of location awareness these algorithms need to provide will be studied with the project's use cases in mind. The output consists of a set of algorithmic tools for implementing the needed degree of location awareness in the developed First Person Vision system;
  • The ESR fellow will investigate algorithms for object detection from First Person Vision data, as well as algorithms to anticipate the object the user is going to interact with (i.e., to detect it before the interaction occurs). This will allow an external agent, e.g., a robot, to know in advance which objects the user is going to interact with in order to make decisions and plan its reaction;
  • The ESR fellow will investigate and develop algorithms to recognize the actions performed by the user and to anticipate them (i.e., to detect actions before they are performed). Such algorithms will allow an external agent, e.g., a robot, to anticipate the user’s intention in order to help the user accomplish a goal, send alarms in the case of missing actions, or prevent a dangerous action from being performed;
  • The results of the research will be disseminated in international conferences and journals in order to foster research in the area and facilitate the advancement of the field. Patents on the innovative algorithms will be considered together with the other partners.

Additional essential requirements

  • Master's degree in Computer Science, Information Engineering, or equivalent. A degree with distinction (cum laude) is an advantage;
  • Prior knowledge in Computer Vision, Machine Learning and Deep Learning is an advantage;
  • Prior publications at international conferences or journals are desirable;
  • Ability to program in Python is an advantage;
  • Communication skills and teamwork are desirable.

Principal Investigator: Prof. Giovanni Maria Farinella, UNICT
Academic PhD Supervisor: Giovanni Maria Farinella (UNICT)
Academic PhD Co-Supervisor: Sebastiano Battiato (UNICT)
Industrial PhD Supervisor: Dimitrios Mavroeidis (PHILIPS)

Main Contact: Prof. Giovanni Maria Farinella
Email: gfarinella@dmi.unict.it