Situational Awareness



The vision of this project is to develop a solution for the automated processing of visual sensor data, with the goal of making visual systems situationally aware. Being truly perceptive of one’s environment from visual cues is a multi-layered process. The first task is to observe the scene, i.e., localize and recognize the various agents in the environment. The second task is to understand the scene, i.e., infer fine-grained attributes of each agent, such as 3D location, 3D pose, semantic segmentation masks, and occlusion masks, together with the geometric and functional relationships between agents and between each agent and the environment. The third task is to anticipate the future evolution of the environment, i.e., predict the likely transformation of the scene based on a functional understanding of the environment and a temporal understanding of each agent’s attributes. The fourth task is to act, i.e., based on the current state of the environment and its likely evolution, take an appropriate action to maximize our ability to truly perceive the environment.

Fig. 1: Situational Awareness.

Situational Awareness Overview

Illustration of the various tasks in the situational awareness pipeline and how they relate to each other.


Pedestrian Detection

For every grid location, geometrically correct renderings of pedestrians are synthetically generated using known scene information such as camera calibration parameters, obstacles (red), walls (blue), and walkable areas (green). All location-specific pedestrian detectors are trained jointly to learn a smoothly varying appearance model. At inference, the scene-and-location-specific detectors are run in parallel, one at every grid location.
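One way to read "trained jointly to learn a smoothly varying appearance model" is as one classifier per grid cell, coupled to its neighbors by a smoothness penalty. The sketch below is only an illustration under that assumption, not the project's actual training code: the linear logistic detectors, the penalty `lam * ||w_a - w_b||^2`, the function name `train_joint_detectors`, and the toy features are all hypothetical.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_joint_detectors(data, neighbors, dim, lam=1.0, lr=0.1, steps=500):
    """Jointly fit one linear detector per grid cell (hypothetical sketch).

    data:      {cell: [(feature_vector, label in {0, 1}), ...]}
    neighbors: list of (cell_a, cell_b) pairs of adjacent grid cells
    lam:       weight of the smoothness penalty lam * ||w_a - w_b||^2
    """
    w = {c: [0.0] * dim for c in data}
    for _ in range(steps):
        grad = {c: [0.0] * dim for c in data}
        # Logistic-loss gradient, averaged over each cell's samples.
        for c, samples in data.items():
            for x, y in samples:
                p = sigmoid(sum(wi * xi for wi, xi in zip(w[c], x)))
                for k in range(dim):
                    grad[c][k] += (p - y) * x[k] / len(samples)
        # Smoothness gradient: pull adjacent detectors toward each other.
        for a, b in neighbors:
            for k in range(dim):
                diff = w[a][k] - w[b][k]
                grad[a][k] += 2.0 * lam * diff
                grad[b][k] -= 2.0 * lam * diff
        # Plain gradient-descent update for every cell.
        for c in w:
            for k in range(dim):
                w[c][k] -= lr * grad[c][k]
    return w

# Toy data: three cells in a row whose ideal detectors differ.
toy_data = {
    0: [([1.0, 0.0], 1), ([-1.0, 0.0], 0)],
    1: [([1.0, 1.0], 1), ([-1.0, -1.0], 0)],
    2: [([0.0, 1.0], 1), ([0.0, -1.0], 0)],
}
toy_neighbors = [(0, 1), (1, 2)]

independent = train_joint_detectors(toy_data, toy_neighbors, dim=2, lam=0.0)
smooth = train_joint_detectors(toy_data, toy_neighbors, dim=2, lam=1.0)
print("gap without smoothing:", math.dist(independent[0], independent[2]))
print("gap with smoothing:   ", math.dist(smooth[0], smooth[2]))
```

With `lam=0.0` each cell is fit independently; increasing `lam` trades per-cell fit for agreement between adjacent cells, which is the sense in which the appearance model varies smoothly across the grid.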


Pose Estimation

Inferring Visual Attributes through Synthesis: For every small region, physically grounded and geometrically correct renderings of pedestrians are synthetically generated using known scene information such as camera calibration parameters, obstacles (red), walls (blue), and walkable areas (green). We train environment-and-region-specific ShapeNets on the synthetically generated data. At inference, our model outputs detections, keypoint locations, occlusion labels, and a segmentation mask for each image.
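To make the role of the calibration parameters and obstacles concrete, the following sketch shows how keypoint locations and occlusion labels could be derived for a synthetic pedestrian. It is a simplified illustration, not the project's rendering pipeline: it assumes a zero-skew pinhole camera and approximates each obstacle as an image-space rectangle at a single depth. The names `project` and `label_keypoints`, and all numeric values, are hypothetical.

```python
def project(K, R, t, point):
    """Project a 3D world point to (u, v, depth) with a zero-skew pinhole camera."""
    # World -> camera coordinates: X_cam = R @ X_world + t
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division, then focal length and principal point.
    u = K[0][0] * cam[0] / cam[2] + K[0][2]
    v = K[1][1] * cam[1] / cam[2] + K[1][2]
    return u, v, cam[2]

def label_keypoints(K, R, t, keypoints, occluders):
    """Return the pixel location and an occlusion flag for each 3D keypoint.

    occluders: list of (u_min, v_min, u_max, v_max, depth) rectangles, a
               crude stand-in for rendered obstacle geometry (assumption).
    """
    labels = {}
    for name, point in keypoints.items():
        u, v, depth = project(K, R, t, point)
        # Occluded if the keypoint falls inside a rectangle that is closer
        # to the camera than the keypoint itself.
        occluded = any(
            u0 <= u <= u1 and v0 <= v <= v1 and obs_depth < depth
            for (u0, v0, u1, v1, obs_depth) in occluders
        )
        labels[name] = {"pixel": (u, v), "occluded": occluded}
    return labels

# Hypothetical calibration: camera at the origin looking along +Z, y pointing down.
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

keypoints = {"head": (0.0, -1.0, 10.0), "ankle": (0.0, 0.75, 10.0)}
occluders = [(600.0, 400.0, 700.0, 500.0, 5.0)]  # low obstacle in front of the legs

labels = label_keypoints(K, R, t, keypoints, occluders)
for name, info in labels.items():
    print(name, info)
```

Because the scene geometry is known, labels of this kind come for free with every synthetic sample, which is what makes training on rendered data attractive for attributes that are costly to annotate by hand.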


References