Driving by Computer Vision

Recently there has been a trend for companies such as Google and Microsoft to record video from a specialized car in order to augment their mapping applications with real-world imagery, giving rise to the highly popular 'street view' style. If a machine can automatically assign every pixel in these image sequences to an object class of interest, such as car, road or sky, then we can say that it produces a semantic output, that is, an output that has meaning for humans. We can therefore go further than imagery alone and augment maps with semantic information, which is useful for purposes such as asset management, advertising and driver assistance.

In this video we see four views of streets recorded from a moving car [1]. The top right-hand corner shows the raw images, of the type you may see in Google Street View, for example. The top left-hand corner shows the same images coloured in by a human according to a predefined set of interesting objects; each colour represents a different class. The bottom row shows two machine-made outputs, which are fully automatic [2]. In the bottom left corner, context is encoded by a simple observation: neighbouring points in the image should be encouraged to belong to the same object class. In the bottom right corner, more complex contextual relations have been encoded that also encourage larger regions sharing some common property, such as colour, to belong to the same object class. We can see that exploiting more complex contextual relations produces an output that is closer to the human annotation depicted in the top left.

For details of the recorded images and the human labels please refer to:
[1] G. J. Brostow, J. Fauqueur, and R. Cipolla. Semantic object classes in video: A high-definition ground truth database. Pattern Recognition Letters, 30(2):88-97, 2009.

For details of the system used to produce the machine outputs please refer to:
[2] P. Sturgess, K. Alahari, L. Ladicky, and P. H. S. Torr. Combining appearance and structure from motion features for road scene understanding. In BMVC, 2009.
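
The simple form of context described for the bottom left output, where neighbouring points are encouraged to share a class, can be illustrated with a toy sketch. This is not the system of [2]. It is a minimal Potts-style energy over a tiny grid, minimised with iterated conditional modes (a basic greedy method chosen here only for brevity). The grid, the per-pixel costs, the two class indices and the smoothing weight are all made-up assumptions.

```python
# Toy illustration only: the costs, classes and weight below are invented.

def potts_energy(labels, unary, weight):
    """Per-pixel costs plus a penalty for each 4-connected neighbour pair that disagrees."""
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            energy += unary[y][x][labels[y][x]]
            if x + 1 < w and labels[y][x] != labels[y][x + 1]:
                energy += weight
            if y + 1 < h and labels[y][x] != labels[y + 1][x]:
                energy += weight
    return energy


def per_pixel_labels(unary):
    """Label each pixel independently by its cheapest class (no context)."""
    return [[min(range(len(costs)), key=lambda c: costs[c]) for costs in row]
            for row in unary]


def icm(unary, weight, sweeps=5):
    """Iterated conditional modes: greedily relabel each pixel given its neighbours."""
    h, w = len(unary), len(unary[0])
    n_classes = len(unary[0][0])
    labels = per_pixel_labels(unary)
    for _ in range(sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                neighbours = [labels[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                              if 0 <= ny < h and 0 <= nx < w]

                def local_cost(c):
                    # Cost of class c here: its own cost plus one penalty per disagreeing neighbour.
                    return unary[y][x][c] + weight * sum(1 for n in neighbours if n != c)

                best = min(range(n_classes), key=local_cost)
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:
            break
    return labels


# Hypothetical 3x3 image with two classes (0 and 1). Every pixel clearly
# prefers class 0 except the centre, which weakly prefers class 1.
unary = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.6, 0.4]

noisy = per_pixel_labels(unary)      # centre pixel stands out as class 1
smoothed = icm(unary, weight=0.5)    # neighbour agreement pulls it back to class 0

print(noisy)
print(smoothed)
print(potts_energy(noisy, unary, 0.5), potts_energy(smoothed, unary, 0.5))
```

The isolated centre pixel is relabelled because disagreeing with all four neighbours costs more than its weak preference saves. The richer context in the bottom right output goes beyond this pairwise term by also rewarding agreement within larger regions, which this sketch does not attempt.
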
Length: 01:23

