Self-driving cars are both fascinating and fear-inducing because they must accurately assess and navigate a rapidly changing environment. Computer vision, which uses computation to extract information from imagery, is an important aspect of autonomous driving, with tasks ranging from low-level, such as determining how far away a given location is from the vehicle, to higher-level, such as determining whether there is a pedestrian in the road.

Nathan Jacobs, professor of computer science & engineering in the McKelvey School of Engineering at Washington University in St. Louis, and a team of graduate students developed a joint learning framework to optimize two low-level tasks: stereo matching and optical flow. Stereo matching generates maps of disparities between two images and is a critical step in depth estimation for avoiding obstacles. Optical flow estimates per-pixel motion between video frames and is useful for estimating how objects are moving as well as how the camera is moving relative to them.
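To make the two quantities concrete, here is a minimal Python sketch of how a disparity value and a flow vector are typically used once they have been estimated. It is not taken from the team's framework: the function names, the focal length and the camera baseline are hypothetical, and the depth formula is the standard relation for a rectified stereo pair.

```python
# Illustrative only: function names and camera values are hypothetical,
# not parameters from the team's framework.

def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Non-positive disparity is treated here as "no usable match".
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


def apply_flow(x, y, flow_uv):
    """Position in the next frame of pixel (x, y), given its flow (u, v)."""
    u, v = flow_uv
    return x + u, y + v


# A 20-pixel disparity seen by a camera pair 0.5 m apart (assumed values).
depth_m = disparity_to_depth(disparity_px=20.0, focal_length_px=700.0, baseline_m=0.5)

# A pixel at (100, 50) that moves 3.5 px right and 1 px up between frames.
new_pos = apply_flow(100, 50, (3.5, -1.0))
```

Larger disparities correspond to closer objects, which is why errors in the disparity map translate directly into errors in estimated distance.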

Ultimately, stereo matching and optical flow both aim to understand the pixel-wise displacement between images and use that information to capture a scene’s depth and motion. The team’s co-training approach addresses both tasks simultaneously, leveraging their inherent similarities. The framework, which Jacobs presented Nov. 23 at the British Machine Vision Conference in Aberdeen, UK, outperforms comparable methods that tackle stereo matching and optical flow estimation in isolation.

One of the big challenges in training models for these tasks is acquiring high-quality training data, which can be both difficult and costly, Jacobs said. The team’s method builds on effective techniques for image-to-image translation between the synthetic (computer-generated) and real image domains. This approach allows their model to excel in real-world scenarios while training solely on ground-truth information from synthetic images.

“Our approach overcomes one of the important challenges in optical flow and stereo, obtaining accurate ground truth,” Jacobs said. “Since we can obtain a lot of simulated training data, we get more accurate models than training only on the available labeled real-image datasets. More accurate stereo and optical flow estimates reduce errors that would otherwise propagate through the rest of the autonomous driving pipeline system, such as obstacle avoidance.” 

Xiong Z, Qiao F, Zhang Y, and Jacobs N. StereoFlowGAN: Co-training for stereo and flow with unsupervised domain adaptation. British Machine Vision Conference (BMVC), Nov. 20-24, 2023. DOI:

The McKelvey School of Engineering at Washington University in St. Louis promotes independent inquiry and education with an emphasis on scientific excellence, innovation and collaboration without boundaries. McKelvey Engineering has top-ranked research and graduate programs across departments, particularly in biomedical engineering, environmental engineering and computing, and has one of the most selective undergraduate programs in the country. With 165 full-time faculty, 1,420 undergraduate students, 1,614 graduate students and 21,000 living alumni, we are working to solve some of society’s greatest challenges; to prepare students to become leaders and innovate throughout their careers; and to be a catalyst of economic development for the St. Louis region and beyond.
