Seminar

11:00 AM  |  PFT 3107


Bootstrapping Perception: Self-Supervised Visual Learning from Motion and Geometry

Abstract

Recent advances in computer vision have relied predominantly on large-scale, human-curated datasets. Alongside their impressive results comes a dependence on artificial data curation and on the human symbolic priors that curation introduces. Instead of shaping visual intelligence with human symbolic priors, I offer a perspective from the other direction, presenting a series of bottom-up mechanisms for developing machine vision. Central to this paradigm is bootstrapping, through which higher-level semantic information emerges from low-level visual cues such as motion and depth. This process closely mirrors the development of biological vision, where visual intelligence emerges through environmental interaction without external symbolic guidance. Symbols are attached to objects only after bootstrapping, making the paradigm purely data-driven. By eliminating the need for human supervision and artificial data curation, this approach paves the way for more scalable training on diverse data.

 

Dong Lao
University of California, Los Angeles