Seminar
11:00 AM | PFT 3107
Bootstrapping Perception: Self-Supervised Visual Learning from Motion and Geometry
Abstract
Recent advances in computer vision have relied predominantly on large-scale, human-curated datasets. Their impressive results, however, come with a dependency on artificial data curation and on the human symbolic priors that curation introduces. Instead of shaping visual intelligence with human symbolic priors, I offer a perspective from the opposite direction, presenting a series of bottom-up mechanisms for developing machine vision. Central to this paradigm is bootstrapping, in which higher-level semantic information emerges from low-level visual cues such as motion and depth. This process closely mirrors the development of biological vision, where visual intelligence emerges through environmental interaction without external symbolic guidance. Symbols are attached to objects only after bootstrapping, making the paradigm purely data-driven. By eliminating human supervision and artificial data curation, this approach paves the way for more scalable training on diverse data.
Dong Lao
University of California, Los Angeles