Computer and Human Vision Research and Development

Here in the Laboratory for Cognition and Attention in Time and Space (Lab for CATS), we study visual perception with a focus on artificial visual agents, but we often draw inspiration from, and make comparisons with, human vision. There are three concurrent opportunities through which students can apply to join the lab and receive course credit (CS186):

SMILER Development: The Saliency Model Implementation Library for Experimental Research (SMILER) is a comprehensive tool that wraps saliency models (computer vision models designed to predict which parts of an image a human observer is most likely to attend to) into a common API. It is built using Python and MATLAB, and uses Docker to support models implemented in a variety of additional languages and formats. Since its release in 2018, SMILER has been used in a variety of projects, including studies of zebra stripes, aerial tree detection, and saliency model benchmarking over novel datasets. In Spring 2022, I worked with a team of HMC students to update and overhaul SMILER, and together we made a number of important updates to the codebase. However, there is more development to do! Working on this project, you can expect to tackle one or more of the following tasks:

  1. Saliency model dockerization: SMILER works by encapsulating each non-MATLAB saliency model in the library in a Docker container, thereby preventing library conflicts between different models. You would either add new models to the SMILER library by building new Docker containers for them, or fix models that are currently not working in all SMILER environments (see the sketch after this list).
  2. Cross-platform and CPU extensions: SMILER was originally developed for a Linux environment with the expectation of an NVIDIA GPU to power the deep learning models. In Spring 2022 we introduced support for Windows and MacOS, as well as for CPU-only processing, but this functionality could use further testing and development.
  3. Issue tracking: SMILER is an open source project, and has an ongoing list of issues logged on its GitHub page. Some of these are related to the tasks above, but others are miscellaneous tasks that nevertheless are important to tackle.
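
To make the encapsulation in task (1.) concrete, here is a minimal sketch of how a containerized saliency model might be driven from Python. This is illustrative only: the Docker image name, mount paths, and compute_saliency helper are hypothetical, not SMILER's actual API.

    import subprocess
    from pathlib import Path

    def compute_saliency(image_dir: str, output_dir: str,
                         docker_image: str = "smiler/example-model:latest") -> None:
        """Run a hypothetical dockerized saliency model over a folder of images.

        The container sees the host folders via bind mounts, so the model's
        dependencies never need to be installed on the host machine -- this
        is what keeps the different models' libraries from conflicting.
        """
        in_path = Path(image_dir).resolve()
        out_path = Path(output_dir).resolve()
        out_path.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["docker", "run", "--rm",
             "-v", f"{in_path}:/input:ro",   # images mounted read-only
             "-v", f"{out_path}:/output",    # saliency maps written here
             docker_image],                  # hypothetical image name
            check=True,
        )

    compute_saliency("experiment/images", "experiment/saliency_maps")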

Useful Skills: Experience with git, Docker, and Python library management is a major asset for this project. Having taken CS70 and/or CS105 will be highly beneficial.

Saliency, Eye-Tracking, and Psychophysics: There are several ongoing studies in collaboration with Prof. Breeden, and one in collaboration with Dr. Kavassalis' FICUS Lab (a more detailed description of that project is available through the "Flexible, interdisciplinary computing for understanding the atmosphere" post; you may get involved in it by applying either to the FICUS Lab or to the Lab for CATS).

Working on this set of projects, you can expect to learn about human eye-tracking (both the practical aspects of operating an eye-tracker and gathering eye-tracking data, and some of the more foundational aspects of how we interpret and process fixation data). Additionally, depending on which project you focus on, you will gain further experience with experiment design and data analysis in one of three areas:

  1. Scientific Figure Interpretation: Many areas of science have broad impacts on society, but communicating the findings of scientific research to the people in a position to set public policy or interpret facts in a legal framework can be challenging. Scientists will often condense a great deal of information into a visual figure. However, these figures are sometimes designed with a level of assumed knowledge or familiarity with data visualization conventions that is not shared by the target audience. Likewise, previous research has shown that, due to time constraints, policy makers will typically only look at a relevant figure for a very brief period of time (~30s), so the points a figure is attempting to communicate need to be highly salient. This project is focused on one such area of scientific research with a high need for public communication: climate science. Eye tracking has previously been proposed as a method for testing and improving the quality of scientific figures for public communication, but it is not a particularly scalable solution (eye trackers tend to be expensive pieces of specialized equipment, and require a trained operator to set up and run). This project, therefore, is focused on identifying and developing tools to aid in the evaluation and improvement of scientific figures. The project is in its early stages, so work will largely revolve around gathering data through eye tracking as well as click-contingent display software (called BubbleView). Later stages will also include running saliency models and developing comparison metrics between the saliency output and the human data that has been gathered (see the sketch after this list).
  2. Asymmetric Visual Search: Visual search is an extremely common psychophysical paradigm through which scientists learn about the human visual system. One attribute of human vision is something known as asymmetric visual search. This occurs when a human subject is more efficient at finding a unique target A amongst distractors B than when they are searching for a unique target B amongst distractors A (for example, humans are typically faster at finding a magenta dot amongst blue distractors than they are at finding a blue dot amongst magenta distractors). Previous evidence has shown that saliency models fail to replicate these patterns for a particular class of expected search asymmetries based on target novelty; in this study we will test human subjects on these asymmetries and compare their behaviour to saliency model predictions.
  3. Video Data Handling: This project extends a dataset originally gathered by Prof. Breeden, in which she built a database of eye-tracking sequences over movie segments several minutes in length. There are two stages to this project: first, we will extract sub-sequences of the existing movie clips corresponding to a constrained set of short actions (such as jumping); second, we will gather new eye-tracking data captured only over these action snippets, allowing us to compare how similar eye-tracking behaviour is when the viewer has the greater surrounding context of the movie versus just the short action sequence.
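
As one example of the kind of comparison metric mentioned in (1.), a common way to score a saliency map against human fixations is Normalized Scanpath Saliency (NSS): z-score the map, then average its values at the fixated pixels. Below is a minimal sketch; the array shapes and variable names are illustrative rather than tied to any particular dataset.

    import numpy as np

    def nss(saliency_map: np.ndarray, fixations: np.ndarray) -> float:
        """Normalized Scanpath Saliency (NSS).

        saliency_map: 2D array of saliency values for one image.
        fixations: (N, 2) array of integer (row, col) fixation coordinates.
        Returns the mean z-scored saliency at the fixated pixels; higher
        scores mean fixations landed on more salient regions, 0 is chance.
        """
        normed = (saliency_map - saliency_map.mean()) / saliency_map.std()
        rows = fixations[:, 0].astype(int)
        cols = fixations[:, 1].astype(int)
        return float(normed[rows, cols].mean())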

Useful Skills: Much of this project will involve learning new tools, so applicants are not expected to have prior experience with them. That said, experience with MATLAB is an asset (though not essential), and if you are interested in video data handling, experience with video tools such as ffmpeg could be useful (see the sketch below).
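
For a sense of what the clip-extraction stage of the video project might involve, here is a minimal sketch that shells out to ffmpeg to cut an action snippet from a longer movie clip. The file names and timestamps are hypothetical.

    import subprocess

    def extract_snippet(source: str, start: str, duration: str, dest: str) -> None:
        """Cut a sub-sequence out of a longer video using ffmpeg.

        start and duration use ffmpeg's HH:MM:SS.mmm time syntax. Re-encoding
        (rather than stream copying) keeps the cut frame-accurate, which
        matters when aligning clips with eye-tracking timestamps.
        """
        subprocess.run(
            ["ffmpeg", "-ss", start, "-i", source,
             "-t", duration, "-c:v", "libx264", "-c:a", "aac", dest],
            check=True,
        )

    # e.g. a 4-second jumping action starting 2m10s into a clip
    extract_snippet("movie_clip_07.mp4", "00:02:10.000", "00:00:04.000",
                    "jump_snippet_01.mp4")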

Computer Vision Demos and Exercise Development: Working on this project would entail building software that demonstrates a specific topic in computer vision, with the goal of incorporating the code as either an exercise or a demo in future offerings of CS153: Computer Vision. Students working on this project would focus on a single area, gaining in-depth experience with that aspect of computer vision as well as experience in course design and development. Topics include:

  • Image plane geometry
  • Keypoint detection and interpretation
  • Convolutional Neural Networks
  • Autoencoders or Variational Autoencoders
  • Stereoscopic Depth

Useful Skills: Having already completed CS153 would be a major asset. Depending on the specific exercise you would like to develop, experience with image processing, visual geometry, or deep learning would also be helpful.
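
As a rough illustration of the scope of a single demo, here is a short sketch of the keypoint detection topic using OpenCV's ORB detector. The choice of detector, parameters, and image path are placeholders; the actual exercise design would be up to you.

    import cv2

    # Detect and draw ORB keypoints -- roughly the scale of one in-class
    # demo, or a starting point for a student exercise.
    image = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder image
    orb = cv2.ORB_create(nfeatures=500)
    keypoints, descriptors = orb.detectAndCompute(image, None)

    annotated = cv2.drawKeypoints(
        image, keypoints, None,
        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS,  # show size/orientation
    )
    cv2.imwrite("example_keypoints.jpg", annotated)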

Required Essay Prompt:

Please let me know:

  1. Which project(s) you are interested in working on.
  2. For each project mentioned in (1.), write two to three sentences on why you are interested in that specific project (e.g. personal interest or alignment with future career goals).
  3. For each project mentioned in (1.), please also provide a few sentences describing any experience that will help you get started on that project (this can be coursework, past research or professional experience, or anything else you think is relevant).


Name of research group, project, or lab
Lab for CATS
Logistics Information:
Project categories
Computer Science
Artificial Intelligence
Human-Computer Interaction
Machine Learning
Teaching & Learning
Student ranks applicable
Sophomore
Junior
Senior
Time commitment
Fall - Part Time
Compensation
Academic Credit
Number of openings
6
Contact Information:
Mentor
Calden Wloka
cwloka@hmc.edu
Principal Investigator
Calden Wloka
cwloka@hmc.edu