In this work we will explore how deep convolutional networks focus on and represent information in both spatial and temporal domains. Deep convolutional networks are widely used in computer vision, representing the state-of-the-art approach in many problem domains. Nevertheless, a number of aspects of their behaviour remain unexplored or poorly understood, and we frequently find that the way these models solve visual problems is quite different from (and more brittle than) human visual processing. By uncovering these differences, we hope both to better understand how deep networks function and to identify possible avenues for improvement, while also deepening our understanding of human visual cognition. In this project we will:
- Review the existing literature on deep convolutional network behaviour, particularly in the domains of object detection, semantic segmentation, and action recognition.
- Adapt visualization techniques such as Grad-CAM to the networks of interest.
- Compare the behaviour of deep networks to human visual behaviour (with a focus on eye tracking).
- Stretch goal: Incorporate our findings into novel training or architecture designs to improve model behaviour.
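To give a concrete sense of the visualization work in the second aim: Grad-CAM weights each convolutional feature map by the spatially averaged gradient of the target class score with respect to that map, sums the weighted maps, and applies a ReLU so that only positively contributing regions survive. The sketch below is a minimal NumPy illustration of that computation only; adapting it to a real network would additionally require hooking into the model to capture the chosen layer's activations and gradients (the toy inputs here are hypothetical stand-ins for those).

```python
import numpy as np

def grad_cam(activations, gradients):
    """Minimal Grad-CAM heatmap.

    activations: (K, H, W) feature maps from a chosen conv layer
    gradients:   (K, H, W) gradients of the class score w.r.t. those maps
    """
    # alpha_k: global-average-pool each gradient map over space
    weights = gradients.mean(axis=(1, 2))
    # weighted sum of feature maps: sum_k alpha_k * A_k  -> (H, W)
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU: keep only regions that push the class score up
    cam = np.maximum(cam, 0)
    if cam.max() > 0:
        cam = cam / cam.max()  # normalize to [0, 1] for overlay on the image
    return cam

# Toy example: two 4x4 feature maps, each active at one location.
acts = np.zeros((2, 4, 4))
acts[0, 1, 1] = 1.0
acts[1, 2, 2] = 1.0
grads = np.ones((2, 4, 4))
grads[1] *= -1  # the second map's evidence lowers the class score
heatmap = grad_cam(acts, grads)  # highlights (1, 1); (2, 2) is zeroed by ReLU
```

The heatmap is typically upsampled to the input resolution and overlaid on the image, which is also where the comparison to human eye-tracking fixation maps in the third aim would come in.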
Essay prompts (address the following):
1. What interests you in the project?
2. Do you have any prior experience with computer vision and/or deep learning?
3. What do you hope to get out of this research experience?
The Lab for CATS seeks to understand visual cognition and to help build more robust and unbiased artificial visual agents. There is a lot of hype in the world of computer vision and machine learning; we seek to keep a grounded focus on fair and realistic evaluations of model behaviour, with the goal of identifying when common benchmarking and evaluation practices might result in unanticipated deficits once models face novel or unconstrained environments.
We are a new organization; you will play an active part in laying the groundwork for a new research group!