Randomized Computation of Tensor Decompositions
Modern data analysis is often challenging not only because of the size of the data but also because of its inherent complexity. Modern data is frequently multi-modal, with modes representing measurements along different dimensions, and is naturally represented as a tensor. One of the fundamental tasks in tensor data analysis is computing a tensor decomposition, which produces a lower-dimensional approximation to the tensor. It has been previously noted that many methods for computing tensor decompositions repeatedly solve a large-scale linear regression as a subroutine.
Solving large-scale linear regressions is one of the most commonly encountered problems across the data-rich sciences. This problem arises as a subroutine of several optimization methods in machine learning, in medical imaging, in sensor networks, and in statistical analysis, to name only a few. In the matrix-vector and matrix-matrix regime, this problem is very well understood, with many highly efficient methods offering provable guarantees in the literature. For example, the Kaczmarz method is an iterative algorithm for solving linear systems of equations that can be viewed as a special case of stochastic gradient descent (SGD). Given a system of equations Ax = b, the Kaczmarz method iteratively updates an approximate solution by selecting one equation at a time and projecting the current iterate onto the hyperplane defined by that equation.
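The projection step above can be sketched in a few lines of NumPy. The following is a minimal illustration of the randomized Kaczmarz method (the function name and the choice of squared-row-norm sampling, as in the Strohmer-Vershynin variant, are ours, not specifics of this project):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximately solve Ax = b by projecting onto row hyperplanes.

    At each iteration a row i is sampled (here with probability
    proportional to its squared norm) and the iterate x is projected
    onto the hyperplane {y : a_i . y = b_i}.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.einsum("ij,ij->i", A, A)
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        # Projection of x onto the hyperplane of equation i.
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x

# A small consistent system with a known solution.
rng = np.random.default_rng(42)
A = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
b = A @ x_true
x_hat = randomized_kaczmarz(A, b)
```

For a consistent system like this one, the iterates converge to the solution at an expected geometric rate governed by the conditioning of A.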
In order to guarantee efficient computation of a tensor decomposition, we must be able to efficiently solve the linear regression subroutines. We will characterize the structure of these subroutine linear regression problems and use this understanding to guarantee that they may be efficiently solved by the Kaczmarz method.
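As one concrete instance of how such regression subroutines arise (assuming, for illustration, a CP decomposition computed by alternating least squares; the function name and conventions below are ours), updating a single factor matrix of a 3-way tensor reduces to a linear least-squares problem whose design matrix is a Khatri-Rao product:

```python
import numpy as np

def cp_als_step(X, B, C):
    """One alternating-least-squares update for the first CP factor.

    For X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r], updating A solves the
    linear regression min_A ||X_(1) - A @ M.T||_F, where X_(1) is the
    mode-1 unfolding and M is the Khatri-Rao product of B and C --
    exactly the kind of structured regression a Kaczmarz-type solver
    could target.
    """
    I, J, K = X.shape
    X1 = X.reshape(I, J * K)  # mode-1 unfolding, (j, k) in C order
    # Khatri-Rao product: M[j*K + k, r] = B[j, r] * C[k, r].
    R = B.shape[1]
    M = np.einsum("jr,kr->jkr", B, C).reshape(J * K, R)
    # Solve the least-squares problem for A.
    A_T, *_ = np.linalg.lstsq(M, X1.T, rcond=None)
    return A_T.T

# Recover the first factor of an exactly low-rank tensor.
rng = np.random.default_rng(0)
I, J, K, R = 6, 5, 4, 3
A_true = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
X = np.einsum("ir,jr,kr->ijk", A_true, B, C)
A_hat = cp_als_step(X, B, C)
```

Because X here is exactly rank R and the Khatri-Rao matrix generically has full column rank, the least-squares update recovers the true factor; on noisy data it returns the best-fitting factor instead.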
Student researchers in this project will develop mathematical skills in areas like (multi-)linear algebra, optimization, probability, and statistics. They will build and strengthen skills in literature review, scientific reading, technical writing and presentation, and programming and package development.
-----------------------------------------------------------------------------------------------------------------------------------------------
Interested applicants should submit a CV, the name of a reference, and answers to the following prompts:
- Why are you interested in this project? What makes you a good fit?
- What skills do you bring to this project? What skills do you hope to develop?
- What are your goal outcomes for your summer research project?
In the matH of Algorithms, Data & Decisions (HADD) research group, we consider problems motivated by the study of real-world data. We consider the mathematics of data, models for making decisions with data, and methods for training such models. We consist of fun and passionate people who encourage one another and help each other develop as mathematicians, data scientists, and researchers. Our group has students working on various projects, and often interacts with collaborators from other institutions (graduate students, postdoctoral researchers, and faculty). Students may continue their work in a senior thesis, present their findings at conferences, and/or coauthor resulting publications.
We work in areas like mathematical data science, optimization, and applied convex geometry, bringing tools from probability, combinatorics, and convex geometry to bear on problems in data science and optimization. Our group has been active recently in randomized numerical linear algebra, combinatorial methods for convex optimization, tensor decomposition for topic modeling, network consensus and ranking problems, and community detection on graphs and hypergraphs.