Randomized Computation of Tensor Decompositions

Modern data analysis is often challenging not only because of the size of the data, but also because of its inherent complexity.  Modern data is often multi-modal, with modes representing measurements along different dimensions, and is naturally represented as a tensor.  One of the fundamental tasks in tensor data analysis is computing a tensor decomposition, which produces a lower-dimensional approximation to the tensor.  It has been previously noted that many methods for computing tensor decompositions repeatedly solve a large-scale linear regression as a subroutine.

Solving large-scale linear regressions is one of the most commonly encountered problems across the data-rich sciences.  This problem arises as a subroutine of several optimization methods in machine learning, in medical imaging, in sensor networks, and in statistical analysis, to name only a few areas. In the matrix-vector and matrix-matrix regimes, this problem is very well understood, and the literature contains many highly efficient methods with provable guarantees.  For example, the Kaczmarz method is an iterative algorithm for solving linear systems of equations that can be viewed as a special case of stochastic gradient descent (SGD). Given a system of equations Ax = b, the Kaczmarz method iteratively updates its estimate of a solution x by selecting one equation at a time and projecting the current iterate onto the hyperplane defined by that equation.
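The projection step described above can be sketched in a few lines.  The following is a minimal illustration of the randomized variant of the Kaczmarz method (rows sampled with probability proportional to their squared norms), written for this description and not taken from any project code:

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Randomized Kaczmarz: at each step, sample a row i with probability
    proportional to ||a_i||^2 and project the current iterate onto the
    hyperplane {x : a_i^T x = b_i}."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.sum(A**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        a_i = A[i]
        # Orthogonal projection onto the hyperplane of equation i.
        x = x + ((b[i] - a_i @ x) / row_norms_sq[i]) * a_i
    return x

# A consistent overdetermined system: the iterates converge to the solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
x_hat = randomized_kaczmarz(A, b)
```

For consistent systems, this row-sampling scheme is known to converge linearly in expectation, which is part of what makes the method attractive for the large-scale regressions discussed above.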

In order to guarantee efficient computation of a tensor decomposition, we must be able to solve the linear regression subroutines efficiently.  We will study the structure of these subroutine linear regression problems and use this understanding to guarantee that they can be solved efficiently by the Kaczmarz method.
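One well-known instance of the structure mentioned above arises in the alternating least squares (ALS) algorithm for the CP decomposition: each factor update is a linear least-squares problem whose design matrix is a Khatri-Rao product.  The small sketch below (the rank-2 example and all variable names are illustrative, not from the project) verifies that the mode-1 unfolding of a rank-r tensor satisfies T_(1) = A (C ⊙ B)^T, so the ALS update for the factor A is an ordinary least-squares solve:

```python
import numpy as np
from scipy.linalg import khatri_rao

# Build a small 3-way tensor with exact rank-r CP structure:
# T[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r].
rng = np.random.default_rng(0)
I, J, K, r = 6, 5, 4, 2
A = rng.standard_normal((I, r))
B = rng.standard_normal((J, r))
C = rng.standard_normal((K, r))
T = np.einsum('ir,jr,kr->ijk', A, B, C)

# Mode-1 unfolding: Fortran-order reshape makes the mode-2 index vary
# fastest, matching the convention T_(1) = A (C ⊙ B)^T.
T1 = T.reshape(I, J * K, order='F')
M = khatri_rao(C, B)          # (J*K) x r Khatri-Rao design matrix
assert np.allclose(T1, A @ M.T)

# The ALS update for A is exactly a linear least-squares solve
# against this structured design matrix.
A_hat = np.linalg.lstsq(M, T1.T, rcond=None)[0].T
```

It is precisely this Khatri-Rao structure in the design matrix that the project aims to exploit when applying the Kaczmarz method to the regression subproblems.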

Student researchers in this project will develop mathematical skills in areas like (multi-)linear algebra, optimization, probability, and statistics.  They will build and strengthen skills in literature review, scientific reading, technical writing and presentation, and programming and package development.

-----------------------------------------------------------------------------------------------------------------------------------------------

Interested applicants should submit a CV, the name of a reference, and answers to the following prompts:

  • Why are you interested in this project?  What makes you a good fit?
  • What skills do you bring to this project?  What skills do you hope to develop?
  • What outcomes do you hope to achieve in your summer research project?
Name of research group, project, or lab
matH of Algorithms, Data & Decisions (HADD) research group
Why join this research group or lab?

In the matH of Algorithms, Data & Decisions (HADD) research group, we consider problems motivated by the study of real-world data. We consider the mathematics of data, models for making decisions with data, and methods for training such models.  We consist of fun and passionate people who encourage one another and help each other develop as mathematicians, data scientists, and researchers.  Our group has students working on various projects, and often interacts with collaborators from other institutions (graduate students, postdoctoral researchers, and faculty).  Students may continue their work in a senior thesis, present their findings at conferences, and/or coauthor resulting publications.

We work in areas like mathematical data science, optimization, and applied convex geometry, leveraging tools from probability, combinatorics, and convex geometry. Our group has been active recently in randomized numerical linear algebra, combinatorial methods for convex optimization, tensor decomposition for topic modeling, network consensus and ranking problems, and community detection on graphs and hypergraphs.

Representative publication
Logistics Information:
Project categories
Mathematics
Algorithms
Data Science
Optimization
Student ranks applicable
Sophomore
Junior
Senior
Student qualifications

Required:

  • completion of core courses in calculus and linear algebra
  • coursework in probability and/or statistics
  • programming experience (Python and/or Matlab preferred)


Desirable but not required:

  • optimization
  • data science 
  • machine learning
Time commitment
Summer - Full Time
Compensation
Paid Research
Number of openings
2
Techniques learned

Students will build and strengthen skills in:

  • (multi-)linear algebra
  • optimization
  • machine learning
  • probability and statistics
  • literature review
  • scientific reading
  • technical writing and presentation
  • programming and package development
Contact Information:
Mentor
Jamie Haddock
jhaddock@hmc.edu
Principal Investigator
Name of project director or principal investigator
Jamie Haddock
Email address of project director or principal investigator
jhaddock@g.hmc.edu