Randomized Sketching Methods for Constrained Regression Problems

This project investigates how randomized sketching techniques can be adapted to constrained least-squares problems, with a particular focus on active-set methods for nonnegative least squares (NLS). While randomized methods and sketch-and-project algorithms have dramatically improved the efficiency of unconstrained least-squares solvers, their extension to constrained settings remains poorly understood. Many important data analysis and machine learning pipelines rely on constrained least-squares subroutines, especially those involving nonnegativity or simplex constraints, yet existing randomized approaches often fail to preserve the combinatorial and geometric structure that active-set methods depend on.
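As a toy illustration of the unconstrained setting mentioned above, the classical sketch-and-solve paradigm can be written in a few lines of Python. The Gaussian sketching operator, the dimensions, and the sketch size below are illustrative assumptions, not choices prescribed by the project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tall least-squares problem: minimize ||A x - b||_2.
m, n = 2000, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Exact solution via a dense least-squares solve.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Gaussian sketch: compress the m rows down to s << m rows.
s = 300
S = rng.standard_normal((s, m)) / np.sqrt(s)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# When S is a subspace embedding for the column space of [A, b],
# the sketched solution's residual is near-optimal with high probability.
res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
print(res_sketch / res_exact)  # typically close to 1
```

The question driving this project is what happens to such guarantees once a feasible region (e.g., x >= 0) is imposed on both the original and the sketched problem.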

The primary goal of the research is to understand how sketching affects the geometry of constrained least-squares problems and to develop theory and algorithms that allow sketching to be safely and effectively integrated into active-set methods. In particular, the project aims to identify conditions under which sketching preserves the active set (i.e., the support of the solution) and to characterize when and why sketched problems may lead to incorrect constraint identification. Ultimately, this work seeks to bridge the gap between randomized numerical linear algebra and classical combinatorial optimization methods.

The project combines theoretical analysis, algorithm design, and computational experimentation. Key tasks include:

  • Geometric analysis of sketching under constraints: Students will study how standard sketching guarantees (such as subspace embeddings) interact with feasible regions defined by nonnegativity and simplex constraints.
  • Active-set sensitivity analysis: We will analyze how perturbations induced by sketching affect constraint activation and deactivation in methods such as the Lawson–Hanson algorithm for NLS and Wolfe’s method for closest-point problems.
  • Algorithmic exploration: Students will implement baseline active-set methods and experiment with sketched variants, testing how different sketching operators impact convergence, accuracy, and support recovery.
  • Empirical validation: Computational experiments on synthetic and application-inspired datasets will be used to illustrate when sketching succeeds, when it fails, and how algorithmic safeguards might be designed.
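As a concrete starting point for the algorithmic and empirical tasks above, the following Python sketch compares the support recovered by SciPy's Lawson–Hanson NNLS solver on a full problem against a sketched variant. The dimensions, noise level, Gaussian sketching operator, and support tolerance are illustrative assumptions, not project specifications:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)

# Synthetic NLS instance with a known sparse nonnegative solution.
m, n, k = 1000, 40, 8
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.uniform(1.0, 2.0, size=k)
b = A @ x_true + 0.01 * rng.standard_normal(m)

# Full NLS via SciPy's Lawson-Hanson active-set solver.
x_full, _ = nnls(A, b)

# Sketched NLS: compress the rows, then solve min ||S(Ax - b)|| s.t. x >= 0.
s = 200
S = rng.standard_normal((s, m)) / np.sqrt(s)
x_sk, _ = nnls(S @ A, S @ b)

# Compare recovered supports (the variables whose nonnegativity
# constraints are inactive at the solution).
tol = 1e-2
supp_full = set(np.flatnonzero(x_full > tol).tolist())
supp_sk = set(np.flatnonzero(x_sk > tol).tolist())
print("true support:", sorted(support.tolist()))
print("full NLS:    ", sorted(supp_full))
print("sketched NLS:", sorted(supp_sk))
```

Whether the sketched support matches the true one, and how that depends on the sketch size s and the conditioning of the instance, is exactly the kind of question the empirical tasks probe.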

Techniques drawn from randomized linear algebra, convex geometry, numerical optimization, and computational experimentation will be emphasized throughout.


Students will participate as active collaborators in both the theoretical and computational aspects of the project. Depending on background and interest, students may focus on:

  • Proving small theoretical results or counterexamples related to sketching and constraint preservation,
  • Implementing and testing algorithms in Python or MATLAB,
  • Designing experiments that probe the geometry of sketched constrained problems,
  • Reading and presenting foundational papers on sketching, active-set methods, and nonnegative least squares.


Students interested in applying should respond to the following prompts:

1. Research Interest and Motivation
What aspects of this research project interest you most? Are you more drawn to the theoretical, computational, or applied components of the work? Please explain briefly.

2. Research Experience
Describe any prior experience you have with research in mathematics, data science, or computer science. This may include formal research projects, independent study, course projects, or self-directed exploration. (No prior research experience is required.)

3. Coursework and Technical Preparation
List and briefly describe any relevant coursework you have completed or are currently taking (e.g., linear algebra, probability, numerical analysis, optimization, machine learning, or scientific computing).

4. Conceptual or Reading Background (Optional)
Have you encountered topics such as least-squares regression, iterative methods for optimization, randomized algorithms, constrained optimization, or numerical linear algebra in coursework or independent reading? If so, briefly describe where and how.

Name of research group, project, or lab
matH of Algorithms, Data & Decisions (HADD) research group
Why join this research group or lab?

We consider the mathematics of data, models for making decisions with data, and methods for training such models. Recent work in this group has focused on iterative methods for problems suffering from adversarial corruption, randomized iterative methods for tensor decomposition and regression, and methods for learning on graphs and hypergraphs.

This project would extend randomized methods, which are particularly efficient for large-scale data, to a new regime of problems in which solutions must satisfy given constraints.

Representative publication
Logistics Information:
Project categories
Mathematics
Algorithms
Data Science
Optimization
Student ranks applicable
First-year
Sophomore
Junior
Senior
Student qualifications

Undergraduate students participating in this project should have a strong interest in applied mathematics, data science, machine learning, or optimization, along with the motivation to engage with both theoretical and computational aspects of research.

Specifically, students should have:

  • Mathematical background
    • Completion of coursework in linear algebra (including least-squares problems and matrix factorizations)
    • Familiarity with multivariable calculus and basic probability is preferred
    • Exposure to optimization, numerical analysis, or convex geometry is helpful but not required
  • Programming experience
    • Prior experience with Python, Julia, or a similar scientific computing language
    • Comfort implementing algorithms and running computational experiments
  • Research readiness
    • Ability to read and engage with mathematical or technical research papers with guidance
    • Willingness to learn new mathematical tools and algorithms independently
    • Strong analytical thinking and attention to detail
  • Collaboration and communication
    • Interest in collaborative research and regular meetings
    • Ability to clearly explain technical ideas in writing and discussion

Students with coursework or experience in machine learning, numerical linear algebra, optimization, or scientific computing are especially encouraged to apply. The project is well-suited for students who are curious about the interplay between geometry, algorithms, and randomness in modern data analysis.

Time commitment
Summer - Full Time
Compensation
Paid Research
Number of openings
2
Techniques learned

Students will gain experience with:

  • Least-squares and constrained optimization, including nonnegative least squares and active-set methods
  • Randomized numerical linear algebra, such as sketching, sampling, and sketch-and-project techniques
  • Iterative optimization algorithms and projection-based methods
  • Computational experimentation and algorithm implementation (Python or Julia)
  • Core research practices, including reading technical literature and communicating results
Project start
May 25, 2026
Contact Information:
Mentor
jhaddock@hmc.edu
Principal Investigator
Name of project director or principal investigator
Jamie Haddock
Email address of project director or principal investigator
jhaddock@g.hmc.edu