Randomized Sketching Methods for Constrained Regression Problems
This project investigates how randomized sketching techniques can be adapted to constrained least-squares problems, with a particular focus on active-set methods for nonnegative least squares (NLS). While randomized methods and sketch-and-project algorithms have dramatically improved the efficiency of unconstrained least-squares solvers, their extension to constrained settings remains poorly understood. Many important data analysis and machine learning pipelines rely on constrained least-squares subroutines, especially those involving nonnegativity or simplex constraints, yet existing randomized approaches often fail to preserve the combinatorial and geometric structure that active-set methods depend on.
The primary goal of the research is to understand how sketching affects the geometry of constrained least-squares problems and to develop theory and algorithms that allow sketching to be safely and effectively integrated into active-set methods. In particular, the project aims to identify conditions under which sketching preserves the active set (equivalently, the support of the solution, since the active nonnegativity constraints are exactly the zero entries) and to characterize when and why sketched problems may lead to incorrect constraint identification. Ultimately, this work seeks to bridge the gap between randomized numerical linear algebra and classical combinatorial optimization methods.
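To fix ideas, a standard formulation (the symbols below are generic notation, not taken from the project description) writes nonnegative least squares and its sketched counterpart as

\min_{x \ge 0} \|Ax - b\|_2^2 \qquad \text{versus} \qquad \min_{x \ge 0} \|S(Ax - b)\|_2^2,

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $S \in \mathbb{R}^{s \times m}$ is a random sketching matrix with $s \ll m$. A subspace-embedding guarantee asks that $(1-\varepsilon)\|Ax - b\|_2^2 \le \|S(Ax - b)\|_2^2 \le (1+\varepsilon)\|Ax - b\|_2^2$ for all relevant $x$; the open question here is when the minimizer of the sketched problem has the same set of zero entries as the minimizer of the original.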
The project combines theoretical analysis, algorithm design, and computational experimentation. Key tasks include:
- Geometric analysis of sketching under constraints: Students will study how standard sketching guarantees (such as subspace embeddings) interact with feasible regions defined by nonnegativity and simplex constraints.
- Active-set sensitivity analysis: We will analyze how perturbations induced by sketching affect constraint activation and deactivation in methods such as the Lawson–Hanson algorithm for NLS and Wolfe’s method for closest-point problems.
- Algorithmic exploration: Students will implement baseline active-set methods and experiment with sketched variants, testing how different sketching operators affect convergence, accuracy, and support recovery (see the illustrative example below).
- Empirical validation: Computational experiments on synthetic and application-inspired datasets will be used to illustrate when sketching succeeds, when it fails, and how algorithmic safeguards might be designed.
Techniques drawn from randomized linear algebra, convex geometry, numerical optimization, and computational experimentation will be emphasized throughout.
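As a flavor of the computational experiments involved, the minimal sketch below compares the support of a sketched NLS solution to that of the full solve. It is illustrative only: the Gaussian sketching matrix, problem sizes, and use of scipy.optimize.nnls (a Lawson–Hanson implementation) are assumptions for this example, not a prescribed method.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Synthetic overdetermined NLS instance: tall A, nonnegative sparse ground truth.
m, n = 2000, 50
A = rng.standard_normal((m, n))
x_true = np.maximum(rng.standard_normal(n), 0.0)   # roughly half the entries are zero
b = A @ x_true + 0.01 * rng.standard_normal(m)

# Full (unsketched) active-set solve via Lawson-Hanson NLS.
x_full, _ = nnls(A, b)

# Sketched solve: a Gaussian sketch S with s << m rows (illustrative choice of operator).
s = 200
S = rng.standard_normal((s, m)) / np.sqrt(s)
x_sketch, _ = nnls(S @ A, S @ b)

# Compare supports (indices of nonzero entries) and residuals on the original data.
tol = 1e-8
supp_full = set(np.flatnonzero(x_full > tol))
supp_sketch = set(np.flatnonzero(x_sketch > tol))
print("support preserved:", supp_full == supp_sketch)
print("support symmetric difference:", len(supp_full ^ supp_sketch))
print("relative residual (full):    ", np.linalg.norm(A @ x_full - b) / np.linalg.norm(b))
print("relative residual (sketched):", np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(b))
```

Repeating such an experiment over many random sketches, sketch sizes, and sketch types is one way to probe empirically when the active set survives sketching and when it does not.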
Students will participate as active collaborators in both the theoretical and computational aspects of the project. Depending on background and interest, students may focus on:
- Proving small theoretical results or counterexamples related to sketching and constraint preservation,
- Implementing and testing algorithms in Python or MATLAB,
- Designing experiments that probe the geometry of sketched constrained problems,
- Reading and presenting foundational papers on sketching, active-set methods, and nonnegative least squares.
Students interested in applying should respond to the following prompts:
1. Research Interest and Motivation
What aspects of this research project interest you most? Are you more drawn to the theoretical, computational, or applied components of the work? Please explain briefly.
2. Research Experience
Describe any prior experience you have with mathematical, data science, or computer science research. This may include formal research projects, independent study, course projects, or self-directed exploration. (No prior research experience is required.)
3. Coursework and Technical Preparation
List and briefly describe any relevant coursework you have completed or are currently taking (e.g., linear algebra, probability, numerical analysis, optimization, machine learning, or scientific computing).
4. Conceptual or Reading Background (Optional)
Have you encountered topics such as least-squares regression, iterative methods for optimization, randomized algorithms, constrained optimization, or numerical linear algebra in coursework or independent reading? If so, briefly describe where and how.
We consider the mathematics of data, models for making decisions with data, and methods for training such models. Recent work in this group has focused on iterative methods for problems suffering from adversarial corruption, randomized iterative methods for tensor decomposition and regression, and methods for learning on graphs and hypergraphs.
This project would extend randomized methods, which are particularly efficient for large-scale data, to a new regime of problems in which solutions must satisfy given constraints.