Storage Validation and Performance Enhancement

The storage system in your computer is its slowest component, and also the one that must be the most reliable (unless you like losing your files!). In collaboration with Stony Brook University, we are working on ways to speed up storage and validate its reliability. Each of these tasks is a separate sub-project under the same overarching supervision; we will be assigning students to specific projects when summer arrives.

Four students will travel to Stony Brook University on Long Island, NY (a 2-hour train ride from New York City) to work with Prof. Erez Zadok of Stony Brook and his graduate students; Prof. Kuenning will co-supervise from HMC. In all cases we will be developing and enhancing software, running experiments, and measuring results. Students will work on some of the following projects:

  • We are using the SPIN model checker to find bugs in file systems. This approach could lead to major improvements in the reliability of popular file systems, but it turns out to be unusually challenging to get SPIN to run at high speed. The reason is that SPIN regularly saves the state of the computation it is checking and later rolls back to that state, but the internal state of a file system is complex and is not directly available to SPIN. We are developing new approaches to solve this problem so that we can run checks efficiently. So far we have found two bugs in a toy file system but are still improving how our approach works with real-life ones.
  • Most large Web sites use several levels of software caches to improve performance; designing the various levels to do the best job at the lowest cost is a hard (probably NP-hard) problem. To find good designs, we study "Miss Ratio Curves" (MRCs), which can quickly characterize how the behavior of a single-level cache varies depending on its size. Generating a full set of MRCs for a multi-level cache is infeasible, so we have developed algorithms that find "knees" in the curve for a given level; these knees indicate places where it is worthwhile to do experiments on the next level down. Using those knees, we are running experiments on real hardware to validate our results. This work is in cooperation with HMC alum Avani Wildani of Emory University.
  • In a related project, we are developing visualization tools that will help system designers understand the effects of various parameters on the performance of multi-tier caching systems. Most storage systems have tens to hundreds of tunable parameters that interact in unpredictable ways. We are creating a novel visualization tool that will make it easy to explore the effects of parameters and identify different kinds of interactions between large groups of parameters, such as simple correlation, inverse correlation, and clusters. Although we are focused on storage systems, the tool is applicable to a wide range of other fields. We are currently running user tests on the tool and continuing to enhance it based on user feedback.
  • Another way to improve performance is to predict what applications are going to do in the near future and then prepare the storage system in advance, for example by "pre-fetching" data from storage so that it will be available when the application requests it. The Kernel Machine Learning (KML) project uses machine learning inside the operating system to try to optimize the storage system to respond to the applications that are running at the moment. We are now doing experiments to characterize the KML system and find ways to improve it further.
  • Most Internet standards are defined by "Requests for Comments" (RFCs), which are precise specifications written in a rigorous style. Because the writing style and the topic are constrained, RFCs are easier for a machine to parse and understand. We are developing an NLP system that will read the specification for version 4 of the Network File System standard (NFSv4, RFC 7530
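
To make the Miss Ratio Curve idea from the caching project concrete, here is a minimal Python sketch. It is not the project's algorithm: it assumes a plain LRU cache, builds the curve by brute-force simulation at each size, and uses a crude "steepest drop" heuristic as a stand-in for knee detection. The trace and all function names are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict


def lru_miss_ratio(trace, cache_size):
    """Simulate an LRU cache holding cache_size keys; return its miss ratio."""
    cache = OrderedDict()
    misses = 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)         # hit: mark as most recently used
        else:
            misses += 1
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used key
    return misses / len(trace)


def miss_ratio_curve(trace, sizes):
    """Return a list of (cache size, miss ratio) points for the given sizes."""
    return [(size, lru_miss_ratio(trace, size)) for size in sizes]


def find_knee(curve):
    """Crude heuristic (an assumption, not the project's method):
    return the size reached by the steepest drop in miss ratio."""
    best_size, best_drop = curve[0][0], 0.0
    for (s0, m0), (s1, m1) in zip(curve, curve[1:]):
        drop = (m0 - m1) / (s1 - s0)
        if drop > best_drop:
            best_size, best_drop = s1, drop
    return best_size


# Hypothetical trace: a loop over four keys, repeated 25 times.
trace = [1, 2, 3, 4] * 25
curve = miss_ratio_curve(trace, range(1, 7))
knee = find_knee(curve)
```

For this looping trace, every access misses until the cache can hold all four keys, so the curve falls sharply at size 4 and stays flat afterwards. A knee like that suggests which cache sizes are worth testing further on the next level down.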
Name of research group, project, or lab
File Systems Lab (Stony Brook University)
Why join this research group or lab?

Professors Kuenning and Zadok have been collaborating for over a decade.  Prof. Zadok's research group at Stony Brook is one of the most productive and respected storage research groups in the world, with dozens of publications in top venues.  This project will give you a chance to work with Ph.D. students on cutting-edge, graduate-level research projects that will have real-world impact on the performance and reliability of storage systems.

Representative publication
Logistics Information:
Project categories
Computer Science
Student ranks applicable
Sophomore
Junior
Senior
Student qualifications

CS 70 is required; CS 105 is a plus.

 
Time commitment
Summer - Full Time
Compensation
Paid Research
Number of openings
4
Techniques learned

Students will learn how file systems work, how they interact with operating systems and with storage devices, and how to measure and evaluate computer systems.  Some projects will also involve working with git and GitHub, and with the open-source community.  Others will use Linux shell scripting and various analysis tools.

Project start
Monday, May 20, 2024 (tentative)
Contact Information:
Mentor
Geoff Kuenning
kuenning@hmc.edu
Principal Investigator
Name of project director or principal investigator
Geoff Kuenning
Email address of project director or principal investigator
geoff@cs.hmc.edu