Storage Measurement, Performance, and Validation

The storage system in your computer is its slowest component, and also the one that must be the most reliable (unless you like losing your files!). In collaboration with Stony Brook University, we are working on ways to speed up storage, measure its performance, and validate its reliability. Each of these tasks is a separate sub-project under the same overarching supervision; we will be assigning students to specific projects when summer arrives.

Two students will work with Prof. Erez Zadok of Stony Brook and his graduate students; two others will work directly with Prof. Kuenning at HMC. In all cases we will be developing and enhancing software, running experiments, and measuring results. Some sample projects include:

  • Re-Animator is a system that allows us to record the behavior of an application and later replay that record to reproduce the behavior. The system exists, but needs to be enhanced to better handle parallel applications. In addition, we will be using Re-Animator to investigate the behavior of storage systems. Finally, Re-Animator is being released to the public as open-source software, so we need to ensure that it is consistently bulletproof and easy to use.
  • We are in the middle stages of a project to improve the design of multi-tier caching systems. Most large Web sites use several levels of software caches to improve performance; designing the various levels to do the best job at the lowest cost is a hard (probably NP-hard) problem. We have developed promising algorithms that quickly generate performance curves and then find "knees" in those curves, which indicate places where performance can be dramatically improved for a relatively small extra cost. Studying these knees in more detail can then lead to an optimal solution. We are continuing to work on these algorithms and are running experiments to measure their effectiveness.
  • In a new project, we are using the SPIN model checker to find bugs in file systems. This approach promises to have a major impact on the reliability of popular file systems, but it turns out to be unusually challenging to get SPIN to run at high speed. The reason is that SPIN regularly saves the state of the computation it is checking and later rolls back to that state, but the internal state of a file system is complex and is not directly available to SPIN. We are developing new approaches to solve this problem so that we can run checks efficiently. So far we have found two bugs in a toy file system but are still improving how our approach works with real-life ones.
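To give a flavor of the "knee" idea from the caching project, here is a minimal sketch in Python. It is not the project's algorithm. It uses a common textbook heuristic (pick the point farthest from the straight line joining the curve's endpoints), and the function name find_knee, the cache sizes, and the miss ratios are all made up for illustration.

```python
import math


def find_knee(sizes, miss_ratios):
    """Return the index of the point farthest from the chord joining
    the first and last points of the curve (a simple knee heuristic).

    sizes       -- cache sizes (the "cost" axis), in increasing order
    miss_ratios -- measured miss ratio at each size (hypothetical data)
    """
    if len(sizes) != len(miss_ratios) or len(sizes) < 3:
        raise ValueError("need at least three (size, miss ratio) points")

    x_lo, x_hi = sizes[0], sizes[-1]
    y_lo, y_hi = min(miss_ratios), max(miss_ratios)
    if x_hi == x_lo or y_hi == y_lo:
        raise ValueError("curve is flat or has zero width; no knee")

    # Normalize both axes to [0, 1] so neither unit dominates the distance.
    xs = [(s - x_lo) / (x_hi - x_lo) for s in sizes]
    ys = [(m - y_lo) / (y_hi - y_lo) for m in miss_ratios]

    # Direction of the chord from the first point to the last point.
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    chord_len = math.hypot(dx, dy)

    best_index, best_dist = 0, -1.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Perpendicular distance from (x, y) to the chord.
        dist = abs(dy * (x - xs[0]) - dx * (y - ys[0])) / chord_len
        if dist > best_dist:
            best_index, best_dist = i, dist
    return best_index


# Hypothetical measurements: miss ratio drops sharply at size 4, then flattens.
sizes = [1, 2, 4, 8, 16, 32]
miss_ratios = [0.90, 0.80, 0.30, 0.25, 0.22, 0.20]
knee = find_knee(sizes, miss_ratios)
print(f"knee at cache size {sizes[knee]}")
```

In this made-up data set the heuristic picks size 4: growing the cache beyond that point buys very little additional hit rate. A real multi-tier design has to consider several such curves at once, which is part of what makes the problem hard.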
Name of research group, project, or lab
File Systems Laboratory (Stony Brook); Prof. Kuenning's Lab
Why join this research group or lab?

Professors Kuenning and Zadok have been collaborating for over a decade.  Prof. Zadok's research group at Stony Brook is one of the most productive and respected storage research groups in the world, with dozens of publications in top venues.  This project gives you a chance to work alongside graduate students on cutting-edge projects that will have real-world impact on the performance and reliability of storage systems.

Logistics Information:
Project categories
Computer Science
Student ranks applicable
Sophomore
Junior
Senior
Student qualifications

CS 70 is required; CS 105 is a plus.

Time commitment
Summer - Full Time
Compensation
Paid Research
Number of openings
4
Techniques learned

Students will learn how file systems work, how they interact with operating systems and with storage devices, and how to measure and evaluate computer systems.  Some projects will also involve working with git and GitHub, and with the open-source community.

Contact Information:
Mentor name
Geoff Kuenning
Mentor email
geoff@cs.hmc.edu
Mentor position
Principal Investigator
Name of project director or principal investigator
Geoff Kuenning
Email address of project director or principal investigator
geoff@cs.hmc.edu