The storage system in your computer is its slowest component, and also the one that must be the most reliable (unless you like losing your files!). In collaboration with Stony Brook University, we are working on ways to speed up storage, measure its performance, and validate its reliability. Each of these tasks is a separate sub-project under the same overarching supervision; we will be assigning students to specific projects when summer arrives.
Two students will work with Prof. Erez Zadok of Stony Brook and his graduate students; two others will work directly with Prof. Kuenning at HMC. In all cases we will be developing and enhancing software, running experiments, and measuring results. Students will work on some of the following projects:
- Re-Animator is a system that allows us to record the behavior of an application and later replay that record to reproduce the behavior. We have released it to the public as open-source software, but it still needs a number of enhancements. We are also using Re-Animator to investigate the behavior of storage systems; we have begun running experiments and analyzing the results to try to understand the phenomenon of "tail latency," which causes some requests to take 10 to 50 times as long as the average.
- Most large Web sites use several levels of software caches to improve performance; designing the various levels to do the best job at the lowest cost is a hard (probably NP-hard) problem. To find good designs, we study "Miss Ratio Curves" (MRCs), which can quickly characterize how the miss ratio of a single-level cache varies with its size. Generating a full set of MRCs for a multi-level cache is infeasible, so we have developed promising algorithms that find "knees" in the curve for a given level. These knees indicate the sizes at which it is worthwhile to run experiments on the next level down. At the moment we are trying to develop metrics that will characterize where our approach is succeeding and where it needs improvement. That work involves both mathematical analysis and running simulations. We are also running live experiments on real hardware, including state-of-the-art phase-change memory that is not yet in wide use.
- In a related project, we are developing visualization tools that will help system designers understand the effects of various parameters on the performance of multi-tier caching systems. Most storage systems have tens to hundreds of tunable parameters that interact in unpredictable ways. We are creating a novel visualization tool that will make it easy to explore the effects of these parameters and to identify different kinds of interactions among large groups of them, such as simple correlation, inverse correlation, and clustering.
- We are using the SPIN model checker to find bugs in file systems. This approach could lead to major improvements in the reliability of popular file systems, but getting SPIN to run at high speed has turned out to be unusually challenging. The reason is that SPIN regularly saves the state of the computation it is checking and later rolls back to that state, but the internal state of a file system is complex and is not directly available to SPIN. We are developing new approaches to solve this problem so that we can run checks efficiently. So far we have found two bugs in a toy file system, and we are still improving how our approach works with real-world file systems.
- Most Internet standards are defined by "Requests for Comments" (RFCs), which are precise specifications written in a rigorous style. Because both the writing style and the subject matter are constrained, RFCs are easier than most documents for a machine to parse and understand. We are developing an NLP system that will read the specification for version 4 of the Network File System standard (NFSv4) and use that specification to automatically validate a particular implementation of NFSv4. At the moment we are using machine-generated logic propositions to create assertion statements that can be inserted into the code; eventually we hope to integrate the propositional logic into a source-level analysis system so that a full implementation can be validated for accuracy. As a side effect, we have already uncovered some ambiguities and errors in the existing specification.
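To give a concrete feel for what a Miss Ratio Curve is, here is a minimal sketch of how an MRC can be computed for a single LRU cache from a trace of requests. It uses Mattson's classic stack-distance technique: one pass over the trace records how deep each requested item sits in an LRU stack, and a request at depth d hits in every LRU cache that holds at least d items. This is an illustration under simplifying assumptions (a pure LRU policy, unit-sized items). It is not the project's own tooling or knee-finding algorithm, and the function name `lru_miss_ratio_curve` is hypothetical.

```python
from collections import Counter


def lru_miss_ratio_curve(trace, max_size):
    """Hypothetical helper: return (cache_size, miss_ratio) pairs for LRU
    caches holding 1..max_size unit-sized items, via Mattson's stack distances."""
    if not trace:
        return []
    stack = []              # LRU stack; most recently used item at index 0
    depths = Counter()      # stack depth -> number of requests that hit at that depth
    for item in trace:
        if item in stack:
            depth = stack.index(item) + 1   # 1-based stack distance
            depths[depth] += 1
            stack.remove(item)
        # A first-time request (not in the stack) misses at every cache size,
        # so it is never counted as a hit below.
        stack.insert(0, item)

    total = len(trace)
    curve = []
    hits = 0
    for size in range(1, max_size + 1):
        hits += depths[size]    # a cache of this size also hits all shallower depths
        curve.append((size, (total - hits) / total))
    return curve


if __name__ == "__main__":
    # A cyclic trace over three items: LRU gets no benefit until all three fit.
    for size, ratio in lru_miss_ratio_curve(["a", "b", "c", "a", "b", "c"], 3):
        print(size, ratio)
```

For the cyclic trace above, the curve stays at a miss ratio of 1.0 for sizes 1 and 2 and then drops to 0.5 at size 3. That sharp drop is the sort of feature that makes some cache sizes far more cost-effective than their neighbors. The list-based stack makes this sketch quadratic in the trace length. Practical MRC tools use faster data structures or sampling.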
Professors Kuenning and Zadok have been collaborating for over a decade. Prof. Zadok's research group at Stony Brook is one of the most productive and respected storage research groups in the world, with dozens of publications in top venues. This project will give you a chance to work alongside graduate students on cutting-edge projects that will have real-world impact on the performance and reliability of storage systems.