Program Analysis

I have four potential subprojects in the area of program analysis and one in proof analysis (though programs and proofs are extremely similar and are amenable to similar analysis techniques). Not all subprojects will go forward this summer. Please indicate which subprojects interest you (or feel free to give them a rank order), and we will go forward with projects that have the most interest!

Typically, program analysis deals with answering yes or no questions: does this program have certain behavior? Do two program components have a data dependency? Our research transforms these yes / no questions into quantitative questions: how many ways is a program behavior triggered? How much information is transferred between two program components? Answering these questions requires combining techniques from the study of programming languages with tools from discrete math, combinatorics, logic, or information theory. Some of this year's subprojects will also connect these issues to the human experience of interacting with programs and proofs. For more information, read on!  

Code Switching: Semantic Density and Programming Language Learnability

Programming languages vary in their verbosity. Some languages require writing a lot of code (think Java) and some languages require writing very little code (APL and J are extreme examples of terse languages). The amount of meaning that is encoded into the symbols of a program language determines the semantic density of that language. In this project we will borrow techniques from natural language processing and information theory to quantify the semantic density of a programming language’s syntax. We will design an experiment to compare semantic density with human performance on programming tasks in a custom domain specific language. 

Those interested primarily in human-centered programming language design with additional interests in natural language processing or information theory are encouraged to apply.

1 or 2 students will be supported for this project depending on interest.

Looping in the Human Mind: Time, Path, and Cognitive Complexity 

How complex is a piece of code? 

To answer that question, we first need to decide what the word “complexity” means! One option is time complexity, or a measure of the number of steps executed by the algorithm that the code implements. Another measure is called path complexity, which gives the number of different execution paths through a given piece of code. Yet a third measure is the cognitive complexity: the amount of mental effort required to understand some source code. In this project, we will examine how time complexity, path complexity, and cognitive complexity of code are related to one another. 

If you are interested in any or all of human learning, programming, algorithms, and combinatorics please apply! 

1 or 2 students will be supported for this project depending on interest.

Quantifying the Difficulty of Automatic Testing

Automatic testing techniques (like symbolic execution) are able to generate tests for programs. However, some programs are harder to automatically test than others. Perhaps one program has more complicated corner cases than another or has interesting behavior buried in deep loops. In this project, we will explore how program analysis and combinatorics can be combined and used to predict the difficulty of running an automatic test generator (most likely, KLEE) on a given program. 

Interests in program analysis tools and combinatorics is a win for this project. 

Looking for exactly 2 students to work on this.

Linear-Algebraic Parallel Symbolic Execution

Did you know that executing a program and performing matrix exponentiation are the same thing? There is a cool homomorphism!

Symbolic execution is a common program analysis technique used to automatically find bugs, generate tests, synthesize programs, and in many, many other applications. It is a very powerful tool. However, symbolic execution is notorious for being computationally expensive. Luckily, symbolic execution can be parallelized in a straightforward way to help speed things up. In this project, we will explore a novel way of parallelizing symbolic execution using the matrix-program homomorphism and parallel matrix multiplication algorithms. Theoretically, this should be faster than sequential symbolic execution and straightforward parallel symbolic execution: we’ll have implement it and do the experiments and find out for real!

If you are interested in learning about program analysis, abstract algebra, matrix algebra, or parallel programming, this project is for you!

Up to 3 students can work on this project. 

 

Programming Language Design for Mathematicians 


In this project, we will design and implement a programming language and tool for group theorists. The tool would be an interactive proof-checker for group theory. A person writes a proof in the language we design, and the tool checks whether each step of the proof is valid. Our goal is to build something that is easy to use for undergraduate students who are learning group theory.

Up to 2 students could work on this. 

Name of research group, project, or lab
ALPAQA Lab (Algorithms and Logic for Program And Quantitative Analyses)
Why join this research group or lab?

These projects all tackle the problem of knowing what programs will do. A challenging problem (uncomputable, in fact!). On top of that, we also want to know HOW MANY ways a program can do its thing.

Counting the ways that a program can do something requires analyzing the source code (this is called static analysis), analyzing the running program (this is called dynamic analysis) and performing combinatorial computations (via logic and a field called “model counting”) about the results. Doing so allows us to answer questions like “how safe is my code?”, “how many test inputs do I need in order to discover some interesting program behavior?”, or “how complicated is my code?”

Representative publication
Logistics Information:
Project categories
Computer Science
Mathematics
Student ranks applicable
First-year
Sophomore
Junior
Senior
Student qualifications

A successful applicant should have interest or background in any of

  • programming languages
  • discrete math
  • combinatorics
  • information theory
  • logic 
  • human-computer interaction
Time commitment
Summer - Full Time
Compensation
Paid Research
Number of openings
6
Techniques learned

Depending on the specific subproject, you will be exposed to 

  • human subject experiment design 
  • program analysis
  • combinatorics
  • abstract algebra
  • matrix algebra
  • parallel programming 
  • automated theorem proving
  • graph theory
  • information theory 
  • symbolic execution 
Contact Information:
Mentor name
Lucas Bang
Mentor email
lbang@g.hmc.edu
Mentor position
Computer Science Professor
Name of project director or principal investigator
Lucas Bang
Email address of project director or principal investigator
bang@cs.hmc.edu
6 sp. | 54 appl.
Hours per week
Summer - Full Time
Project categories
Mathematics (+1)
Computer ScienceMathematics