Natural Language Proof Checking

Errors in software become more expensive and dangerous each year. Testing is helpful, but proofs are the only way to truly guarantee that an algorithm is correct or that a cryptographic protocol is secure.

Unfortunately, it’s hard to get all the details of a big proof right; even careful reasoners make mistakes, and even careful readers miss them. Automation seems a natural solution, and some parts of CS make increasing use of “proof assistant” systems that help people construct and verify proofs, using special-purpose scripting languages. However, these produce proofs that are certainly correct, but that are almost unreadable if you’re not a computer!

This can be inconvenient. For example, the open-source textbook Homotopy Type Theory had all its main theorems computer-checked before the book was published in 2015, so one would expect it to be reliable. Nevertheless, seven years later people still are reporting typos or errors in the proofs. The theorems are true, but the proofs in the book (written to be clear to humans) are totally unrelated to the unreadable formalized proofs that were verified by the computer.

I am interested in systems that can verify proofs that are written for humans to read and understand, the kind of proofs found in textbooks and research papers. Essentially, I'd like to go beyond spell-checking and grammar-checking to logic-checking.

Depending on the student's interests and backgrounds, the summer work could take one of several routes, including:

  1. I have collected the LaTeX source for 1.7M papers submitted to, and written scripts to extract 2.7 million proofs from these papers. There are lots of existing Natural Language data sets with examples of things like tweets or newspaper articles, but few (or no) data sets for the language of mathematics and proofs.  I've done a fair amount of cleanup in the extracted proofs, but with a bit more polishing this could be a nice data set to make public.
  2. One of the things that people like to do with big data sets is to train language models and machine learning algorithms. I have briefly toyed with this (e.g., here) but there is certainly more that could be done if we started labeling the arXiv data, e.g., building classifiers that distinguish sentences that introduce assumptions from sentences that draw conclusions from previous statements, or classifiers that identify specific proof strategies like proof-by-cases or proof-by-contradiction.
  3. A completely different approach would be to attack the proof-checking problem directly. Could we build a small prototype that parses proofs a predefined fragment of English and can verify the results (e.g., with an off-the-shelf theorem prover like Z3)?  While there are some systems like Mizar and Naproche that attempt to do this, they handle only a very limited syntax; we could target a more flexible input language, consulting the arXiv data set to see what words and phrases are most important to support.
Name of research group, project, or lab
Prof. Stone
Logistics Information:
Project categories
Computer Science
Natural Language Processing
Student ranks applicable
Student qualifications

Candidates must be comfortable writing nontrivial amounts of code (e.g., in Python). Prior experience with proofs (e.g., CS 81 or Math 131), NLP, and/or ML would be helpful, but is not required.

Time commitment
Summer - Full Time
Paid Research
Number of openings
Contact Information:
Mentor name
Christopher Stone
Mentor email
Mentor position
Name of project director or principal investigator
Chris Stone
Email address of project director or principal investigator
1 sp. | 20 appl.
Hours per week
Summer - Full Time
Project categories
Mathematics (+2)
Computer ScienceMathematicsNatural Language Processing