Supporting Text Mining with Topic Models for All

The aim of this project is to develop workflows, visualizations, and interfaces that help text experts use topic models, a type of unsupervised machine learning model, to study texts they're interested in. This 1- to 1.5-credit independent study follows up on existing research that aims both to understand how experts outside computer science use topic models in their research and to support that research. The student will analyze interview transcripts from previous work and use their conclusions to continue development of an open-source web-based tool for topic model interaction. The goal for this semester is to complete work started in summer 2021, including web development for data visualization, analysis of machine learning models and transcripts generated in user studies, and work on a drafted academic paper and blog post describing this work. Our web application is built with TypeScript and React, so experience with JavaScript web development and familiarity with machine learning or natural language processing concepts are a plus.

Name of research group, project, or lab
WHISK Lab (Prof. Xanda)
Why join this research group or lab?

Since this project (tsLDA) continues an earlier software project called jsLDA, which has a significant user base, we're excited that this tool will eventually reach a wide variety of text-knowledgeable people, including scholars in the humanities and social sciences. Working on this project should also give you the opportunity to explore texts you're interested in as a way to test the tool, and to gain insight into the real-world challenges of applying machine learning to "messy" data.

You can see a demo version for our user study here: https://www.cs.hmc.edu/~xanda/jsLDA/?id=testviewer

Logistics Information:
Project categories
Computer Science
Machine Learning
Natural Language Processing
Student ranks applicable
Sophomore
Junior
Senior
Student qualifications

Required:
- Some course on probability
- At least CS 60 (or equivalent)
 

Optional but worth mentioning:
- Past experience with machine learning, natural language processing, and/or data science (including coursework, projects, internships, etc.)
- Past experience with JavaScript or TypeScript, especially React

Time commitment
Fall - Part Time
Compensation
Academic Credit
Number of openings
1
Techniques learned

Topics you can expect to learn more about include:
- Probabilistic machine learning models
- Text processing for data mining and analysis
- Web development
- Human-centered design processes
- Scholarly writing

Contact Information:
Mentor name
Xanda Schofield
Mentor email
aschofield@hmc.edu
Mentor position
Assistant Professor
Name of project director or principal investigator
Xanda Schofield
Email address of project director or principal investigator
xanda@cs.hmc.edu