Probabilistic topic models are widely used outside of computer science to find patterns of meaning in large text collections. However, like many other natural language processing tools, their effectiveness often depends on choices about data preparation and model configuration. Making these decisions often requires iterating through many options, which is particularly hard for those with limited machine learning or natural language processing background: from interviews our team conducted last summer, we found that it could take researchers many months to successfully reach the point of training a usable model. Our research question is this: how can we design a tool that supports iterative refinement of topic models for a user base with limited programming experience but deep textual questions?
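To make the point concrete, here is a minimal sketch (pure Python, with hypothetical helper and variable names) of one such data-preparation loop: the same corpus yields very different modeling vocabularies depending on whether stopwords are removed and what minimum document-frequency threshold is chosen, and each choice changes what the topic model can see.

```python
# Sketch: how two common preprocessing choices (stopword removal and a
# minimum document-frequency cutoff) reshape the vocabulary a topic
# model is trained on. All names here are illustrative, not from jsLDA.
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "to"}

def build_vocabulary(docs, remove_stopwords=True, min_df=1):
    """Return the modeling vocabulary for `docs` under the given choices."""
    tokenized = [[w.lower().strip(".,") for w in d.split()] for d in docs]
    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    vocab = set()
    for word, count in df.items():
        if remove_stopwords and word in STOPWORDS:
            continue
        if count < min_df:
            continue
        vocab.add(word)
    return vocab

docs = [
    "The topic model finds patterns in the text.",
    "A model of text and meaning.",
    "Patterns of meaning in large text collections.",
]

loose = build_vocabulary(docs, remove_stopwords=False, min_df=1)
strict = build_vocabulary(docs, remove_stopwords=True, min_df=2)
print(len(loose), len(strict))  # the stricter settings shrink the vocabulary
```

A researcher typically re-runs this kind of loop many times, inspecting the resulting topics after each change; the interface we are building aims to make exactly that cycle fast and legible for non-programmers.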
This summer, we will build on work from last summer to develop jsLDA 2.0, a revision of a small web-based topic modeling interface that streamlines common “loops” and workflows described by these users. We will then perform user studies with both novices and experts to further develop this tool, alongside an accompanying tutorial suitable for digital humanities and computational social science classrooms. Students working on this project will practice skills of data processing for text, visualization, web development, and user study design.
For more information about our project from last summer, check out our short video here.
This project is extra-exciting in that it's very "meta": not only do we talk about machine learning, user interfaces, and social science, we also talk about how we talk about those things to different audiences. Working on this project will also give you the opportunity to meet people doing research that combines computing and culture and to practice a type of computational design that centers human inquiry. In the process, you'll get the chance to use the tool you build to study texts you're excited about, too!