EconText: Text Mining for Economic Understanding

Economic activity produces a large volume of unstructured and semi-structured text, ranging from formal financial reports to news stories and social media posts. The EconText lab is interested in projects that explore how bodies of text can help us understand macroeconomic conditions and how those conditions are changing, in ways that could inform future policy. We combine classic macroeconomic analysis with natural language processing tools to connect qualitative themes in discussion with quantitative phenomena. Our group is led by three professors: Prof. Manisha Goel and Prof. Michelle Zemel from Pomona's Economics department, and Prof. Xanda Schofield from HMC's CS department.

We are interested in finding students who will be invested in working with a team of other students across Pomona and HMC to review related work, collect and process text, and build data analyses and visualizations in service of answering economics-driven research questions.

Some projects in the works are:

  • Understanding disruption caused by the 2016 Indian demonetization: In November 2016, India's government unexpectedly declared 86% of currency bills in circulation no longer legal tender. A cash-dependent economy suddenly didn't have cash! How did companies in India navigate this disruption? We use firm financial data, annual reports, and transcripts from earnings calls to examine these questions.
  • How data is changing the labor market: There's a visible push towards automating processes ranging from managing complex factories to checking out at a grocery store. However, news stories about the advances of a few high-tech companies don't tell the whole story. We study job postings from the past decade to understand how the set of skills in demand across a variety of industries is changing with automation.
  • Corporate words, corporate actions: In recent years, with the growth of social media platforms and increased political upheaval, companies have begun using social media to issue official statements about their stances on policy. Using tweets from publicly traded companies and their executives, we plan to contrast the official words of these companies with the actions they report, in terms of both external and internal investment, on issues surrounding the Black Lives Matter movement in the wake of the George Floyd protests.
Name of research group, project, or lab
EconText Lab
Why join this research group or lab?

Why join us?

  • Important questions. Economics affects our daily lives and the systems and structures around us. Quantitative stories from tabulated economic data give insight into some pieces of this, but we believe that adding text analysis methods broadens our data horizons and expands the kinds of questions we can explore.
  • Real-world skills. Building hypotheses, processing data, and making visualizations are all skills that are increasingly sought after in the workplace. (We know: we've looked at the job ads.) Students in our lab get a chance to build transferable skills in computing and data science that have helped them in their later job searches and careers.
  • Collaborative opportunities. Our team ranges across majors, class years, and campuses. Weekly half-hour meetings with your teammates will give you a chance to share ideas and collaborate on code in order to build your skills.
Logistics Information:
Project categories
Computer Science
Machine Learning
Natural Language Processing
Student ranks applicable
Sophomore
Junior
Senior
Student qualifications

In general, we are seeking students who would be interested in spending two or more semesters working on a project. We find it can take the better part of a semester to get used to our workflows and research processes, so this will give you enough time to build your skills and to contribute your vision and ideas to the project.

Students must have the following to apply:

  • At least some introduction to programming that can extend to data (e.g. CS 5, CS 36, CS 42, or CS 51) 
  • Some familiarity with Python as a programming language
  • Willingness to read papers
  • Availability for a weekly half-hour meeting with the team

If you have the following additional experience, please mention it in your application:

  • Familiarity with the UNIX command line and how to remotely access a server with ssh
  • Coursework in economics, statistics, or machine learning
  • Experience with data scraping or the use of APIs to collect data
  • Data visualization capabilities in Stata, R, Python, or other plotting environments
Time commitment
Fall - Part Time
Compensation
Academic Credit
Paid Research
Number of openings
2
Techniques learned
  • Operating a remote server from the command line to run computations
  • Basics of some NLP models and applications used in our research
  • Python programming for data, including processing, organizing, and plotting data
  • Broader macroeconomic and econometric ideas that underlie our work
Contact Information:
Mentor
Xanda Schofield
aschofield@hmc.edu
Assistant Professor
Name of project director or principal investigator
Xanda Schofield
Email address of project director or principal investigator
aschofield@hmc.edu