Distinguishing Features of Scientific Papers Cited in Clean Air Policy

Air quality regulations in the United States are supposed to be based on the "best available science" articulated in the Clean Air Act, but how "best" is defined is unclear and at times controversial. What makes research more likely to influence policy? Is it just the subject matter? The novelty of the ideas? The language used in the paper? Its readability? Author prestige? How have these factors changed over time? Our lab has been examining 50 years of air quality literature to discover what makes some papers more likely than others to be cited by regulators. We will use distributional tools, network analysis, and natural language processing to determine which features distinguish papers cited in the formation of Clean Air Act policy from other papers with similar topics and bibliometric data.
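
To give a flavor of the text-feature comparisons described above, here is a minimal Python sketch. It is an illustration only and not the lab's actual pipeline: the function names (text_features, group_means), the two crude readability proxies, and the toy abstracts are all assumptions made up for this example.

```python
import re
from statistics import mean


def text_features(abstract: str) -> dict:
    """Compute two crude readability proxies for one abstract (illustrative only)."""
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", abstract) if s.strip()]
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", abstract)
    return {
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "chars_per_word": mean(len(w) for w in words) if words else 0.0,
    }


def group_means(abstracts: list) -> dict:
    """Average each feature over a group of abstracts."""
    feats = [text_features(a) for a in abstracts]
    if not feats:
        return {}
    return {key: mean(f[key] for f in feats) for key in feats[0]}


if __name__ == "__main__":
    # Hypothetical toy data, invented for this sketch (not real abstracts).
    cited = ["Ozone harms lungs. Exposure limits reduce risk."]
    not_cited = ["We characterize heterogeneous multiphase oxidation pathways."]
    print("cited by regulators:", group_means(cited))
    print("not cited:", group_means(not_cited))
```

A real analysis would compare many more features, such as topic, novelty, and citation-network position, across a matched set of papers, and would test whether any differences between the groups are statistically meaningful.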

Name of research group, project, or lab
Why join this research group or lab?

We hope that what we discover will help scientists write more impactful papers, help policymakers recognize criteria they may not have realized they were using, and help the public better understand which science is used to inform environmental regulations and how. This work has the potential to uncover unknown biases in the policy-making process, and uncovering them could lead to more robust air quality regulations in the future. Because public trust in policy can affect compliance with regulation, we will take the opportunity to share our results widely.

Logistics Information:
Project categories
Chemistry
Computer Science
Data Science
Earth Science
Environmental Science
Natural Language Processing
Student ranks applicable
Student qualifications

Programming independence (CS70 or experience with independent projects or internships)

Time commitment
Summer - Full Time
Paid Research
Number of openings
Techniques learned

Data science skills: text data processing, database querying, data visualization, etc.

Research skills: critically reading scientific literature, understanding bibliometrics, forming data-driven hypotheses

Contact Information:
Sarah Kavassalis
Postdoctoral Scholar in Interdisciplinary Computation
Xanda Schofield
Assistant Professor
Name of project director or principal investigator
Sarah Kavassalis
Email address of project director or principal investigator