Understanding the Earth’s atmosphere via machine learning

Understanding the composition of Earth’s atmosphere through machine learning: can machine learning give us insight into the mechanisms controlling air quality and climate?

Through concentrated field campaigns, long-term air quality monitoring programs, and satellites, atmospheric scientists are collecting vast amounts of data to help shape our understanding of the composition and chemistry of Earth’s atmosphere. This is especially important as we observe rapid changes to this composition related to human activity. Applications of modern machine learning (ML) techniques have recently become an area of major interest in the air quality and climate community, but there are valid criticisms about the use of some of these approaches for gaining process-level information. This is especially true for deep learning methods, which do not intuitively provide insight into the chemical or physical mechanisms behind trends identified in the data.

The current state of the art in air quality and Earth system modelling is highly mechanistic: large systems of coupled ordinary differential equations representing chemical reactions (air quality) are solved together with discretized partial differential equations representing physical dynamics (weather). These models provide a large amount of information about processes and mechanisms, and they are powerful tools that help scientists and policy makers understand the current and potential future composition of the atmosphere. Unfortunately, they are computationally costly to run, requiring supercomputers or large clusters and a significant amount of computation time. Computationally “cheap” ML models, if pre-trained, can be run on more conventional computers in a fraction of the time, making this area of research far more accessible.

Before the use of these techniques becomes more widespread, there is an important question the community needs to answer: within the large pool of data already collected, are there datasets robust enough to train an ML model that could then make meaningful predictions about future states of the atmosphere, given the large non-linearities that exist in both the physical and chemical dynamics of the system?

There are opportunities for students to work over the semester and/or summer on a number of tasks to help answer this question:

  1. Selecting and downloading potential training and validation data
  2. Training a variety of ML models (deep learning and more traditional)
  3. Performing model validation and identifying appropriate benchmarks for the field
  4. Identifying interesting test case scenarios (e.g., RCP4.5 or changing vehicle emission standards)
  5. Comparing ML model predictions to output from mechanistic chemical-transport (GEOS-Chem, CMAQ) and/or Earth system models (CESM)
  6. Identifying potential adversarial techniques and bias in datasets
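
For a sense of what "mechanistic" means in the description above, here is a minimal sketch of a coupled chemical ODE system at toy scale. It is not taken from GEOS-Chem, CMAQ, or CESM. It assumes a two-reaction NO/NO2/O3 cycle (NO2 photolysis and NO + O3 titration), illustrative rate constants, and a hand-written fourth-order Runge-Kutta integrator. All function names and numerical values are hypothetical choices made for this example.

```python
# Toy mechanistic model: two reactions coupling NO2, NO and O3.
#   NO2 + sunlight -> NO + O3   (rate j_no2 * [NO2])
#   NO  + O3       -> NO2       (rate k_no_o3 * [NO] * [O3])
# Concentrations are in ppb and time is in seconds. The rate constants
# are illustrative assumptions, not values from any production model.

def tendencies(state, j_no2, k_no_o3):
    """Return d[NO2]/dt, d[NO]/dt, d[O3]/dt for the current state."""
    no2, no, o3 = state
    net = j_no2 * no2 - k_no_o3 * no * o3  # net NO2 loss rate
    return (-net, net, net)

def rk4_step(state, dt, j_no2, k_no_o3):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = tendencies(state, j_no2, k_no_o3)
    k2 = tendencies(shifted(state, k1, dt / 2), j_no2, k_no_o3)
    k3 = tendencies(shifted(state, k2, dt / 2), j_no2, k_no_o3)
    k4 = tendencies(shifted(state, k3, dt), j_no2, k_no_o3)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def integrate(state, seconds, dt=1.0, j_no2=8e-3, k_no_o3=4.4e-4):
    """Integrate (NO2, NO, O3) forward in time with fixed steps."""
    for _ in range(int(seconds / dt)):
        state = rk4_step(state, dt, j_no2, k_no_o3)
    return state

if __name__ == "__main__":
    # Start from an arbitrary mixture and let it relax for ten minutes.
    no2, no, o3 = integrate((20.0, 5.0, 30.0), seconds=600)
    print(f"NO2={no2:.2f} ppb, NO={no:.2f} ppb, O3={o3:.2f} ppb")
```

Real chemical mechanisms involve hundreds of species and stiff equations that need implicit solvers, which is a large part of why the full models are so expensive. Output from a small system like this one could still serve as a first benchmark when comparing ML predictions against mechanistic results.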

Essay prompt (address the following in 1-2 paragraphs):

  1. What interests you in the project?
  2. What experience do you have with atmospheric chemistry/air quality, differential equations, and machine learning? (It’s okay if the answer is none!)
  3. What do you hope to get out of this research experience?
  4. Would you potentially be interested in continuing this project beyond the spring semester (e.g., as a summer research project, fall 2022, etc.)?

Why join this research project

This will be an opportunity to learn the numerical modelling techniques used in atmospheric chemistry while also gaining experience with machine learning. This is also potentially an opportunity to work with big data (depending on dataset selection and student interest) and develop your data handling skills. Students will also have the opportunity to learn about current air quality and climate policy decisions and how models help make them.

Name of research group, project, or lab
Interdisciplinary Computing in Atmospheric Chemistry
Logistics Information:
Project categories
Chemistry
Computer Science
Physics
Data Science
Machine Learning
Natural Resources and Conservation
Numerical Modeling
Student ranks applicable
Time commitment
Spring - Part Time
Academic Credit
Number of openings
Contact Information:
Mentor name
Sarah Kavassalis
Mentor email
Mentor position
Postdoctoral Scholar in Interdisciplinary Computation
Name of project director or principal investigator
Sarah Kavassalis
Email address of project director or principal investigator