Integrating Machine Learning Into Science
In Part I, we argued that machine learning and science can be a great match. However, bare-bones machine learning still has several limitations. For example, it is hard to incorporate domain knowledge, gain causal insight, and quantify the uncertainty of predictions.
Part II addresses these pieces of the puzzle. On the one hand, the chapters focus on formalizing and specifying the problems. For example, we introduce causal models to formulate causal questions and basic probability theory to talk about uncertainties. On the other hand, we discuss general solutions for addressing these questions. For example, we show how to use double machine learning to estimate causal effects or Rashomon sets to quantify uncertainties. While each solution is presented in isolation, we point out the interactions between the topics in each chapter.
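To give a flavor of the first example, the following is a minimal sketch of the partialling-out variant of double machine learning on simulated data. The confounders, the choice of random forests as nuisance learners, and the true effect of 2.0 are illustrative assumptions, not part of the chapters themselves; Chapter 11 develops the method properly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Simulated data: confounders X affect both treatment T and outcome Y;
# the true causal effect of T on Y is 2.0 (an assumption for illustration).
n = 2000
X = rng.normal(size=(n, 5))
T = X[:, 0] + rng.normal(size=n)
Y = 2.0 * T + X[:, 0] + X[:, 1] + rng.normal(size=n)

# Stage 1: predict outcome and treatment from the confounders with flexible
# learners, using cross-fitting to avoid overfitting bias.
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), X, Y, cv=5)
t_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), X, T, cv=5)

# Stage 2: regress the outcome residuals on the treatment residuals;
# the slope estimates the average causal effect of T on Y.
effect = LinearRegression().fit((T - t_hat).reshape(-1, 1), Y - y_hat)
print(f"Estimated causal effect: {effect.coef_[0]:.2f}")  # close to 2.0
```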
Part II consists of the following chapters:
- 8 Generalization: Connecting the theory of generalization to machine learning practice.
- 9 Domain Knowledge: Integrating domain knowledge using suitable inductive biases.
- 10 Interpretability: Interpreting machine learning models to gain scientific insights and justify predictions.
- 11 Causality: Integrating causal assumptions to draw causal inferences.
- 12 Robustness: Analyzing distribution shifts and robustifying machine learning models at all steps in the pipeline.
- 13 Uncertainty: Understanding the sources of error and quantifying uncertainties with frequentist and Bayesian approaches.
- 14 Reproducibility: Avoiding common pitfalls that hinder the reproducibility of machine learning results.
- 15 Reporting: Providing proper model reports with best practices like checklists, model cards, and data sheets.