Integrating Machine Learning Into Science
In Part I, we argued that machine learning and science can be a great match. However, bare-bones machine learning still has several limitations. For example, it is hard to incorporate domain knowledge, gain causal insight, and quantify the uncertainty of predictions.
Part II addresses these pieces of the puzzle. On the one hand, the chapters focus on formalizing and specifying the problems. For example, we introduce causal models to formulate causal questions and basic probability theory to talk about uncertainties. On the other hand, we discuss general solutions for addressing these questions. For example, we show how to use double machine learning to estimate causal effects or Rashomon sets to quantify uncertainties. While each solution is presented in isolation, we point out the interactions between the topics in each chapter.
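To give a flavor of the first example, the following is a minimal sketch of the partialling-out variant of double machine learning on simulated data. The confounders, the choice of random forests as nuisance learners, and the true effect of 2.0 are illustrative assumptions, not part of the chapters themselves; Chapter 11 develops the method properly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Simulated data: confounders X affect both treatment T and outcome Y;
# the true causal effect of T on Y is 2.0 (an assumption for illustration).
n = 2000
X = rng.normal(size=(n, 5))
T = X[:, 0] + rng.normal(size=n)
Y = 2.0 * T + X[:, 0] + X[:, 1] + rng.normal(size=n)

# Stage 1: predict outcome and treatment from the confounders with flexible
# learners, using cross-fitting to avoid overfitting bias.
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), X, Y, cv=5)
t_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), X, T, cv=5)

# Stage 2: regress the outcome residuals on the treatment residuals;
# the slope estimates the average causal effect of T on Y.
effect = LinearRegression().fit((T - t_hat).reshape(-1, 1), Y - y_hat)
print(f"Estimated causal effect: {effect.coef_[0]:.2f}")  # close to 2.0
```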
Part II consists of the following chapters:
- 8 Generalization: Connecting the theory of generalization to machine learning practice.
- 9 Domain Knowledge: Integrating domain knowledge using suitable inductive biases.
- 10 Interpretability: Interpreting machine learning models to gain scientific insights and justify predictions.
- 11 Causality: Integrating causal assumptions to draw causal inferences.
- 12 Robustness: Analyzing distribution shifts and robustifying machine learning models at all steps in the pipeline.
- 13 Uncertainty: Understanding the sources of error and quantifying uncertainties with frequentist and Bayesian approaches.
- 14 Reproducibility: Avoiding common pitfalls that hinder the reproducibility of machine learning results.
- 15 Reporting: Providing proper model reports with best practices like checklists, model cards, and data sheets.