5  Machine Learning and Other Scientific Goals: A Clash

Feedback wanted

This book is a work in progress. We are happy to receive your feedback via science-book@christophmolnar.com

Supervised machine learning allows you to predict well, and prediction is important in science. All scientists aboard the machine learning ship! Right? Machine learning has been around for years now, so why do many scientists still have a bad gut feeling about it? In its raw form, supervised machine learning clashes with three other essential goals of science that go beyond prediction: controlling, explaining, and reasoning about phenomena (Frigg and Hartmann 2020; De Regt 2020).

The Elder’s Council agreed that prediction is critical in science and any tool that helps pursue this goal should be supported. But some voices remained critical: goals other than prediction might get neglected. Raven Science was strongly centered around explaining why things happen, not merely predicting them. Is supervised learning leading to science we can’t understand and can’t control?

5.1 ML trades other scientific goals in favor of predictive performance

We believe that the main reason why some scientists are suspicious of ML is its exclusive focus on prediction. Barebones ML readily trades any other goal we may have in science, such as controlling, explaining, or reasoning about phenomena, for even a tiny gain in predictive performance. A supervised machine learning model will take any shortcut to improve predictive performance:

  • If given non-causal features that are associated with the outcome, the ML algorithm will likely learn a model that relies on them. For example, an ML model may rely on the number of winter jackets sold to predict flu cases.
  • If the training data are biased, the model will reflect these biases, since doing so helps in predicting the biased data.
  • If a complex model has better predictive performance than an interpretable model, even marginally, the model selection step will spit out the complex model as the winner (see the sketch below).
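To make the last point concrete, here is a minimal sketch with scikit-learn; the synthetic data and the two candidate models are our illustrative assumptions, not an example from this book. The point is that the selection loop compares candidates only by their cross-validated predictive score, so any other goal would have to be written into the criterion explicitly.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data, purely illustrative
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

candidates = {
    "interpretable: linear regression": LinearRegression(),
    "complex: gradient boosting": GradientBoostingRegressor(random_state=0),
}

# Selection is based on the cross-validated predictive score alone;
# interpretability, causality, etc. never enter the criterion.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
winner = max(scores, key=scores.get)
print(scores)
print("Selected:", winner)  # whichever scores best wins, by however small a margin
```

If the complex model scores even marginally higher, it is returned as the winner; nothing in this loop rewards interpretability.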

Framing a question exclusively in terms of prediction drowns out other goals. But depending on the context, these other goals may be more important than prediction.

5.2 In science we want to control phenomena

Imagine you study education science. Your data analysis shows a strong association between the academic success of kids and the education level of their parents. In Germany, for example, only 8 of 100 students from working-class families receive a master’s degree, whereas 45 of 100 students with academic parents do so (Maaz et al. 2022). Indeed, this association allows us to make predictions. Tell me where you come from and I’ll tell you where you’ll end up. You are from a working-class family? The prediction model says: “No master’s degree for you!”

Focusing on prediction is deeply unsatisfying in this context. Ethically, we as a society do not want this disparity between kids from working-class and kids from academic families – we believe in equality of opportunity. Instead of predicting academic success, we want to find policy interventions to close the gap. How can we encourage more educational climbers? What are the factors that hinder working-class children from pursuing a master’s degree, and how can they be changed? Would an increase in student stipends lead to more university students from working-class families?

All these questions are concerned with control. Control is the ability to perform informed actions on a system and to estimate the resulting dynamics of the system. For control, we must:

  • know the causal factors, know how to act upon them, and also be able to do so
  • evaluate the uncertainties in the consequences of actions
  • analyze time dynamics and potential long-term equilibria

Let’s say the German government decides to increase the financial support for students, known under the beautiful German term Bundesausbildungsförderungsgesetz, short BAföG. They must be sure that a lack of money is one causal factor that prevents students from studying (this is indeed true, see Maaz et al. (2022)) and must have the money to increase the support. The government should also be aware of the effects the increase has: Is it eaten up by inflation? Can it get more working-class kids to academic degrees? How long will it take for the increase in financial support to be enacted, and when and how will it impact students’ choices?

Scientific models have always been one means to enable control over a system. They represent the system, allow us to run simulations, and thereby help us think through the outcomes of actions. However, standard ML models are not suitable here. ML models are unable to distinguish causes from effects. ML models neither represent potential actions nor their consequences. ML models only give point predictions without quantifying uncertainty. ML models often ignore the role of time.

So no control with ML? We discuss how at least some of these problems can be alleviated in Chapter 11 and Chapter 12.

5.3 In science we want to explain phenomena

Imagine you study economics. You want to understand how financial crises arise. Your approach is to study historical crises to gain insights into potential future ones. The first thing that comes to mind is to check the indicators of past crises. Clear signs included plunges in the stock market, increased unemployment rates, and lower gross domestic product (GDP) across countries. All these signs are indicators associated with a crisis. You may even detect them by simply fitting an ML model to a dataset containing a broad range of general economic indicators. But is this really what you are after?

Scientists are ultimately interested in why financial crises arose rather than merely that they did. Would there have been the Great Depression in the 1930s without the Wall Street Crash of 1929? How did the Yom Kippur War impact the 1973 oil crisis? What was the role of mortgage-backed securities in the financial crisis of 2007-2008?

In all these cases, we are interested in the causes that led to certain outcomes. We want to explain rather than just predict. Explanations point to factors that, had they been different, would have impacted the explained phenomenon. Think of the oil crisis in 1973. To establish that the Yom Kippur War was a cause, we must establish two conditions: 1) factual condition: there was the Yom Kippur War and the oil crisis in 1973; 2) counterfactual condition: if there had not been the Yom Kippur War, there would have been no oil crisis. While the first condition can easily be checked empirically, the second condition is tough.1 We do not know how history would have turned out without the Yom Kippur War. The Organization of Arab Petroleum Exporting Countries (OAPEC) might not have imposed an embargo, or might have imposed it later to influence another US policy. Still, scientists can often provide arguments to establish counterfactuals, for example, by pointing to other actions of OAPEC, game-theoretic reasoning, or interviews with OAPEC leaders.
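As a compact, purely illustrative formalization (our addition, written in the potential-outcome notation that underlies Holland (1986)): let $A = 1$ indicate that the Yom Kippur War occurred, and let $Y^{a}$ denote whether an oil crisis would have occurred had $A$ been set to $a$. The war qualifies as a cause of the crisis if

$$\underbrace{A = 1 \;\text{and}\; Y^{1} = 1}_{\text{factual condition}} \qquad\text{and}\qquad \underbrace{Y^{0} = 0}_{\text{counterfactual condition}}.$$

Only $Y^{1}$ is ever observed once the war has happened; $Y^{0}$ must be argued for rather than measured.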

What establishes a scientific explanation?

Philosophers of science broadly agree on the key ingredients of an explanation. We need:

  • explanandum (pl. explananda): a statement about the phenomenon to be explained. In the oil-crisis example, the explanandum could be the statement ‘There was an oil crisis in 1973.’
  • explanans (pl. explanantia): a statement that, if true, would explain the explanandum. In the oil-crisis example, the explanans could be the statement ‘The US supported Israel in the Yom Kippur War.’
  • explanatory link: rules or laws describing the mechanism underlying the phenomenon, by which the explanans is connected to the explanandum. In the oil-crisis example, the explanatory link could be ‘OAPEC, which sided with the Arab states in the Yom Kippur War, wanted to change US policy concerning Israel. OAPEC leaders, particularly King Faisal, believed that oil could be used as a weapon to force the US to change its policy.’

Scientific models are one approach to obtaining scientific explanations. Scientific models represent the explanandum, the factors referenced in the explanans, and, most importantly, the explanatory link (e.g. the causal relationships) to establish a scientific explanation. Unfortunately, standard ML models do not serve this function of scientific models. ML models do not distinguish between causal factors and mere associations. ML models do not describe the real-world mechanism that led to the outcome but only give predictions. ML models work well in static environments but are unreliable if the environmental conditions are altered.

So no explanation with ML? We discuss how at least some of these problems can be alleviated in Chapter 11, Chapter 9, and Chapter 13.

5.4 In science we want to reason about phenomena

Imagine you are a biologist and want to study how the amino-acid sequence influences the protein structure. You know for a fact that a protein’s structure is fully determined by its amino-acid sequence. Some researchers developed this amazing deep neural network that is very good at predicting the protein structure. Great, problem solved?!

Well, yes and no. We solved one critical aspect of the problem: we can describe the protein structure, and this is what ultimately defines the function of the protein. But it seems our solution is not intelligible to us humans – and ultimately, we are the entities who conduct science (or ravens ;) ). Why do certain amino-acid sequences lead to specific predicted protein structures? What are the most important subsequences when it comes to the prediction? How does the prediction model align with our theoretical biological knowledge about amino-acid sequences and proteins?

Being able to reason about phenomena is important for two reasons:

  • Communication: We must be able to teach and share scientific knowledge in schools, universities, and research papers. This allows us to jointly reason about phenomena and collaborate. Moreover, if scientific knowledge informs public decisions, it must be cognitively accessible to a broad public.
  • Intuition pump: Intuition drives the questions we ask and the experiments we conduct. It allows us to draw links to other phenomena or fields we know, and to traditional scientific models or knowledge we have. We cannot intuit about phenomena to which we have no cognitive access.

One approach to reasoning about phenomena is via scientific models. Scientific models provide us with a language for phenomena. Through variables, functions, and parameters, models equip us with a vocabulary (tokens) to talk about the components, dependencies, and properties of phenomena, respectively (Freiesleben et al. 2022). Simplicity and beauty are important in that respect: Simplicity allows us to communicate effectively and efficiently, and to emulate phenomenon behavior without running exact simulations; beauty sparks our curiosity and intuitions, allowing us to learn the scientific language quickly and to draw new links.2

Standard ML models, like the protein structure prediction model from the example above, are unfortunately terrible reasoning tools. Only very few model elements can be interpreted, such as the input (amino-acid sequence) and the output variables (protein structure). Most other model elements do not allow a simple interpretation; think, for example, of the weights or activation functions in the neural net. Similarly, it is hard to assess how well ML models align with our background knowledge, and also hard to infuse such knowledge into them. Instead of simplicity and beauty, ML models embrace high complexity and are trained solely under the objective of predictive performance.

So no reasoning with ML models? We discuss how at least some of these problems can be alleviated in Chapter 10, Chapter 9, and Chapter 15.


  1. The fact that it is impossible to get data to establish the counterfactual condition is called the fundamental problem of causal inference (Holland 1986).

  2. Interestingly, simplicity and beauty seem to be favored not only by humans but also, at least partially, by nature, as evidenced by the overwhelming success of Ockham’s razor in many disciplines when it comes to extrapolation-type predictions.