# References

[1] K. Aas, M. Jullum, and A. Løland, “Explaining individual predictions when features are dependent: More accurate approximations to Shapley values,” *Artificial Intelligence*, vol. 298, p. 103502, Sep. 2021, doi: 10.1016/j.artint.2021.103502.

[2] A. Adadi and M. Berrada, “Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI),” *IEEE Access*, vol. 6, pp. 52138–52160, 2018, doi: 10.1109/ACCESS.2018.2870052.

[3] J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim, “Sanity checks for saliency maps,” *Advances in Neural Information Processing Systems*, vol. 31, 2018.

[4] C. Anderson, “The end of theory: The data deluge makes the scientific method obsolete,” *Wired Magazine*, vol. 16, no. 7, pp. 16–07, 2008.

[5] M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz, “Invariant risk minimization,” *arXiv preprint arXiv:1907.02893*, 2019.

[6] L. Bottou, “Large-Scale Machine Learning with Stochastic Gradient Descent,” in *Proceedings of COMPSTAT’2010*, Y. Lechevallier and G. Saporta, Eds., Heidelberg: Physica-Verlag HD, 2010, pp. 177–186, doi: 10.1007/978-3-7908-2604-3_16.

[7] S. Alemohammad *et al.*, “Self-consuming generative models go MAD,” *arXiv preprint arXiv:2307.01850*, 2023.

[8] Y. Alimohamadi, M. Sepandi, M. Taghdir, and H. Hosamirudsari, “Determine the most common clinical symptoms in COVID-19 patients: A systematic review and meta-analysis,” *Journal of Preventive Medicine and Hygiene*, vol. 61, no. 3, p. E304, 2020, doi: 10.15167/2421-4248/jpmh2020.61.3.1530.

[9] “AlphaFold DB website.” 2023. Available: https://alphafold.ebi.ac.uk/

[10] A. Antoniou, A. Storkey, and H. Edwards, “Data augmentation generative adversarial networks,” *arXiv preprint arXiv:1711.04340*, 2017.

[11] S. Athey, J. Tibshirani, and S. Wager, “Generalized random forests,” *The Annals of Statistics*, 2019, doi: 10.1214/18-aos1709.

[12] D. W. Apley and J. Zhu, “Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models,” *Journal of the Royal Statistical Society Series B: Statistical Methodology*, vol. 82, no. 4, pp. 1059–1086, Sep. 2020, doi: 10.1111/rssb.12377.

[13] B. Arnold *et al.*, “The Turing Way: A handbook for reproducible data science,” *Zenodo*, 2019.

[14] F. Ayoub, T. Sato, and A. Sakuraba, “Football and COVID-19 risk: Correlation is not causation,” *Clinical Microbiology and Infection*, vol. 27, no. 2, pp. 291–292, 2021, doi: 10.1016/j.cmi.2020.08.034.

[15] E. Barnard and L. Wessels, “Extrapolation and interpolation in neural network classifiers,” *IEEE Control Systems Magazine*, vol. 12, no. 5, pp. 50–53, 1992, doi: 10.1109/37.158898.

[16] P. L. Bartlett, P. M. Long, G. Lugosi, and A. Tsigler, “Benign Overfitting in Linear Regression,” *Proceedings of the National Academy of Sciences*, vol. 117, no. 48, pp. 30063–30070, Dec. 2020, doi: 10.1073/pnas.1907378117.

[17] R. Balestriero, J. Pesenti, and Y. LeCun, “Learning in high dimension always amounts to extrapolation,” *arXiv preprint arXiv:2110.09485*, 2021.

[18] B. Basso and L. Liu, “Seasonal crop yield forecast: Methods, applications, and accuracies,” in *Advances in Agronomy*, vol. 154, Elsevier, 2019, pp. 201–255, doi: 10.1016/bs.agron.2018.11.002.

[19] S. Bates, T. Hastie, and R. Tibshirani, “Cross-validation: What does it estimate and how well does it do it?” *Journal of the American Statistical Association*, pp. 1–12, 2023, doi: 10.1080/01621459.2023.2197686.

[20] S. Beckers and J. Vennekens, “A principled approach to defining actual causation,” *Synthese*, vol. 195, no. 2, pp. 835–862, 2018, doi: 10.1007/s11229-016-1247-1.

[21] S. Beckers and J. Y. Halpern, “Abstracting causal models,” in *Proceedings of the AAAI Conference on Artificial Intelligence*, 2019, pp. 2678–2685, doi: 10.1609/aaai.v33i01.33012678.

[22] E. Begoli, T. Bhattacharya, and D. Kusnezov, “The need for uncertainty quantification in machine-assisted medical decision making,” *Nature Machine Intelligence*, vol. 1, no. 1, pp. 20–23, 2019, doi: 10.1038/s42256-018-0004-1.

[23] P. W. Battaglia *et al.*, “Relational inductive biases, deep learning, and graph networks,” *arXiv preprint arXiv:1806.01261*, 2018.

[24] M. Belkin, D. Hsu, S. Ma, and S. Mandal, “Reconciling modern machine-learning practice and the classical bias–variance trade-off,” *Proceedings of the National Academy of Sciences of the United States of America*, vol. 116, no. 32, pp. 15849–15854, Aug. 2019, doi: 10.1073/pnas.1903070116.

[25] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” *IEEE Transactions on Pattern Analysis and Machine Intelligence*, vol. 35, no. 8, pp. 1798–1828, 2013, doi: 10.1109/TPAMI.2013.50.

[26] S. Bird, “NLTK: The natural language toolkit,” in *Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions*, 2006, pp. 69–72, doi: 10.3115/1225403.1225421.

[27] M. D. Bloice, C. Stocker, and A. Holzinger, “Augmentor: An image augmentation library for machine learning,” *Journal of Open Source Software*, vol. 2, no. 19, p. 432, 2017, doi: 10.21105/joss.00432.

[28] L. Breiman, *Classification and Regression Trees*. New York: Routledge, 2017, doi: 10.1201/9781315139470.

[29] L. Breiman, “Random Forests,” *Machine Learning*, vol. 45, no. 1, pp. 5–32, Oct. 2001, doi: 10.1023/A:1010933404324.

[30] “The California Almond.” Accessed: Feb. 16, 2024. [Online]. Available: https://www.waterfordnut.com/almond.html

[31] D. Carpentras, “We urgently need a culture of multi-operationalization in psychological research,” *Communications Psychology*, vol. 2, no. 1, p. 32, 2024.

[32] R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A survey,” *arXiv preprint arXiv:1901.03407*, 2019.

[33] G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” *Computers & Electrical Engineering*, vol. 40, no. 1, pp. 16–28, 2014, doi: 10.1016/j.compeleceng.2013.11.024.

[34] K. Chasalow and K. Levy, “Representativeness in Statistics, Politics, and Machine Learning,” in *Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency*, in FAccT ’21. New York, NY, USA: Association for Computing Machinery, Mar. 2021, pp. 77–89, doi: 10.1145/3442188.3445872.

[35] P. Chattopadhyay, R. Vedantam, R. R. Selvaraju, D. Batra, and D. Parikh, “Counting everyday objects in everyday scenes,” in *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition*, 2017, pp. 1135–1144, doi: 10.1109/cvpr.2017.471.

[36] V. Chernozhukov *et al.*, “Double/debiased machine learning for treatment and structural parameters.” Oxford University Press, Oxford, UK, 2018, doi: 10.1111/ectj.12097.

[37] N. Chomsky, I. Roberts, and J. Watumull, “Noam Chomsky: The false promise of ChatGPT,” *The New York Times*, vol. 8, 2023.

[38] G. Ciravegna, F. Precioso, A. Betti, K. Mottin, and M. Gori, “Knowledge-driven active learning,” in *Joint European Conference on Machine Learning and Knowledge Discovery in Databases*, Springer, 2023, pp. 38–54, doi: 10.1007/978-3-031-43412-9_3.

[39] L. H. Clemmensen and R. D. Kjærsgaard, “Data Representativity for Machine Learning and AI Systems.” arXiv, Feb. 2023. Accessed: Feb. 07, 2024. [Online]. Available: http://arxiv.org/abs/2203.04706

[40] G. S. Collins *et al.*, “TRIPOD+AI statement: Updated guidance for reporting clinical prediction models that use regression or machine learning methods,” *BMJ (Clinical Research Ed.)*, vol. 385, 2024, doi: 10.1136/bmj-2023-078378.

[41] I. C. Covert, S. Lundberg, and S.-I. Lee, “Understanding global feature contributions with additive importance measures,” in *Proceedings of the 34th International Conference on Neural Information Processing Systems*, in NIPS’20. Red Hook, NY, USA: Curran Associates Inc., Dec. 2020, pp. 17212–17223.

[42] G. Corso, H. Stark, S. Jegelka, T. Jaakkola, and R. Barzilay, “Graph neural networks,” *Nature Reviews Methods Primers*, vol. 4, no. 1, p. 17, 2024.

[43] K. Cranmer, J. Brehmer, and G. Louppe, “The frontier of simulation-based inference,” *Proceedings of the National Academy of Sciences*, vol. 117, no. 48, pp. 30055–30062, 2020, doi: 10.1073/pnas.1912789117.

[44] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” *Mathematics of Control, Signals and Systems*, vol. 2, no. 4, pp. 303–314, 1989, doi: 10.1007/BF02551274.

[45] A. Curth, A. Jeffares, and M. van der Schaar, “A u-turn on double descent: Rethinking parameter counting in statistical learning,” *Advances in Neural Information Processing Systems*, vol. 36, 2024.

[46] O. D’Ecclesiis *et al.*, “Vitamin D and SARS-CoV2 infection, severity and mortality: A systematic review and meta-analysis,” *PLoS One*, vol. 17, no. 7, p. e0268396, 2022, doi: 10.1371/journal.pone.0268396.

[47] S. Dandl, “Causality concepts in machine learning: Heterogeneous treatment effect estimation with machine learning & model interpretation with counterfactual and semi-factual explanations,” PhD thesis, LMU Munich, 2023, doi: 10.5282/edoc.32947.

[48] T. Danka and P. Horvath, “modAL: A modular active learning framework for Python,” *arXiv preprint arXiv:1805.00979*, 2018.

[49] H. W. De Regt, “Understanding, values, and the aims of science,” *Philosophy of Science*, vol. 87, no. 5, pp. 921–932, 2020, doi: 10.1086/710520.

[50] B. Dennis, J. M. Ponciano, M. L. Taper, and S. R. Lele, “Errors in statistical inference under model misspecification: Evidence, hypothesis testing, and AIC,” *Frontiers in Ecology and Evolution*, vol. 7, p. 372, 2019, doi: 10.3389/fevo.2019.00372.

[51] T. Denouden, R. Salay, K. Czarnecki, V. Abdelzad, B. Phan, and S. Vernekar, “Improving reconstruction autoencoder out-of-distribution detection with Mahalanobis distance,” *arXiv preprint arXiv:1812.02765*, 2018.

[52] A. Diamantopoulos, P. Riefler, and K. P. Roth, “Advancing formative measurement models,” *Journal of Business Research*, vol. 61, no. 12, pp. 1203–1218, 2008, doi: 10.1016/j.jbusres.2008.01.009.

[53] T. J. Diciccio and J. P. Romano, “A review of bootstrap confidence intervals,” *Journal of the Royal Statistical Society Series B: Statistical Methodology*, vol. 50, no. 3, pp. 338–354, 1988.

[54] P. Domingos, “A unified bias-variance decomposition,” in *Proceedings of the 17th International Conference on Machine Learning*, Morgan Kaufmann, Stanford, 2000, pp. 231–238.

[55] A. Sharma and E. Kiciman, “DoWhy: An end-to-end library for causal inference,” *arXiv preprint arXiv:2011.04216*, 2020.

[56] F. Doshi-Velez and B. Kim, “Towards A Rigorous Science of Interpretable Machine Learning.” arXiv, Mar. 2017. Accessed: Dec. 01, 2023. [Online]. Available: http://arxiv.org/abs/1702.08608

[57] P. Bach, V. Chernozhukov, M. S. Kurz, and M. Spindler, “DoubleML – An object-oriented implementation of double machine learning in Python,” *Journal of Machine Learning Research*, vol. 23, no. 53, pp. 1–6, 2022.

[58] P. Bach, V. Chernozhukov, M. S. Kurz, and M. Spindler, “DoubleML – An object-oriented implementation of double machine learning in R.” 2021, doi: 10.32614/cran.package.doubleml.

[59] H. E. Douglas, “Reintroducing prediction to explanation,” *Philosophy of Science*, vol. 76, no. 4, pp. 444–463, 2009, doi: 10.1086/648111.

[60] F. Eberhardt, C. Glymour, and R. Scheines, “On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables,” in *Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence*, in UAI’05. Arlington, Virginia, USA: AUAI Press, 2005, pp. 178–184.

[61] J. Earman, *Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory*. MIT Press, 1992.

[62] J. Ellenberg, *How Not to Be Wrong: The Hidden Maths of Everyday Life*. Penguin UK, 2014.

[63] B. J. Erickson, P. Korfiatis, Z. Akkus, and T. L. Kline, “Machine learning for medical imaging,” *Radiographics*, vol. 37, no. 2, pp. 505–515, 2017, doi: 10.1148/rg.2017160130.

[64] S. Feng *et al.*, “A survey of data augmentation approaches for NLP,” in *Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021*, Association for Computational Linguistics, 2021, doi: 10.18653/v1/2021.findings-acl.84.

[65] A. Fisher, C. Rudin, and F. Dominici, “All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously,” *Journal of Machine Learning Research*, vol. 20, p. 177, 2019. Accessed: Jan. 16, 2024. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323609/

[66] A. Fischer, “Resistance of children to COVID-19. How?” *Mucosal Immunology*, vol. 13, no. 4, pp. 563–565, 2020, doi: 10.1038/s41385-020-0303-9.

[67] S. W. Fleming, V. V. Vesselinov, and A. G. Goodbody, “Augmenting geophysical interpretation of data-driven operational water supply forecast modeling for a western US river using a hybrid machine learning approach,” *Journal of Hydrology*, vol. 597, p. 126327, 2021.

[68] M. Flora, C. Potvin, A. McGovern, and S. Handler, “Comparing Explanation Methods for Traditional Machine Learning Models Part 1: An Overview of Current Methods and Quantifying Their Disagreement.” arXiv, Nov. 2022, doi: 10.48550/arXiv.2211.08943.

[69] J. Frankle and M. Carbin, “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks.” arXiv, Mar. 2019, doi: 10.48550/arXiv.1803.03635.

[70] T. Freiesleben, G. König, C. Molnar, and Á. Tejero-Cantero, “Scientific inference with interpretable machine learning: Analyzing models to learn about real-world phenomena,” *Minds and Machines*, vol. 34, no. 3, p. 32, 2024, doi: 10.1007/s11023-024-09691-z.

[71] T. Freiesleben, “Artificial neural nets and the representation of human concepts,” *arXiv preprint arXiv:2312.05337*, 2023, doi: 10.48550/arXiv.2312.05337.

[72] T. Freiesleben and T. Grote, “Beyond generalization: A theory of robustness in machine learning,” *Synthese*, vol. 202, no. 4, p. 109, 2023, doi: 10.1007/s11229-023-04334-9.

[73] T. Freiesleben, “The intriguing relation between counterfactual explanations and adversarial examples,” *Minds and Machines*, vol. 32, no. 1, pp. 77–109, 2022, doi: 10.1007/s11023-021-09580-9.

[74] J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” *The Annals of Statistics*, vol. 29, no. 5, pp. 1189–1232, Oct. 2001, doi: 10.1214/aos/1013203451.

[75] J. H. Friedman and B. E. Popescu, “Predictive Learning via Rule Ensembles,” *The Annals of Applied Statistics*, vol. 2, no. 3, pp. 916–954, 2008, doi: 10.1214/07-AOAS148.

[76] K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf, “Kernel measures of conditional dependence,” *Advances in Neural Information Processing Systems*, vol. 20, 2007.

[77] Y. Gal, “Uncertainty in deep learning,” 2016.

[78] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” in *International Conference on Machine Learning*, PMLR, 2016, pp. 1050–1059.

[79] Y. Gao *et al.*, “Machine learning based early warning system enables accurate mortality risk prediction for COVID-19,” *Nature Communications*, vol. 11, no. 1, p. 5033, 2020, doi: 10.5281/zenodo.3991113.

[80] T. Gebru *et al.*, “Datasheets for datasets,” *Communications of the ACM*, vol. 64, no. 12, pp. 86–92, 2021, doi: 10.1145/3458723.

[81] B. Ghai, Q. V. Liao, Y. Zhang, R. Bellamy, and K. Mueller, “Explainable active learning (XAL) toward AI explanations as interfaces for machine teachers,” *Proceedings of the ACM on Human-Computer Interaction*, vol. 4, no. CSCW3, pp. 1–28, 2021, doi: 10.1145/3432934.

[82] J. B. Gibbons *et al.*, “Association between vitamin D supplementation and COVID-19 infection and mortality,” *Scientific Reports*, vol. 12, no. 1, p. 19397, 2022, doi: 10.1038/s41598-022-24053-4.

[83] L. S. Gottfredson, “The general intelligence factor.” Scientific American, Incorporated, 1998.

[84] I. Goodfellow, Y. Bengio, and A. Courville, *Deep Learning*. MIT Press, 2016.

[85] A. Goldstein, A. Kapelner, J. Bleich, and E. Pitkin, “Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation,” *Journal of Computational and Graphical Statistics*, vol. 24, no. 1, pp. 44–65, Jan. 2015, doi: 10.1080/10618600.2014.907095.

[86] Y. Gong and G. Zhao, “Wealth, health, and beyond: Is COVID-19 less likely to spread in rich neighborhoods?” *PLoS One*, vol. 17, no. 5, p. e0267487, 2022, doi: 10.1371/journal.pone.0267487.

[87] I. Goodfellow *et al.*, “Generative adversarial nets,” *Advances in Neural Information Processing Systems*, vol. 27, 2014.

[88] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” *arXiv preprint arXiv:1412.6572*, 2014, doi: 10.48550/arXiv.1412.6572.

[89] J. Goschenhofer, F. M. Pfister, K. A. Yuksel, B. Bischl, U. Fietzek, and J. Thomas, “Wearable-based Parkinson’s disease severity monitoring using deep learning,” in *Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part III*, Springer, 2020, pp. 400–415, doi: 10.1007/978-3-030-46133-1_24.

[90] C. Gruber, P. O. Schenk, M. Schierholz, F. Kreuter, and G. Kauermann, “Sources of Uncertainty in Machine Learning – A Statisticians’ View.” arXiv, May 2023, doi: 10.48550/arXiv.2305.16703.

[91] M. Gupta *et al.*, “CryoEM and AI reveal a structure of SARS-CoV-2 Nsp2, a multifunctional protein involved in key host processes,” *Research Square*, 2021, doi: 10.1101/2021.05.10.443524.

[92] P. J. Haley and D. Soloway, “Extrapolation limitations of multilayer feedforward neural networks,” in *[Proceedings 1992] IJCNN International Joint Conference on Neural Networks*, IEEE, 1992, pp. 25–30, doi: 10.1109/IJCNN.1992.227294.

[93] P. R. Halmos, *Measure Theory*, vol. 18. Springer, 2013, doi: 10.1007/978-1-4684-9440-2.

[94] S. Han, C. Lin, C. Shen, Q. Wang, and X. Guan, “Interpreting adversarial examples in deep learning: A review,” *ACM Computing Surveys*, vol. 55, no. 14s, pp. 1–38, 2023, doi: 10.1145/3594869.

[95] M. Hardt and B. Recht, *Patterns, Predictions, and Actions: Foundations of Machine Learning*. Princeton University Press, 2022.

[96] U. Hasson, S. A. Nastase, and A. Goldstein, “Direct fit to nature: An evolutionary perspective on biological and artificial neural networks,” *Neuron*, vol. 105, no. 3, pp. 416–434, 2020, doi: 10.1016/j.neuron.2019.12.002.

[97] T. Hastie, R. Tibshirani, and J. H. Friedman, *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*, vol. 2. Springer, 2009.

[98] Y.-H. He, “Machine-learning the string landscape,” *Physics Letters B*, vol. 774, pp. 564–568, 2017, doi: 10.1016/j.physletb.2017.10.024.

[99] D. Hendrycks *et al.*, “The many faces of robustness: A critical analysis of out-of-distribution generalization,” in *Proceedings of the IEEE/CVF International Conference on Computer Vision*, 2021, pp. 8340–8349, doi: 10.1109/ICCV48922.2021.00823.

[100] B. Hofner, A. Mayr, N. Robinzonov, and M. Schmid, “Model-based boosting in R: A hands-on tutorial using the R package mboost,” *Computational Statistics*, vol. 29, pp. 3–35, 2014, doi: 10.1007/s00180-012-0382-5.

[101] S. Hoffmann, F. Schönbrodt, R. Elsas, R. Wilson, U. Strasser, and A.-L. Boulesteix, “The multiplicity of analysis strategies jeopardizes replicability: Lessons learned across disciplines,” *Royal Society Open Science*, vol. 8, no. 4, p. 201925, 2021, doi: 10.1098/rsos.201925.

[102] P. W. Holland, “Statistics and causal inference,” *Journal of the American Statistical Association*, vol. 81, no. 396, pp. 945–960, 1986, doi: 10.2307/2289064.

[103] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” *Neural Networks*, vol. 4, no. 2, pp. 251–257, 1991, doi: 10.1016/0893-6080(91)90009-T.

[104] J. Howard *et al.*, “An evidence review of face masks against COVID-19,” *Proceedings of the National Academy of Sciences*, vol. 118, no. 4, p. e2014564118, 2021, doi: 10.1073/pnas.2014564118.

[105] W. Hu *et al.*, “Open graph benchmark: Datasets for machine learning on graphs,” *Advances in Neural Information Processing Systems*, vol. 33, pp. 22118–22133, 2020, doi: 10.5555/3495724.3497579.

[106] S. Hu *et al.*, “Weakly supervised deep learning for COVID-19 infection detection and classification from CT images,” *IEEE Access*, vol. 8, pp. 118869–118883, 2020, doi: 10.1109/access.2020.3005510.

[107] E. Hüllermeier and W. Waegeman, “Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods,” *Machine Learning*, vol. 110, no. 3, pp. 457–506, Mar. 2021, doi: 10.1007/s10994-021-05946-3.

[108] F. Hutter, L. Kotthoff, and J. Vanschoren, *Automated Machine Learning: Methods, Systems, Challenges*. Springer Nature, 2019, doi: 10.1007/978-3-030-05318-5.

[109] I. London, “Encoding cyclical continuous features - 24-hour time,” *Ian London’s Blog*. Jul. 2016. Accessed: Sep. 27, 2023. [Online]. Available: https://ianlondon.github.io/posts/encoding-cyclical-features-24-hour-time/

[110] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial examples are not bugs, they are features,” *Advances in Neural Information Processing Systems*, vol. 32, 2019.

[111] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition*, 2017, pp. 1125–1134, doi: 10.1109/CVPR.2017.632.

[112] B. Jalaian, M. Lee, and S. Russell, “Uncertain context: Uncertainty quantification in machine learning,” *AI Magazine*, vol. 40, no. 4, pp. 40–49, 2019, doi: 10.1609/aimag.v40i4.4812.

[113] L. Jehi *et al.*, “Individualizing risk prediction for positive coronavirus disease 2019 testing: Results from 11,672 patients,” *Chest*, vol. 158, no. 4, pp. 1364–1375, 2020, doi: 10.1016/j.chest.2020.05.580.

[114] J. Jumper *et al.*, “Highly accurate protein structure prediction with AlphaFold,” *Nature*, vol. 596, no. 7873, pp. 583–589, 2021, doi: 10.1038/s41586-021-03819-2.

[115] M. Kalisch, M. Mächler, D. Colombo, M. H. Maathuis, and P. Bühlmann, “Causal inference using graphical models with the R package pcalg,” *Journal of Statistical Software*, vol. 47, pp. 1–26, 2012, doi: 10.18637/jss.v047.i11.

[116] P. Kamath, A. Tangella, D. Sutherland, and N. Srebro, “Does invariant risk minimization capture invariance?” in *International Conference on Artificial Intelligence and Statistics*, PMLR, 2021, pp. 4069–4077.

[117] S. Kapoor *et al.*, “REFORMS: Consensus-based Recommendations for Machine-learning-based Science,” *Science Advances*, vol. 10, no. 18, p. eadk3452, May 2024, doi: 10.1126/sciadv.adk3452.

[118] E. Kasneci *et al.*, “ChatGPT for good? On opportunities and challenges of large language models for education,” *Learning and Individual Differences*, vol. 103, p. 102274, 2023.

[119] A. J. Kell, D. L. Yamins, E. N. Shook, S. V. Norman-Haignere, and J. H. McDermott, “A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy,” *Neuron*, vol. 98, no. 3, pp. 630–644, 2018, doi: 10.1016/j.neuron.2018.03.044.

[120] A. Kendall and Y. Gal, “What uncertainties do we need in Bayesian deep learning for computer vision?” *Advances in Neural Information Processing Systems*, vol. 30, 2017.

[121] J. Kim and V. Pavlovic, “Ancient coin recognition based on spatial coding,” in *2014 22nd International Conference on Pattern Recognition*, 2014, pp. 321–326, doi: 10.1109/ICPR.2014.64.

[122] M. C. Knaus, “Double machine learning-based programme evaluation under unconfoundedness,” *The Econometrics Journal*, vol. 25, no. 3, pp. 602–627, Jun. 2022, doi: 10.1093/ectj/utac015.

[123] D. Kobak, R. G. Márquez, E.-Á. Horvát, and J. Lause, “Delving into ChatGPT usage in academic writing through excess vocabulary,” *arXiv preprint arXiv:2406.07016*, 2024.

[124] P. W. Koh *et al.*, “Concept bottleneck models,” in *International Conference on Machine Learning*, PMLR, 2020, pp. 5338–5348.

[125] G. König, T. Freiesleben, and M. Grosse-Wentrup, “Improvement-focused causal recourse (ICR),” in *Proceedings of the AAAI Conference on Artificial Intelligence*, 2023, pp. 11847–11855, doi: 10.1609/aaai.v37i10.26398.

[126] M. Krenn, M. Malik, R. Fickler, R. Lapkiewicz, and A. Zeilinger, “Automated search for new quantum experiments,” *Physical Review Letters*, vol. 116, no. 9, p. 090405, 2016.

[127] T. S. Kuhn, *The Structure of Scientific Revolutions*, vol. 962. University of Chicago Press, Chicago, 1997.

[128] S. R. Künzel, J. S. Sekhon, P. J. Bickel, and B. Yu, “Metalearners for estimating heterogeneous treatment effects using machine learning,” *Proceedings of the National Academy of Sciences*, vol. 116, no. 10, pp. 4156–4165, 2019, doi: 10.1073/pnas.1804597116.

[129] R. Lagerquist, A. McGovern, C. R. Homeyer, D. J. Gagne II, and T. Smith, “Deep learning on three-dimensional multiscale data for next-hour tornado prediction,” *Monthly Weather Review*, vol. 148, no. 7, pp. 2837–2861, 2020, doi: 10.1175/MWR-D-19-0372.1.

[130] R. Lam *et al.*, “Learning skillful medium-range global weather forecasting,” *Science*, p. eadi2336, 2023, doi: 10.1126/science.adi2336.

[131] J. Lei, M. G’Sell, A. Rinaldo, R. J. Tibshirani, and L. Wasserman, “Distribution-Free Predictive Inference for Regression,” *Journal of the American Statistical Association*, vol. 113, no. 523, pp. 1094–1111, Jul. 2018, doi: 10.1080/01621459.2017.1307116.

[132] L. Lei and E. J. Candès, “Conformal inference of counterfactuals and individual treatment effects,” *Journal of the Royal Statistical Society Series B: Statistical Methodology*, vol. 83, no. 5, pp. 911–938, 2021, doi: 10.1111/rssb.12445.

[133] S. Letzgus and K.-R. Müller, “An explainable AI framework for robust and transparent data-driven wind turbine power curve models,” *Energy and AI*, p. 100328, Dec. 2023, doi: 10.1016/j.egyai.2023.100328.

[134] Z. C. Lipton, “The Mythos of Model Interpretability.” arXiv, Mar. 2017, doi: 10.48550/arXiv.1606.03490.

[135] C. List, “Levels: Descriptive, explanatory, and ontological,” *Noûs*, vol. 53, no. 4, pp. 852–883, 2019.

[136] Y. Lu and J. Lu, “A universal approximation theorem of deep neural networks for expressing probability distributions,” *Advances in Neural Information Processing Systems*, vol. 33, pp. 3094–3105, 2020, doi: 10.5555/3495724.3495984.

[137] T. C. Lucas, “A translucent box: Interpretable machine learning in ecology,” *Ecological Monographs*, vol. 90, no. 4, p. e01422, 2020, doi: 10.1002/ecm.1422.

[138] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” in *Proceedings of the 31st International Conference on Neural Information Processing Systems*, in NIPS’17. Red Hook, NY, USA: Curran Associates Inc., Dec. 2017, pp. 4768–4777, doi: 10.5555/3295222.3295230.

[139] K. Maaz *et al.*, *Bildung in Deutschland 2022: Ein indikatorengestützter Bericht mit einer Analyse zum Bildungspersonal* [Education in Germany 2022: An indicator-based report with an analysis of educational staff]. wbv Publikation, 2022, doi: 10.3278/6001820hw.

[140] G. Marcus, “The next decade in AI: Four steps towards robust artificial intelligence,” *arXiv preprint arXiv:2002.06177*, 2020, doi: 10.48550/arXiv.2002.06177.

[141] C. F. Manski, *Partial Identification of Probability Distributions*, vol. 5. Springer, 2003, doi: 10.1007/b97478.

[142] P. Maurage, A. Heeren, and M. Pesenti, “Does chocolate consumption really boost Nobel award chances? The peril of over-interpreting correlations in health studies,” *The Journal of Nutrition*, vol. 143, no. 6, pp. 931–933, 2013, doi: 10.3945/jn.113.174813.

[143] M. B. McDermott, S. Wang, N. Marinsek, R. Ranganath, L. Foschini, and M. Ghassemi, “Reproducibility in machine learning for health research: Still a ways to go,” *Science Translational Medicine*, vol. 13, no. 586, p. eabb1655, 2021, doi: 10.1126/scitranslmed.abb1655.

[144] L. Mentch and G. Hooker, “Quantifying uncertainty in random forests via confidence intervals and hypothesis tests,” *Journal of Machine Learning Research*, vol. 17, no. 26, pp. 1–41, 2016.

[145] T. Miller, “Explanation in artificial intelligence: Insights from the social sciences,” *Artificial Intelligence*, vol. 267, pp. 1–38, Feb. 2019, doi: 10.1016/j.artint.2018.07.007.

[146] M. Mitchell *et al.*, “Model Cards for Model Reporting,” in *Proceedings of the Conference on Fairness, Accountability, and Transparency*, in FAT* ’19. New York, NY, USA: Association for Computing Machinery, Jan. 2019, pp. 220–229, doi: 10.1145/3287560.3287596.

[147] C. Molnar, G. Casalicchio, and B. Bischl, “Quantifying model complexity via functional decomposition for better post-hoc interpretability,” in *Machine Learning and Knowledge Discovery in Databases: International Workshops of ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part I*, Springer, 2020, pp. 193–204, doi: 10.1007/978-3-030-43823-4_17.

[148] C. Molnar, *Interpretable Machine Learning: A Guide for Making Black Box Models Explainable*, 2nd ed. 2022. Available: https://christophm.github.io/interpretable-ml-book

[149] C. Molnar, G. König, B. Bischl, and G. Casalicchio, “Model-agnostic Feature Importance and Effects with Dependent Features – A Conditional Subgroup Approach,” *Data Mining and Knowledge Discovery*, Jan. 2023, doi: 10.1007/s10618-022-00901-9.

[150] C. Molnar *et al.*, “Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process,” in *Explainable Artificial Intelligence*, L. Longo, Ed., in Communications in Computer and Information Science. Cham: Springer Nature Switzerland, 2023, pp. 456–479, doi: 10.1007/978-3-031-44064-9_24.

[151] M. Moor *et al.*, “Foundation models for generalist medical artificial intelligence,” *Nature*, vol. 616, no. 7956, pp. 259–265, 2023.

[152] A. Mumuni and F. Mumuni, “CNN architectures for geometric transformation-invariant feature representation in computer vision: A review,” *SN Computer Science*, vol. 2, no. 5, p. 340, 2021, doi: 10.1007/s42979-021-00735-0.

[153] A. Mumuni and F. Mumuni, “Data augmentation: A comprehensive survey of modern approaches,” *Array*, vol. 16, p. 100258, 2022, doi: 10.1016/j.array.2022.100258.

[154] C. Nadeau and Y. Bengio, “Inference for the generalization error,” *Advances in Neural Information Processing Systems*, vol. 12, 1999.

[155] B. Neal, “Introduction to causal inference,” *Course Lecture Notes (draft)*, 2020.

[156] M. Neethu and R. Rajasree, “Sentiment analysis in Twitter using machine learning techniques,” in *2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT)*, IEEE, 2013, pp. 1–5, doi: 10.1109/ICCCNT.2013.6726818.

[157] T. Neupert, M. H. Fischer, E. Greplova, K. Choo, and M. M. Denner, “Introduction to machine learning for the sciences,” *arXiv preprint arXiv:2102.04883*, 2021, doi: 10.48550/arXiv.2102.04883.

[158] A. Nguyen and M. R. Martínez, “MonoNet: Towards Interpretable Models by Learning Monotonic Features.” arXiv, Sep. 2019, doi: 10.48550/arXiv.1909.13611.

[159] M. S. Norouzzadeh *et al.*, “Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning,” *Proceedings of the National Academy of Sciences*, vol. 115, no. 25, pp. E5716–E5725, 2018, doi: 10.1073/pnas.1719367115.

[160] T. Oikarinen *et al.*, “Deep convolutional network for animal sound classification and source attribution using dual audio recordings,” *The Journal of the Acoustical Society of America*, vol. 145, no. 2, pp. 654–662, 2019, doi: 10.1121/1.5097583.

[161] Open Science Collaboration, “Estimating the reproducibility of psychological science,” *Science*, vol. 349, no. 6251, p. aac4716, 2015, doi: 10.1126/science.aac4716.

[162] E. Ozoani, M. Gerchick, and M. Mitchell, *Model Card Guidebook*. Hugging Face, 2022. Available: https://huggingface.co/docs/hub/en/model-card-guidebook

[163] J. Pearl and D. Mackenzie, *The Book of Why: The New Science of Cause and Effect*. Basic Books, 2018.

[164] J. Pearl, *Causality*. Cambridge University Press, 2009.

[165] J. Pearl, “The limitations of opaque learning machines,” *Possible Minds*, vol. 25, pp. 13–19, 2019.

[166] N. Papernot, P. McDaniel, and I. Goodfellow, “Transferability in machine learning: From phenomena to black-box attacks using adversarial samples,” *arXiv preprint arXiv:1605.07277*, 2016, doi: 10.48550/arXiv.1605.07277.

[167] M. A. Pedersen, “Editorial introduction: Towards a machinic anthropology,” *Big Data & Society*, vol. 10. SAGE Publications Sage UK: London, England, p. 20539517231153803, 2023, doi: 10.1177/20539517231153803.

[168] J. Pereira, A. J. Simpkin, M. D. Hartmann, D. J. Rigden, R. M. Keegan, and A. N. Lupas, “High-accuracy protein structure prediction in CASP14,” *Proteins: Structure, Function, and Bioinformatics*, vol. 89, no. 12, pp. 1687–1699, 2021, doi: 10.1002/prot.26171.

[169] G. L. Perry, R. Seidl, A. M. Bellvé, and W. Rammer, “An outlook for deep learning in ecosystem science,” *Ecosystems*, pp. 1–19, 2022, doi: 10.1007/s10021-022-00789-y.

[170] J. Peters, D. Janzing, and B. Schölkopf, *Elements of Causal Inference: Foundations and Learning Algorithms*. The MIT Press, 2017.

[171] F. Pfisterer, S. Coors, J. Thomas, and B. Bischl, “Multi-objective automatic machine learning with autoxgboostmc,” *arXiv preprint arXiv:1908.10796*, 2019, doi: 10.48550/arXiv.1908.10796.

[172] J. Platt *et al.*, “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” *Advances in Large Margin Classifiers*, vol. 10, no. 3, pp. 61–74, 1999.

[173] K. Popper, *The Logic of Scientific Discovery*. Routledge, 2005.

[174] Z. Pu and E. Kalnay, “Numerical weather prediction basics: Models, numerical methods, and data assimilation,” *Handbook of Hydrometeorological Ensemble Forecasting*, pp. 67–97, 2019.

[175] M. Raissi, P. Perdikaris, and G. E.
Karniadakis, “Physics Informed Deep
Learning (Part I):
Data-driven Solutions of
Nonlinear Partial Differential
Equations.” arXiv, Nov. 2017. doi: 10.48550/arXiv.1711.10561.

[176]

M. Raissi, P. Perdikaris, and G. E.
Karniadakis, “Physics Informed Deep
Learning (Part II):
Data-driven Discovery of
Nonlinear Partial Differential
Equations.” arXiv, Nov. 2017. doi: 10.48550/arXiv.1711.10566.

[177]

P. Rajpurkar

*et al.*, “CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning.” arXiv, Dec. 2017. doi: 10.48550/arXiv.1711.05225.[178]

S.-A. Rebuffi, S. Gowal, D. A. Calian, F.
Stimberg, O. Wiles, and T. A. Mann, “Data augmentation can improve
robustness,”

*Advances in Neural Information Processing Systems*, vol. 34, pp. 29935–29948, 2021, doi: 10.48550/arXiv.2111.05328.[179]

M. Reichstein

*et al.*, “Deep learning and process understanding for data-driven earth system science,”*Nature*, vol. 566, no. 7743, pp. 195–204, 2019, doi: 10.1038/s41586-019-0912-1.[180]

P. Ren

*et al.*, “A survey of deep active learning,”*ACM computing surveys (CSUR)*, vol. 54, no. 9, pp. 1–40, 2021, doi: 10.1145/3472291.[181]

X. Ren

*et al.*, “Deep learning-based weather prediction: A survey,”*Big Data Research*, vol. 23, p. 100178, 2021, doi: 10.1016/j.bdr.2020.100178.[182]

M. T. Ribeiro, S. Singh, and C. Guestrin,
“Anchors: High-Precision Model-Agnostic
Explanations,”

*Proceedings of the AAAI Conference on Artificial Intelligence*, vol. 32, no. 1, Apr. 2018, doi: 10.1609/aaai.v32i1.11491.[183]

M. T. Ribeiro, S. Singh, and C. Guestrin,
“"Why Should I Trust You?": Explaining
the Predictions of Any Classifier,” in

*Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, in KDD ’16. New York, NY, USA: Association for Computing Machinery, Aug. 2016, pp. 1135–1144. doi: 10.1145/2939672.2939778.[184]

M. Roberts

*et al.*, “Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans,”*Nature Machine Intelligence*, vol. 3, no. 3, pp. 199–217, 2021, doi: 10.1038/s42256-021-00307-0.[185]

J. W. Rocks and P. Mehta, “Memorizing
without overfitting: Bias, variance, and interpolation in
overparameterized models,”

*Physical review research*, vol. 4, no. 1, p. 013201, 2022, doi: 10.1103/PhysRevResearch.4.013201.[186]

Y. Romano, E. Patterson, and E. Candes,
“Conformalized quantile regression,”

*Advances in neural information processing systems*, vol. 32, 2019.[187]

Y. Romano, M. Sesia, and E. Candes,
“Classification with valid and adaptive coverage,”

*Advances in Neural Information Processing Systems*, vol. 33, pp. 3581–3591, 2020.[188]

R. Roscher, B. Bohn, M. F. Duarte, and J.
Garcke, “Explainable Machine Learning for
Scientific Insights and Discoveries,”

*IEEE Access*, vol. 8, pp. 42200–42216, 2020, doi: 10.1109/ACCESS.2020.2976199.[189]

J. Rothfuss, F. Ferreira, S. Walther, and M.
Ulrich, “Conditional density estimation with neural networks: Best
practices and benchmarks,”

*arXiv preprint arXiv:1903.00954*, 2019, doi: 10.48550/arXiv.1903.00954.[190]

C. Rudin, C. Chen, Z. Chen, H. Huang, L.
Semenova, and C. Zhong, “Interpretable machine learning:
Fundamental principles and 10 grand challenges,”

*Statistics Surveys*, vol. 16, no. none, pp. 1–85, Jan. 2022, doi: 10.1214/21-SS133.[191]

L. Ruff

*et al.*, “A unifying review of deep and shallow anomaly detection,”*Proceedings of the IEEE*, vol. 109, no. 5, pp. 756–795, 2021, doi: 10.1109/JPROC.2021.3052449.[192]

R. Schaeffer

*et al.*, “Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle.” arXiv, Mar. 2023. doi: 10.48550/arXiv.2303.14151.[193]

J. Schmidt, M. R. Marques, S. Botti, and M. A.
Marques, “Recent advances and applications of machine learning in
solid-state materials science,”

*npj Computational Materials*, vol. 5, no. 1, pp. 1–36, 2019, doi: 10.1038/s41524-019-0221-0.[194]

C. A. Scholbeck, C. Molnar, C. Heumann, B.
Bischl, and G. Casalicchio, “Sampling, Intervention,
Prediction, Aggregation: A
Generalized Framework for
Model-Agnostic
Interpretations,” in

*Machine Learning and Knowledge Discovery in Databases*, P. Cellier and K. Driessens, Eds., in Communications in Computer and Information Science. Cham: Springer International Publishing, 2020, pp. 205–216. doi: 10.1007/978-3-030-43823-4_18.[195]

B. Schölkopf, D. Janzing, J. Peters, E.
Sgouritsa, K. Zhang, and J. Mooij, “On causal and anticausal
learning,”

*arXiv preprint arXiv:1206.6471*, 2012, doi: 10.48550/arXiv.1206.6471.[196]

B. Schölkopf

*et al.*, “Toward causal representation learning,”*Proceedings of the IEEE*, vol. 109, no. 5, pp. 612–634, 2021, doi: 10.1109/JPROC.2021.3058954.[197]

B. Schölkopf, “Causality for machine
learning,” in

*Probabilistic and causal inference: The works of judea pearl*, 2022, pp. 765–804.[198]

H. Seibold,

*6 steps towards reproducible research*. Zenodo, 2024. doi: 10.5281/zenodo.12744715.[199]

R. Sen, A. T. Suresh, K. Shanmugam, A. G.
Dimakis, and S. Shakkottai, “Model-powered conditional
independence test,”

*Advances in neural information processing systems*, vol. 30, 2017, doi: 10.5555/3294996.3295055.[200]

L. Semenova, H. Chen, R. Parr, and C. Rudin,
“A path to simpler models starts with noise,”

*Advances in neural information processing systems*, vol. 36, 2024.[201]

B. Settles, “Active learning literature
survey,” University of Wisconsin–Madison, Computer Sciences
Technical Report 1648, 2009.

[202]

S. Shalev-Shwartz and S. Ben-David,

*Understanding machine learning: From theory to algorithms*. Cambridge university press, 2014. doi: 10.1017/CBO9781107298019.[203]

R. D. Shah and J. Peters, “The hardness
of conditional independence testing and the generalised covariance
measure,”

*The Annals of Statistics*, vol. 48, no. 3, pp. 1514–1538, 2020, doi: 10.1214/19-AOS1857.[204]

C. Shorten and T. M. Khoshgoftaar, “A
survey on image data augmentation for deep learning,”

*Journal of big data*, vol. 6, no. 1, pp. 1–48, 2019, doi: 10.1186/s40537-019-0197-0.[205]

P. Spirtes, C. N. Glymour, and R. Scheines,

*Causation, prediction, and search*. MIT press, 2000. doi: 10.1007/978-1-4612-2748-9.[206]

R. Frigg and S. Hartmann, “Models in Science,” in

*The Stanford encyclopedia of philosophy*, Spring 2020., E. N. Zalta, Ed., https://plato.stanford.edu/archives/spr2020/entries/models-science/; Metaphysics Research Lab, Stanford University, 2020.[207]

C. Hitchcock and M. Rédei, “Reichenbach’s Common Cause Principle,” in

*The Stanford encyclopedia of philosophy*, Summer 2021., E. N. Zalta, Ed., https://plato.stanford.edu/archives/sum2021/entries/physics-Rpcc/; Metaphysics Research Lab, Stanford University, 2021.[208]

A. Chakravartty, “Scientific
Realism,” in

*The Stanford encyclopedia of philosophy*, Summer 2017., E. N. Zalta, Ed., https://plato.stanford.edu/archives/sum2017/entries/scientific-realism/; Metaphysics Research Lab, Stanford University, 2017.[209]

G. Shmueli, “To
Explain or to Predict?”

*Statistical Science*, vol. 25, no. 3, pp. 289–310, 2010, doi: 10.1214/10-STS330.[210]

T. F. Sterkenburg and P. D. Grünwald,
“The no-free-lunch theorems of supervised learning,”

*Synthese*, vol. 199, no. 3, pp. 9979–10015, 2021, doi: 10.1007/s11229-021-03233-1.[211]

C. Strobl, A.-L. Boulesteix, T. Kneib, T.
Augustin, and A. Zeileis, “Conditional variable importance for
random forests,”

*BMC bioinformatics*, vol. 9, pp. 1–11, 2008.[212]

E. Štrumbelj and I. Kononenko,
“Explaining prediction models and individual predictions with
feature contributions,”

*Knowledge and Information Systems*, vol. 41, no. 3, pp. 647–665, Dec. 2014, doi: 10.1007/s10115-013-0679-x.[213]

S. L. Smith, B. Dherin, D. G. Barrett, and S.
De, “On the origin of implicit regularization in stochastic
gradient descent,”

*arXiv preprint arXiv:2101.12176*, 2021, doi: 10.48550/arXiv.2101.12176.[214]

A. Swanson, M. Kosmala, C. Lintott, R. Simpson,
A. Smith, and C. Packer, “Snapshot serengeti, high-frequency
annotated camera trap images of 40 mammalian species in an african
savanna,”

*Scientific data*, vol. 2, no. 1, pp. 1–14, 2015, doi: 10.1038/sdata.2015.26.[215]

C. Szegedy

*et al.*, “Intriguing properties of neural networks,”*arXiv preprint arXiv:1312.6199*, 2013, doi: 10.48550/arXiv.1312.6199.[216]

I. Takeuchi, Q. V. Le, T. D. Sears, A. J.
Smola, and C. Williams, “Nonparametric quantile
estimation.”

*Journal of machine learning research*, vol. 7, no. 7, 2006.[217]

N. Taleb, “The black swan: Why don’t we
learn that we don’t learn,”

*NY: Random House*, vol. 1145, 2005.[218]

Y.-P. Tang, G.-X. Li, and S.-J. Huang,
“ALiPy: Active learning in python,”

*arXiv preprint arXiv:1901.03802*, 2019, doi: 10.48550/arXiv.1901.03802.[219]

T. Tanay and L. Griffin, “A boundary
tilting persepective on the phenomenon of adversarial examples,”

*arXiv preprint arXiv:1608.07690*, 2016, doi: 10.48550/arXiv.1608.07690.[220]

A. S. Tejani

*et al.*, “Checklist for Artificial Intelligence in Medical Imaging (CLAIM): 2024 Update,”*Radiology: Artificial Intelligence*, vol. 6, no. 4, p. e240300, Jul. 2024, doi: 10.1148/ryai.240300.[221]

D. Tsipras, S. Santurkar, L. Engstrom, A.
Turner, and A. Madry, “Robustness may be at odds with
accuracy,”

*arXiv preprint arXiv:1805.12152*, 2018, doi: 10.48550/arXiv.1805.12152.[222]

J. Q. Toledo-Marı́n, G. Fox, J. P. Sluka, and J.
A. Glazier, “Deep learning approaches to surrogates for solving
the diffusion equation for mechanistic real-world simulations,”

*Frontiers in Physiology*, vol. 12, p. 667828, 2021, doi: 10.3389/fphys.2021.667828.[223]

C. Uhler, G. Raskutti, P. Bühlmann, and B. Yu,
“Geometry of the faithfulness assumption in causal
inference,”

*The Annals of Statistics*, pp. 436–463, 2013.[224]

M. J. Van der Laan and S. Rose,

*Targeted learning*, vol. 1. Springer, 2011. doi: 10.1007/978-1-4419-9782-1.[225]

R. Van Noorden and J. M. Perkel,
“AI and science: What 1,600 researchers
think,”

*Nature*, vol. 621, no. 7980, pp. 672–675, Sep. 2023, doi: 10.1038/d41586-023-02980-0.[226]

V. N. Vapnik, “An overview of statistical
learning theory,”

*IEEE transactions on neural networks*, vol. 10, no. 5, pp. 988–999, 1999, doi: 10.1109/72.788640.[227]

C. Vens, J. Struyf, L. Schietgat, S. Džeroski,
and H. Blockeel, “Decision trees for hierarchical multi-label
classification,”

*Machine learning*, vol. 73, pp. 185–214, 2008, doi: 10.1007/s10994-008-5077-3.[228]

T. Vigen,

*Spurious correlations*. Hachette UK, 2015.[229]

V. Vovk, G. Shafer, and I. Nouretdinov,
“Self-calibrating probability forecasting,”

*Advances in neural information processing systems*, vol. 16, 2003.[230]

S. Wachter, B. Mittelstadt, and C. Russell,
“Counterfactual Explanations Without Opening the
Black Box: Automated Decisions and the
GDPR,”

*SSRN Electronic Journal*, 2017, doi: 10.2139/ssrn.3063289.[231]

S. Wang

*et al.*, “Defensive dropout for hardening deep neural networks under adversarial attacks,” in*2018 IEEE/ACM international conference on computer-aided design (ICCAD)*, IEEE, 2018, pp. 1–8. doi: 10.1145/3240765.3264699.[232]

D. S. Watson and M. N. Wright, “Testing
conditional independence in supervised learning algorithms,”

*Machine Learning*, vol. 110, no. 8, pp. 2107–2129, Aug. 2021, doi: 10.1007/s10994-021-06030-6.[233]

D. H. Wolpert, “The lack of a priori
distinctions between learning algorithms,”

*Neural computation*, vol. 8, no. 7, pp. 1341–1390, 1996, doi: 10.1162/neco.1996.8.7.1341.[234]

Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and
S. Y. Philip, “A comprehensive survey on graph neural
networks,”

*IEEE transactions on neural networks and learning systems*, vol. 32, no. 1, pp. 4–24, 2020, doi: 10.1109/TNNLS.2020.2978386.[235]

L. Wynants

*et al.*, “Prediction models for diagnosis and prognosis of covid-19: Systematic review and critical appraisal,”*BMJ (Clinical research ed.)*, vol. 369, p. m1328, Apr. 2020, doi: 10.1136/bmj.m1328.[236]

J. Yang, K. Zhou, Y. Li, and Z. Liu,
“Generalized Out-of-Distribution
Detection: A Survey,”

*International Journal of Computer Vision*, Jun. 2024, doi: 10.1007/s11263-024-02117-4.[237]

J. Yoon, J. Jordon, and M. Van Der Schaar,
“GANITE: Estimation of individualized treatment effects using
generative adversarial nets,” in

*International conference on learning representations*, 2018.[238]

K. Yu

*et al.*, “Causality-based feature selection: Methods and evaluations,”*ACM Computing Surveys (CSUR)*, vol. 53, no. 5, pp. 1–36, 2020, doi: 10.1145/3409382.[239]

A. Zeileis, T. Hothorn, and K. Hornik,
“Model-Based Recursive Partitioning,”

*Journal of Computational and Graphical Statistics*, vol. 17, no. 2, pp. 492–514, Jun. 2008, doi: 10.1198/106186008X319331.[240]

Z. Zhang, Y. Jin, B. Chen, and P. Brown,
“California almond yield prediction at the orchard level with a
machine learning approach,”

*Frontiers in plant science*, vol. 10, p. 809, 2019, doi: 10.3389/fpls.2019.00809/full.[241]

H. Zhang, H. Chen, Z. Song, D. Boning, I. S.
Dhillon, and C.-J. Hsieh, “The limitations of adversarial training
and the blind-spot attack,”

*arXiv preprint arXiv:1901.04684*, 2019, doi: 10.48550/arXiv.1901.04684.