References

[1]
K. Aas, M. Jullum, and A. Løland, “Explaining individual predictions when features are dependent: More accurate approximations to Shapley values,” Artificial Intelligence, vol. 298, p. 103502, Sep. 2021, doi: 10.1016/j.artint.2021.103502.
[2]
A. Adadi and M. Berrada, “Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI),” IEEE Access, vol. 6, pp. 52138–52160, 2018, doi: 10.1109/ACCESS.2018.2870052.
[3]
J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim, “Sanity checks for saliency maps,” Advances in neural information processing systems, vol. 31, 2018.
[4]
C. Anderson, “The end of theory: The data deluge makes the scientific method obsolete,” Wired magazine, vol. 16, no. 7, 2008.
[5]
M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz, “Invariant risk minimization,” arXiv preprint arXiv:1907.02893, 2019.
[6]
L. Bottou, “Large-Scale Machine Learning with Stochastic Gradient Descent,” in Proceedings of COMPSTAT’2010, Y. Lechevallier and G. Saporta, Eds., Heidelberg: Physica-Verlag HD, 2010, pp. 177–186. doi: 10.1007/978-3-7908-2604-3_16.
[7]
S. Alemohammad et al., “Self-consuming generative models go MAD,” arXiv preprint arXiv:2307.01850, 2023.
[8]
Y. Alimohamadi, M. Sepandi, M. Taghdir, and H. Hosamirudsari, “Determine the most common clinical symptoms in COVID-19 patients: A systematic review and meta-analysis,” Journal of preventive medicine and hygiene, vol. 61, no. 3, p. E304, 2020, doi: 10.15167/2421-4248/jpmh2020.61.3.1530.
[9]
“AlphaFold DB website.” 2023. Available: https://alphafold.ebi.ac.uk/
[10]
A. Antoniou, A. Storkey, and H. Edwards, “Data augmentation generative adversarial networks,” arXiv preprint arXiv:1711.04340, 2017.
[11]
S. Athey, J. Tibshirani, and S. Wager, “Generalized random forests,” The Annals of Statistics, vol. 47, no. 2, 2019, doi: 10.1214/18-aos1709.
[12]
D. W. Apley and J. Zhu, “Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models,” Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 82, no. 4, pp. 1059–1086, Sep. 2020, doi: 10.1111/rssb.12377.
[13]
B. Arnold et al., “The Turing Way: A handbook for reproducible data science,” Zenodo, 2019.
[14]
F. Ayoub, T. Sato, and A. Sakuraba, “Football and COVID-19 risk: Correlation is not causation,” Clinical Microbiology and Infection, vol. 27, no. 2, pp. 291–292, 2021, doi: 10.1016/j.cmi.2020.08.034.
[15]
E. Barnard and L. Wessels, “Extrapolation and interpolation in neural network classifiers,” IEEE Control Systems Magazine, vol. 12, no. 5, pp. 50–53, 1992, doi: 10.1109/37.158898.
[16]
P. L. Bartlett, P. M. Long, G. Lugosi, and A. Tsigler, “Benign Overfitting in Linear Regression,” Proceedings of the National Academy of Sciences, vol. 117, no. 48, pp. 30063–30070, Dec. 2020, doi: 10.1073/pnas.1907378117.
[17]
R. Balestriero, J. Pesenti, and Y. LeCun, “Learning in high dimension always amounts to extrapolation,” arXiv preprint arXiv:2110.09485, 2021.
[18]
B. Basso and L. Liu, “Seasonal crop yield forecast: Methods, applications, and accuracies,” in Advances in Agronomy, vol. 154, Elsevier, 2019, pp. 201–255. doi: 10.1016/bs.agron.2018.11.002.
[19]
S. Bates, T. Hastie, and R. Tibshirani, “Cross-validation: What does it estimate and how well does it do it?” Journal of the American Statistical Association, pp. 1–12, 2023, doi: 10.1080/01621459.2023.2197686.
[20]
S. Beckers and J. Vennekens, “A principled approach to defining actual causation,” Synthese, vol. 195, no. 2, pp. 835–862, 2018, doi: 10.1007/s11229-016-1247-1.
[21]
S. Beckers and J. Y. Halpern, “Abstracting causal models,” in Proceedings of the AAAI conference on artificial intelligence, 2019, pp. 2678–2685. doi: 10.1609/aaai.v33i01.33012678.
[22]
E. Begoli, T. Bhattacharya, and D. Kusnezov, “The need for uncertainty quantification in machine-assisted medical decision making,” Nature Machine Intelligence, vol. 1, no. 1, pp. 20–23, 2019, doi: 10.1038/s42256-018-0004-1.
[23]
P. W. Battaglia et al., “Relational inductive biases, deep learning, and graph networks,” arXiv preprint arXiv:1806.01261, 2018.
[24]
M. Belkin, D. Hsu, S. Ma, and S. Mandal, “Reconciling modern machine-learning practice and the classical bias–variance trade-off,” Proceedings of the National Academy of Sciences of the United States of America, vol. 116, no. 32, pp. 15849–15854, Aug. 2019, doi: 10.1073/pnas.1903070116.
[25]
Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013, doi: 10.1109/TPAMI.2013.50.
[26]
S. Bird, “NLTK: The natural language toolkit,” in Proceedings of the COLING/ACL 2006 interactive presentation sessions, 2006, pp. 69–72. doi: 10.3115/1225403.1225421.
[27]
M. D. Bloice, C. Stocker, and A. Holzinger, “Augmentor: An image augmentation library for machine learning,” Journal of Open Source Software, vol. 2, no. 19, p. 432, 2017, doi: 10.21105/joss.00432.
[28]
L. Breiman, Classification and Regression Trees. New York: Routledge, 2017. doi: 10.1201/9781315139470.
[29]
L. Breiman, “Random Forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, Oct. 2001, doi: 10.1023/A:1010933404324.
[30]
“The California Almond.” Accessed: Feb. 16, 2024. [Online]. Available: https://www.waterfordnut.com/almond.html
[31]
D. Carpentras, “We urgently need a culture of multi-operationalization in psychological research,” Communications Psychology, vol. 2, no. 1, p. 32, 2024.
[32]
R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A survey,” arXiv preprint arXiv:1901.03407, 2019.
[33]
G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Computers & electrical engineering, vol. 40, no. 1, pp. 16–28, 2014, doi: 10.1016/j.compeleceng.2013.11.024.
[34]
K. Chasalow and K. Levy, “Representativeness in Statistics, Politics, and Machine Learning,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, in FAccT ’21. New York, NY, USA: Association for Computing Machinery, Mar. 2021, pp. 77–89. doi: 10.1145/3442188.3445872.
[35]
P. Chattopadhyay, R. Vedantam, R. R. Selvaraju, D. Batra, and D. Parikh, “Counting everyday objects in everyday scenes,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1135–1144. doi: 10.1109/cvpr.2017.471.
[36]
V. Chernozhukov et al., “Double/debiased machine learning for treatment and structural parameters,” The Econometrics Journal, vol. 21, no. 1, pp. C1–C68, 2018, doi: 10.1111/ectj.12097.
[37]
N. Chomsky, I. Roberts, and J. Watumull, “Noam Chomsky: The false promise of ChatGPT,” The New York Times, Mar. 2023.
[38]
G. Ciravegna, F. Precioso, A. Betti, K. Mottin, and M. Gori, “Knowledge-driven active learning,” in Joint European conference on machine learning and knowledge discovery in databases, Springer, 2023, pp. 38–54. doi: 10.1007/978-3-031-43412-9_3.
[39]
L. H. Clemmensen and R. D. Kjærsgaard, “Data Representativity for Machine Learning and AI Systems.” arXiv, Feb. 2023. Accessed: Feb. 07, 2024. [Online]. Available: http://arxiv.org/abs/2203.04706
[40]
G. S. Collins et al., “TRIPOD+AI statement: Updated guidance for reporting clinical prediction models that use regression or machine learning methods,” BMJ (Clinical research ed.), vol. 385, 2024, doi: 10.1136/bmj-2023-078378.
[41]
I. C. Covert, S. Lundberg, and S.-I. Lee, “Understanding global feature contributions with additive importance measures,” in Proceedings of the 34th International Conference on Neural Information Processing Systems, in NIPS’20. Red Hook, NY, USA: Curran Associates Inc., Dec. 2020, pp. 17212–17223.
[42]
G. Corso, H. Stark, S. Jegelka, T. Jaakkola, and R. Barzilay, “Graph neural networks,” Nature Reviews Methods Primers, vol. 4, no. 1, p. 17, 2024.
[43]
K. Cranmer, J. Brehmer, and G. Louppe, “The frontier of simulation-based inference,” Proceedings of the National Academy of Sciences, vol. 117, no. 48, pp. 30055–30062, 2020, doi: 10.1073/pnas.1912789117.
[44]
G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of control, signals and systems, vol. 2, no. 4, pp. 303–314, 1989, doi: 10.1007/BF02551274.
[45]
A. Curth, A. Jeffares, and M. van der Schaar, “A u-turn on double descent: Rethinking parameter counting in statistical learning,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[46]
O. D’Ecclesiis et al., “Vitamin D and SARS-CoV-2 infection, severity and mortality: A systematic review and meta-analysis,” PLoS One, vol. 17, no. 7, p. e0268396, 2022, doi: 10.1371/journal.pone.0268396.
[47]
S. Dandl, “Causality concepts in machine learning: Heterogeneous treatment effect estimation with machine learning & model interpretation with counterfactual and semi-factual explanations,” PhD thesis, LMU Munich, 2023. doi: 10.5282/edoc.32947.
[48]
T. Danka and P. Horvath, “modAL: A modular active learning framework for python,” arXiv preprint arXiv:1805.00979, 2018.
[49]
H. W. De Regt, “Understanding, values, and the aims of science,” Philosophy of Science, vol. 87, no. 5, pp. 921–932, 2020, doi: 10.1086/710520.
[50]
B. Dennis, J. M. Ponciano, M. L. Taper, and S. R. Lele, “Errors in statistical inference under model misspecification: Evidence, hypothesis testing, and AIC,” Frontiers in Ecology and Evolution, vol. 7, p. 372, 2019, doi: 10.3389/fevo.2019.00372.
[51]
T. Denouden, R. Salay, K. Czarnecki, V. Abdelzad, B. Phan, and S. Vernekar, “Improving reconstruction autoencoder out-of-distribution detection with mahalanobis distance,” arXiv preprint arXiv:1812.02765, 2018.
[52]
A. Diamantopoulos, P. Riefler, and K. P. Roth, “Advancing formative measurement models,” Journal of business research, vol. 61, no. 12, pp. 1203–1218, 2008, doi: 10.1016/j.jbusres.2008.01.009.
[53]
T. J. Diciccio and J. P. Romano, “A review of bootstrap confidence intervals,” Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 50, no. 3, pp. 338–354, 1988.
[54]
P. Domingos, “A unified bias-variance decomposition,” in Proceedings of 17th international conference on machine learning, Morgan Kaufmann Stanford, 2000, pp. 231–238.
[55]
A. Sharma and E. Kiciman, “DoWhy: An end-to-end library for causal inference,” arXiv preprint arXiv:2011.04216, 2020.
[56]
F. Doshi-Velez and B. Kim, “Towards A Rigorous Science of Interpretable Machine Learning.” arXiv, Mar. 2017. Accessed: Dec. 01, 2023. [Online]. Available: http://arxiv.org/abs/1702.08608
[57]
P. Bach, V. Chernozhukov, M. S. Kurz, and M. Spindler, “DoubleML – An object-oriented implementation of double machine learning in Python,” Journal of Machine Learning Research, vol. 23, no. 53, pp. 1–6, 2022.
[58]
P. Bach, V. Chernozhukov, M. S. Kurz, and M. Spindler, “DoubleML – An object-oriented implementation of double machine learning in R.” 2021. doi: 10.32614/cran.package.doubleml.
[59]
H. E. Douglas, “Reintroducing prediction to explanation,” Philosophy of Science, vol. 76, no. 4, pp. 444–463, 2009, doi: 10.1086/648111.
[60]
F. Eberhardt, C. Glymour, and R. Scheines, “On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables,” in Proceedings of the twenty-first conference on uncertainty in artificial intelligence, in UAI’05. Arlington, Virginia, USA: AUAI Press, 2005, pp. 178–184.
[61]
J. Earman, Bayes or bust? A critical examination of Bayesian confirmation theory. MIT Press, 1992.
[62]
J. Ellenberg, How not to be wrong: The hidden maths of everyday life. Penguin UK, 2014.
[63]
B. J. Erickson, P. Korfiatis, Z. Akkus, and T. L. Kline, “Machine learning for medical imaging,” Radiographics, vol. 37, no. 2, pp. 505–515, 2017, doi: 10.1148/rg.2017160130.
[64]
S. Feng et al., “A survey of data augmentation approaches for NLP,” in Findings of the association for computational linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics, 2021. doi: 10.18653/v1/2021.findings-acl.84.
[65]
A. Fisher, C. Rudin, and F. Dominici, “All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously,” Journal of Machine Learning Research, vol. 20, no. 177, pp. 1–81, 2019, Accessed: Jan. 16, 2024. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323609/
[66]
A. Fischer, “Resistance of children to COVID-19. How?” Mucosal Immunology, vol. 13, no. 4, pp. 563–565, 2020, doi: 10.1038/s41385-020-0303-9.
[67]
S. W. Fleming, V. V. Vesselinov, and A. G. Goodbody, “Augmenting geophysical interpretation of data-driven operational water supply forecast modeling for a western US river using a hybrid machine learning approach,” Journal of Hydrology, vol. 597, p. 126327, 2021.
[68]
M. Flora, C. Potvin, A. McGovern, and S. Handler, “Comparing Explanation Methods for Traditional Machine Learning Models Part 1: An Overview of Current Methods and Quantifying Their Disagreement.” arXiv, Nov. 2022. doi: 10.48550/arXiv.2211.08943.
[69]
J. Frankle and M. Carbin, “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks.” arXiv, Mar. 2019. doi: 10.48550/arXiv.1803.03635.
[70]
T. Freiesleben, G. König, C. Molnar, and Á. Tejero-Cantero, “Scientific inference with interpretable machine learning: Analyzing models to learn about real-world phenomena,” Minds and Machines, vol. 34, no. 3, p. 32, 2024, doi: 10.1007/s11023-024-09691-z.
[71]
T. Freiesleben, “Artificial neural nets and the representation of human concepts,” arXiv preprint arXiv:2312.05337, 2023, doi: 10.48550/arXiv.2312.05337.
[72]
T. Freiesleben and T. Grote, “Beyond generalization: A theory of robustness in machine learning,” Synthese, vol. 202, no. 4, p. 109, 2023, doi: 10.1007/s11229-023-04334-9.
[73]
T. Freiesleben, “The intriguing relation between counterfactual explanations and adversarial examples,” Minds and Machines, vol. 32, no. 1, pp. 77–109, 2022, doi: 10.1007/s11023-021-09580-9.
[74]
J. H. Friedman, “Greedy function approximation: A gradient boosting machine.” The Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, Oct. 2001, doi: 10.1214/aos/1013203451.
[75]
J. H. Friedman and B. E. Popescu, “Predictive Learning via Rule Ensembles,” The Annals of Applied Statistics, vol. 2, no. 3, pp. 916–954, 2008, doi: 10.1214/07-AOAS148.
[76]
K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf, “Kernel measures of conditional dependence,” Advances in neural information processing systems, vol. 20, 2007.
[77]
Y. Gal, “Uncertainty in deep learning,” PhD thesis, University of Cambridge, 2016.
[78]
Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” in International conference on machine learning, PMLR, 2016, pp. 1050–1059.
[79]
Y. Gao et al., “Machine learning based early warning system enables accurate mortality risk prediction for COVID-19,” Nature communications, vol. 11, no. 1, p. 5033, 2020, doi: 10.5281/zenodo.3991113.
[80]
T. Gebru et al., “Datasheets for datasets,” Communications of the ACM, vol. 64, no. 12, pp. 86–92, 2021, doi: 10.1145/3458723.
[81]
B. Ghai, Q. V. Liao, Y. Zhang, R. Bellamy, and K. Mueller, “Explainable active learning (XAL): Toward AI explanations as interfaces for machine teachers,” Proceedings of the ACM on Human-Computer Interaction, vol. 4, no. CSCW3, pp. 1–28, 2021, doi: 10.1145/3432934.
[82]
J. B. Gibbons et al., “Association between vitamin D supplementation and COVID-19 infection and mortality,” Scientific Reports, vol. 12, no. 1, p. 19397, 2022, doi: 10.1038/s41598-022-24053-4.
[83]
L. S. Gottfredson, “The general intelligence factor.” Scientific American, Incorporated, 1998.
[84]
I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT press, 2016.
[85]
A. Goldstein, A. Kapelner, J. Bleich, and E. Pitkin, “Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation,” Journal of Computational and Graphical Statistics, vol. 24, no. 1, pp. 44–65, Jan. 2015, doi: 10.1080/10618600.2014.907095.
[86]
Y. Gong and G. Zhao, “Wealth, health, and beyond: Is COVID-19 less likely to spread in rich neighborhoods?” PLoS One, vol. 17, no. 5, p. e0267487, 2022, doi: 10.1371/journal.pone.0267487.
[87]
I. Goodfellow et al., “Generative adversarial nets,” Advances in neural information processing systems, vol. 27, 2014.
[88]
I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014, doi: 10.48550/arXiv.1412.6572.
[89]
J. Goschenhofer, F. M. Pfister, K. A. Yuksel, B. Bischl, U. Fietzek, and J. Thomas, “Wearable-based Parkinson’s disease severity monitoring using deep learning,” in Machine learning and knowledge discovery in databases: European conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, proceedings, part III, Springer, 2020, pp. 400–415. doi: 10.1007/978-3-030-46133-1_24.
[90]
C. Gruber, P. O. Schenk, M. Schierholz, F. Kreuter, and G. Kauermann, “Sources of Uncertainty in Machine Learning – A Statisticians’ View.” arXiv, May 2023. doi: 10.48550/arXiv.2305.16703.
[91]
M. Gupta et al., “CryoEM and AI reveal a structure of SARS-CoV-2 Nsp2, a multifunctional protein involved in key host processes.” bioRxiv, 2021, doi: 10.1101/2021.05.10.443524.
[92]
P. J. Haley and D. Soloway, “Extrapolation limitations of multilayer feedforward neural networks,” in [Proceedings 1992] IJCNN international joint conference on neural networks, IEEE, 1992, pp. 25–30. doi: 10.1109/IJCNN.1992.227294.
[93]
P. R. Halmos, Measure theory, vol. 18. Springer, 2013. doi: 10.1007/978-1-4684-9440-2.
[94]
S. Han, C. Lin, C. Shen, Q. Wang, and X. Guan, “Interpreting adversarial examples in deep learning: A review,” ACM Computing Surveys, vol. 55, no. 14s, pp. 1–38, 2023, doi: 10.1145/3594869.
[95]
M. Hardt and B. Recht, Patterns, predictions, and actions: Foundations of machine learning. Princeton University Press, 2022.
[96]
U. Hasson, S. A. Nastase, and A. Goldstein, “Direct fit to nature: An evolutionary perspective on biological and artificial neural networks,” Neuron, vol. 105, no. 3, pp. 416–434, 2020, doi: 10.1016/j.neuron.2019.12.002.
[97]
T. Hastie, R. Tibshirani, and J. H. Friedman, The elements of statistical learning: Data mining, inference, and prediction, vol. 2. Springer, 2009.
[98]
Y.-H. He, “Machine-learning the string landscape,” Physics Letters B, vol. 774, pp. 564–568, 2017, doi: 10.1016/j.physletb.2017.10.024.
[99]
D. Hendrycks et al., “The many faces of robustness: A critical analysis of out-of-distribution generalization,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 8340–8349. doi: 10.1109/ICCV48922.2021.00823.
[100]
B. Hofner, A. Mayr, N. Robinzonov, and M. Schmid, “Model-based boosting in R: A hands-on tutorial using the R package mboost,” Computational statistics, vol. 29, pp. 3–35, 2014, doi: 10.1007/s00180-012-0382-5.
[101]
S. Hoffmann, F. Schönbrodt, R. Elsas, R. Wilson, U. Strasser, and A.-L. Boulesteix, “The multiplicity of analysis strategies jeopardizes replicability: Lessons learned across disciplines,” Royal Society Open Science, vol. 8, no. 4, p. 201925, 2021, doi: 10.1098/rsos.201925.
[102]
P. W. Holland, “Statistics and causal inference,” Journal of the American Statistical Association, vol. 81, no. 396, pp. 945–960, 1986, doi: 10.2307/2289064.
[103]
K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural networks, vol. 4, no. 2, pp. 251–257, 1991, doi: 10.1016/0893-6080(91)90009-T.
[104]
J. Howard et al., “An evidence review of face masks against COVID-19,” Proceedings of the National Academy of Sciences, vol. 118, no. 4, p. e2014564118, 2021, doi: 10.1073/pnas.2014564118.
[105]
W. Hu et al., “Open graph benchmark: Datasets for machine learning on graphs,” Advances in neural information processing systems, vol. 33, pp. 22118–22133, 2020, doi: 10.5555/3495724.3497579.
[106]
S. Hu et al., “Weakly supervised deep learning for COVID-19 infection detection and classification from CT images,” IEEE Access, vol. 8, pp. 118869–118883, 2020, doi: 10.1109/access.2020.3005510.
[107]
E. Hüllermeier and W. Waegeman, “Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods,” Machine Learning, vol. 110, no. 3, pp. 457–506, Mar. 2021, doi: 10.1007/s10994-021-05946-3.
[108]
F. Hutter, L. Kotthoff, and J. Vanschoren, Automated machine learning: Methods, systems, challenges. Springer Nature, 2019. doi: 10.1007/978-3-030-05318-5.
[109]
I. London, “Encoding cyclical continuous features - 24-hour time,” Ian London’s Blog. Jul. 2016. Accessed: Sep. 27, 2023. [Online]. Available: https://ianlondon.github.io/posts/encoding-cyclical-features-24-hour-time/
[110]
A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial examples are not bugs, they are features,” Advances in neural information processing systems, vol. 32, 2019.
[111]
P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1125–1134. doi: 10.1109/CVPR.2017.632.
[112]
B. Jalaian, M. Lee, and S. Russell, “Uncertain context: Uncertainty quantification in machine learning,” AI Magazine, vol. 40, no. 4, pp. 40–49, 2019, doi: 10.1609/aimag.v40i4.4812.
[113]
L. Jehi et al., “Individualizing risk prediction for positive coronavirus disease 2019 testing: Results from 11,672 patients,” Chest, vol. 158, no. 4, pp. 1364–1375, 2020, doi: 10.1016/j.chest.2020.05.580.
[114]
J. Jumper et al., “Highly accurate protein structure prediction with AlphaFold,” Nature, vol. 596, no. 7873, pp. 583–589, 2021, doi: 10.1038/s41586-021-03819-2.
[115]
M. Kalisch, M. Mächler, D. Colombo, M. H. Maathuis, and P. Bühlmann, “Causal inference using graphical models with the R package pcalg,” Journal of statistical software, vol. 47, pp. 1–26, 2012, doi: 10.18637/jss.v047.i11.
[116]
P. Kamath, A. Tangella, D. Sutherland, and N. Srebro, “Does invariant risk minimization capture invariance?” in International conference on artificial intelligence and statistics, PMLR, 2021, pp. 4069–4077.
[117]
S. Kapoor et al., “REFORMS: Consensus-based Recommendations for Machine-learning-based Science,” Science Advances, vol. 10, no. 18, p. eadk3452, May 2024, doi: 10.1126/sciadv.adk3452.
[118]
E. Kasneci et al., “ChatGPT for good? On opportunities and challenges of large language models for education,” Learning and individual differences, vol. 103, p. 102274, 2023.
[119]
A. J. Kell, D. L. Yamins, E. N. Shook, S. V. Norman-Haignere, and J. H. McDermott, “A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy,” Neuron, vol. 98, no. 3, pp. 630–644, 2018, doi: 10.1016/j.neuron.2018.03.044.
[120]
A. Kendall and Y. Gal, “What uncertainties do we need in Bayesian deep learning for computer vision?” Advances in neural information processing systems, vol. 30, 2017.
[121]
J. Kim and V. Pavlovic, “Ancient coin recognition based on spatial coding,” in 2014 22nd international conference on pattern recognition, 2014, pp. 321–326. doi: 10.1109/ICPR.2014.64.
[122]
M. C. Knaus, “Double machine learning-based programme evaluation under unconfoundedness,” The Econometrics Journal, vol. 25, no. 3, pp. 602–627, Jun. 2022, doi: 10.1093/ectj/utac015.
[123]
D. Kobak, R. G. Márquez, E.-Á. Horvát, and J. Lause, “Delving into ChatGPT usage in academic writing through excess vocabulary,” arXiv preprint arXiv:2406.07016, 2024.
[124]
P. W. Koh et al., “Concept bottleneck models,” in International conference on machine learning, PMLR, 2020, pp. 5338–5348.
[125]
G. König, T. Freiesleben, and M. Grosse-Wentrup, “Improvement-focused causal recourse (ICR),” in Proceedings of the AAAI conference on artificial intelligence, 2023, pp. 11847–11855. doi: 10.1609/aaai.v37i10.26398.
[126]
M. Krenn, M. Malik, R. Fickler, R. Lapkiewicz, and A. Zeilinger, “Automated search for new quantum experiments,” Physical review letters, vol. 116, no. 9, p. 090405, 2016.
[127]
T. S. Kuhn, The structure of scientific revolutions. University of Chicago Press, 1997.
[128]
S. R. Künzel, J. S. Sekhon, P. J. Bickel, and B. Yu, “Metalearners for estimating heterogeneous treatment effects using machine learning,” Proceedings of the national academy of sciences, vol. 116, no. 10, pp. 4156–4165, 2019, doi: 10.1073/pnas.1804597116.
[129]
R. Lagerquist, A. McGovern, C. R. Homeyer, D. J. Gagne II, and T. Smith, “Deep learning on three-dimensional multiscale data for next-hour tornado prediction,” Monthly Weather Review, vol. 148, no. 7, pp. 2837–2861, 2020, doi: 10.1175/MWR-D-19-0372.1.
[130]
R. Lam et al., “Learning skillful medium-range global weather forecasting,” Science, p. eadi2336, 2023, doi: 10.1126/science.adi2336.
[131]
J. Lei, M. G’Sell, A. Rinaldo, R. J. Tibshirani, and L. Wasserman, “Distribution-Free Predictive Inference for Regression,” Journal of the American Statistical Association, vol. 113, no. 523, pp. 1094–1111, Jul. 2018, doi: 10.1080/01621459.2017.1307116.
[132]
L. Lei and E. J. Candès, “Conformal inference of counterfactuals and individual treatment effects,” Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 83, no. 5, pp. 911–938, 2021, doi: 10.1111/rssb.12445.
[133]
S. Letzgus and K.-R. Müller, “An explainable AI framework for robust and transparent data-driven wind turbine power curve models,” Energy and AI, p. 100328, Dec. 2023, doi: 10.1016/j.egyai.2023.100328.
[134]
Z. C. Lipton, “The Mythos of Model Interpretability.” arXiv, Mar. 2017. doi: 10.48550/arXiv.1606.03490.
[135]
C. List, “Levels: Descriptive, explanatory, and ontological,” Noûs, vol. 53, no. 4, pp. 852–883, 2019.
[136]
Y. Lu and J. Lu, “A universal approximation theorem of deep neural networks for expressing probability distributions,” Advances in neural information processing systems, vol. 33, pp. 3094–3105, 2020, doi: 10.5555/3495724.3495984.
[137]
T. C. Lucas, “A translucent box: Interpretable machine learning in ecology,” Ecological Monographs, vol. 90, no. 4, p. e01422, 2020, doi: 10.1002/ecm.1422.
[138]
S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, in NIPS’17. Red Hook, NY, USA: Curran Associates Inc., Dec. 2017, pp. 4768–4777. doi: 10.5555/3295222.3295230.
[139]
K. Maaz et al., Bildung in Deutschland 2022: Ein indikatorengestützter Bericht mit einer Analyse zum Bildungspersonal. wbv Publikation, 2022. doi: 10.3278/6001820hw.
[140]
G. Marcus, “The next decade in AI: Four steps towards robust artificial intelligence,” arXiv preprint arXiv:2002.06177, 2020, doi: 10.48550/arXiv.2002.06177.
[141]
C. F. Manski, Partial identification of probability distributions, vol. 5. Springer, 2003. doi: 10.1007/b97478.
[142]
P. Maurage, A. Heeren, and M. Pesenti, “Does chocolate consumption really boost Nobel award chances? The peril of over-interpreting correlations in health studies,” The Journal of Nutrition, vol. 143, no. 6, pp. 931–933, 2013, doi: 10.3945/jn.113.174813.
[143]
M. B. McDermott, S. Wang, N. Marinsek, R. Ranganath, L. Foschini, and M. Ghassemi, “Reproducibility in machine learning for health research: Still a ways to go,” Science Translational Medicine, vol. 13, no. 586, p. eabb1655, 2021, doi: 10.1126/scitranslmed.abb1655.
[144]
L. Mentch and G. Hooker, “Quantifying uncertainty in random forests via confidence intervals and hypothesis tests,” Journal of Machine Learning Research, vol. 17, no. 26, pp. 1–41, 2016.
[145]
T. Miller, “Explanation in artificial intelligence: Insights from the social sciences,” Artificial Intelligence, vol. 267, pp. 1–38, Feb. 2019, doi: 10.1016/j.artint.2018.07.007.
[146]
M. Mitchell et al., “Model Cards for Model Reporting,” in Proceedings of the Conference on Fairness, Accountability, and Transparency, in FAT* ’19. New York, NY, USA: Association for Computing Machinery, Jan. 2019, pp. 220–229. doi: 10.1145/3287560.3287596.
[147]
C. Molnar, G. Casalicchio, and B. Bischl, “Quantifying model complexity via functional decomposition for better post-hoc interpretability,” in Machine learning and knowledge discovery in databases: International workshops of ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, proceedings, part I, Springer, 2020, pp. 193–204. doi: 10.1007/978-3-030-43823-4_17.
[148]
C. Molnar, Interpretable machine learning: A guide for making black box models explainable, 2nd ed. 2022. Available: https://christophm.github.io/interpretable-ml-book
[149]
C. Molnar, G. König, B. Bischl, and G. Casalicchio, “Model-agnostic Feature Importance and Effects with Dependent Features – A Conditional Subgroup Approach,” Data Mining and Knowledge Discovery, Jan. 2023, doi: 10.1007/s10618-022-00901-9.
[150]
C. Molnar et al., “Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process,” in Explainable Artificial Intelligence, L. Longo, Ed., in Communications in Computer and Information Science. Cham: Springer Nature Switzerland, 2023, pp. 456–479. doi: 10.1007/978-3-031-44064-9_24.
[151]
M. Moor et al., “Foundation models for generalist medical artificial intelligence,” Nature, vol. 616, no. 7956, pp. 259–265, 2023.
[152]
A. Mumuni and F. Mumuni, “CNN architectures for geometric transformation-invariant feature representation in computer vision: A review,” SN Computer Science, vol. 2, no. 5, p. 340, 2021, doi: 10.1007/s42979-021-00735-0.
[153]
A. Mumuni and F. Mumuni, “Data augmentation: A comprehensive survey of modern approaches,” Array, vol. 16, p. 100258, 2022, doi: 10.1016/j.array.2022.100258.
[154]
C. Nadeau and Y. Bengio, “Inference for the generalization error,” Advances in neural information processing systems, vol. 12, 1999.
[155]
B. Neal, “Introduction to causal inference,” Course Lecture Notes (draft), 2020.
[156]
M. Neethu and R. Rajasree, “Sentiment analysis in Twitter using machine learning techniques,” in 2013 fourth international conference on computing, communications and networking technologies (ICCCNT), IEEE, 2013, pp. 1–5. doi: 10.1109/ICCCNT.2013.6726818.
[157]
T. Neupert, M. H. Fischer, E. Greplova, K. Choo, and M. M. Denner, “Introduction to machine learning for the sciences,” arXiv preprint arXiv:2102.04883, 2021, doi: 10.48550/arXiv.2102.04883.
[158]
A. Nguyen and M. R. Martínez, “MonoNet: Towards Interpretable Models by Learning Monotonic Features.” arXiv, Sep. 2019. doi: 10.48550/arXiv.1909.13611.
[159]
M. S. Norouzzadeh et al., “Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning,” Proceedings of the National Academy of Sciences, vol. 115, no. 25, pp. E5716–E5725, 2018, doi: 10.1073/pnas.1719367115.
[160]
T. Oikarinen et al., “Deep convolutional network for animal sound classification and source attribution using dual audio recordings,” The Journal of the Acoustical Society of America, vol. 145, no. 2, pp. 654–662, 2019, doi: 10.1121/1.5097583.
[161]
Open Science Collaboration, “Estimating the reproducibility of psychological science,” Science, vol. 349, no. 6251, p. aac4716, 2015, doi: 10.1126/science.aac4716.
[162]
E. Ozoani, M. Gerchick, and M. Mitchell, Model card guidebook. Hugging Face, 2022. Available: https://huggingface.co/docs/hub/en/model-card-guidebook
[163]
J. Pearl and D. Mackenzie, The book of why: The new science of cause and effect. Basic Books, 2018.
[164]
J. Pearl, Causality. Cambridge University Press, 2009.
[165]
J. Pearl, “The limitations of opaque learning machines,” Possible minds, vol. 25, pp. 13–19, 2019.
[166]
N. Papernot, P. McDaniel, and I. Goodfellow, “Transferability in machine learning: From phenomena to black-box attacks using adversarial samples,” arXiv preprint arXiv:1605.07277, 2016, doi: 10.48550/arXiv.1605.07277.
[167]
M. A. Pedersen, “Editorial introduction: Towards a machinic anthropology,” Big Data & Society, vol. 10. SAGE Publications Sage UK: London, England, p. 20539517231153803, 2023. doi: 10.1177/20539517231153803.
[168]
J. Pereira, A. J. Simpkin, M. D. Hartmann, D. J. Rigden, R. M. Keegan, and A. N. Lupas, “High-accuracy protein structure prediction in CASP14,” Proteins: Structure, Function, and Bioinformatics, vol. 89, no. 12, pp. 1687–1699, 2021, doi: 10.1002/prot.26171.
[169]
G. L. Perry, R. Seidl, A. M. Bellvé, and W. Rammer, “An outlook for deep learning in ecosystem science,” Ecosystems, pp. 1–19, 2022, doi: 10.1007/s10021-022-00789-y.
[170]
J. Peters, D. Janzing, and B. Schölkopf, Elements of causal inference: Foundations and learning algorithms. The MIT Press, 2017.
[171]
F. Pfisterer, S. Coors, J. Thomas, and B. Bischl, “Multi-objective automatic machine learning with autoxgboostmc,” arXiv preprint arXiv:1908.10796, 2019, doi: 10.48550/arXiv.1908.10796.
[172]
J. Platt et al., “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” Advances in large margin classifiers, vol. 10, no. 3, pp. 61–74, 1999.
[173]
K. Popper, The logic of scientific discovery. Routledge, 2005.
[174]
Z. Pu and E. Kalnay, “Numerical weather prediction basics: Models, numerical methods, and data assimilation,” Handbook of hydrometeorological ensemble forecasting, pp. 67–97, 2019.
[175]
M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations.” arXiv, Nov. 2017. doi: 10.48550/arXiv.1711.10561.
[176]
M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations.” arXiv, Nov. 2017. doi: 10.48550/arXiv.1711.10566.
[177]
P. Rajpurkar et al., “CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning.” arXiv, Dec. 2017. doi: 10.48550/arXiv.1711.05225.
[178]
S.-A. Rebuffi, S. Gowal, D. A. Calian, F. Stimberg, O. Wiles, and T. A. Mann, “Data augmentation can improve robustness,” Advances in Neural Information Processing Systems, vol. 34, pp. 29935–29948, 2021, doi: 10.48550/arXiv.2111.05328.
[179]
M. Reichstein et al., “Deep learning and process understanding for data-driven earth system science,” Nature, vol. 566, no. 7743, pp. 195–204, 2019, doi: 10.1038/s41586-019-0912-1.
[180]
P. Ren et al., “A survey of deep active learning,” ACM computing surveys (CSUR), vol. 54, no. 9, pp. 1–40, 2021, doi: 10.1145/3472291.
[181]
X. Ren et al., “Deep learning-based weather prediction: A survey,” Big Data Research, vol. 23, p. 100178, 2021, doi: 10.1016/j.bdr.2020.100178.
[182]
M. T. Ribeiro, S. Singh, and C. Guestrin, “Anchors: High-Precision Model-Agnostic Explanations,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, Apr. 2018, doi: 10.1609/aaai.v32i1.11491.
[183]
M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, in KDD ’16. New York, NY, USA: Association for Computing Machinery, Aug. 2016, pp. 1135–1144. doi: 10.1145/2939672.2939778.
[184]
M. Roberts et al., “Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans,” Nature Machine Intelligence, vol. 3, no. 3, pp. 199–217, 2021, doi: 10.1038/s42256-021-00307-0.
[185]
J. W. Rocks and P. Mehta, “Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models,” Physical review research, vol. 4, no. 1, p. 013201, 2022, doi: 10.1103/PhysRevResearch.4.013201.
[186]
Y. Romano, E. Patterson, and E. Candes, “Conformalized quantile regression,” Advances in neural information processing systems, vol. 32, 2019.
[187]
Y. Romano, M. Sesia, and E. Candes, “Classification with valid and adaptive coverage,” Advances in Neural Information Processing Systems, vol. 33, pp. 3581–3591, 2020.
[188]
R. Roscher, B. Bohn, M. F. Duarte, and J. Garcke, “Explainable Machine Learning for Scientific Insights and Discoveries,” IEEE Access, vol. 8, pp. 42200–42216, 2020, doi: 10.1109/ACCESS.2020.2976199.
[189]
J. Rothfuss, F. Ferreira, S. Walther, and M. Ulrich, “Conditional density estimation with neural networks: Best practices and benchmarks,” arXiv preprint arXiv:1903.00954, 2019, doi: 10.48550/arXiv.1903.00954.
[190]
C. Rudin, C. Chen, Z. Chen, H. Huang, L. Semenova, and C. Zhong, “Interpretable machine learning: Fundamental principles and 10 grand challenges,” Statistics Surveys, vol. 16, no. none, pp. 1–85, Jan. 2022, doi: 10.1214/21-SS133.
[191]
L. Ruff et al., “A unifying review of deep and shallow anomaly detection,” Proceedings of the IEEE, vol. 109, no. 5, pp. 756–795, 2021, doi: 10.1109/JPROC.2021.3052449.
[192]
R. Schaeffer et al., “Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle.” arXiv, Mar. 2023. doi: 10.48550/arXiv.2303.14151.
[193]
J. Schmidt, M. R. Marques, S. Botti, and M. A. Marques, “Recent advances and applications of machine learning in solid-state materials science,” npj Computational Materials, vol. 5, no. 1, pp. 1–36, 2019, doi: 10.1038/s41524-019-0221-0.
[194]
C. A. Scholbeck, C. Molnar, C. Heumann, B. Bischl, and G. Casalicchio, “Sampling, Intervention, Prediction, Aggregation: A Generalized Framework for Model-Agnostic Interpretations,” in Machine Learning and Knowledge Discovery in Databases, P. Cellier and K. Driessens, Eds., in Communications in Computer and Information Science. Cham: Springer International Publishing, 2020, pp. 205–216. doi: 10.1007/978-3-030-43823-4_18.
[195]
B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij, “On causal and anticausal learning,” arXiv preprint arXiv:1206.6471, 2012, doi: 10.48550/arXiv.1206.6471.
[196]
B. Schölkopf et al., “Toward causal representation learning,” Proceedings of the IEEE, vol. 109, no. 5, pp. 612–634, 2021, doi: 10.1109/JPROC.2021.3058954.
[197]
B. Schölkopf, “Causality for machine learning,” in Probabilistic and causal inference: The works of Judea Pearl, 2022, pp. 765–804.
[198]
H. Seibold, 6 steps towards reproducible research. Zenodo, 2024. doi: 10.5281/zenodo.12744715.
[199]
R. Sen, A. T. Suresh, K. Shanmugam, A. G. Dimakis, and S. Shakkottai, “Model-powered conditional independence test,” Advances in neural information processing systems, vol. 30, 2017, doi: 10.5555/3294996.3295055.
[200]
L. Semenova, H. Chen, R. Parr, and C. Rudin, “A path to simpler models starts with noise,” Advances in neural information processing systems, vol. 36, 2024.
[201]
B. Settles, “Active learning literature survey,” University of Wisconsin–Madison, Computer Sciences Technical Report 1648, 2009.
[202]
S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014. doi: 10.1017/CBO9781107298019.
[203]
R. D. Shah and J. Peters, “The hardness of conditional independence testing and the generalised covariance measure,” The Annals of Statistics, vol. 48, no. 3, pp. 1514–1538, 2020, doi: 10.1214/19-AOS1857.
[204]
C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of big data, vol. 6, no. 1, pp. 1–48, 2019, doi: 10.1186/s40537-019-0197-0.
[205]
P. Spirtes, C. N. Glymour, and R. Scheines, Causation, prediction, and search. MIT press, 2000. doi: 10.1007/978-1-4612-2748-9.
[206]
R. Frigg and S. Hartmann, “Models in Science,” in The Stanford encyclopedia of philosophy, Spring 2020., E. N. Zalta, Ed., https://plato.stanford.edu/archives/spr2020/entries/models-science/; Metaphysics Research Lab, Stanford University, 2020.
[207]
C. Hitchcock and M. Rédei, “Reichenbach’s Common Cause Principle,” in The Stanford encyclopedia of philosophy, Summer 2021., E. N. Zalta, Ed., https://plato.stanford.edu/archives/sum2021/entries/physics-Rpcc/; Metaphysics Research Lab, Stanford University, 2021.
[208]
A. Chakravartty, “Scientific Realism,” in The Stanford encyclopedia of philosophy, Summer 2017., E. N. Zalta, Ed., https://plato.stanford.edu/archives/sum2017/entries/scientific-realism/; Metaphysics Research Lab, Stanford University, 2017.
[209]
G. Shmueli, “To Explain or to Predict?” Statistical Science, vol. 25, no. 3, pp. 289–310, 2010, doi: 10.1214/10-STS330.
[210]
T. F. Sterkenburg and P. D. Grünwald, “The no-free-lunch theorems of supervised learning,” Synthese, vol. 199, no. 3, pp. 9979–10015, 2021, doi: 10.1007/s11229-021-03233-1.
[211]
C. Strobl, A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis, “Conditional variable importance for random forests,” BMC bioinformatics, vol. 9, pp. 1–11, 2008.
[212]
E. Štrumbelj and I. Kononenko, “Explaining prediction models and individual predictions with feature contributions,” Knowledge and Information Systems, vol. 41, no. 3, pp. 647–665, Dec. 2014, doi: 10.1007/s10115-013-0679-x.
[213]
S. L. Smith, B. Dherin, D. G. Barrett, and S. De, “On the origin of implicit regularization in stochastic gradient descent,” arXiv preprint arXiv:2101.12176, 2021, doi: 10.48550/arXiv.2101.12176.
[214]
A. Swanson, M. Kosmala, C. Lintott, R. Simpson, A. Smith, and C. Packer, “Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna,” Scientific data, vol. 2, no. 1, pp. 1–14, 2015, doi: 10.1038/sdata.2015.26.
[215]
C. Szegedy et al., “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013, doi: 10.48550/arXiv.1312.6199.
[216]
I. Takeuchi, Q. V. Le, T. D. Sears, A. J. Smola, and C. Williams, “Nonparametric quantile estimation.” Journal of machine learning research, vol. 7, no. 7, 2006.
[217]
N. Taleb, “The black swan: Why don’t we learn that we don’t learn,” New York: Random House, 2005.
[218]
Y.-P. Tang, G.-X. Li, and S.-J. Huang, “ALiPy: Active learning in python,” arXiv preprint arXiv:1901.03802, 2019, doi: 10.48550/arXiv.1901.03802.
[219]
T. Tanay and L. Griffin, “A boundary tilting persepective on the phenomenon of adversarial examples,” arXiv preprint arXiv:1608.07690, 2016, doi: 10.48550/arXiv.1608.07690.
[220]
A. S. Tejani et al., “Checklist for Artificial Intelligence in Medical Imaging (CLAIM): 2024 Update,” Radiology: Artificial Intelligence, vol. 6, no. 4, p. e240300, Jul. 2024, doi: 10.1148/ryai.240300.
[221]
D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry, “Robustness may be at odds with accuracy,” arXiv preprint arXiv:1805.12152, 2018, doi: 10.48550/arXiv.1805.12152.
[222]
J. Q. Toledo-Marı́n, G. Fox, J. P. Sluka, and J. A. Glazier, “Deep learning approaches to surrogates for solving the diffusion equation for mechanistic real-world simulations,” Frontiers in Physiology, vol. 12, p. 667828, 2021, doi: 10.3389/fphys.2021.667828.
[223]
C. Uhler, G. Raskutti, P. Bühlmann, and B. Yu, “Geometry of the faithfulness assumption in causal inference,” The Annals of Statistics, pp. 436–463, 2013.
[224]
M. J. Van der Laan and S. Rose, Targeted learning, vol. 1. Springer, 2011. doi: 10.1007/978-1-4419-9782-1.
[225]
R. Van Noorden and J. M. Perkel, “AI and science: What 1,600 researchers think,” Nature, vol. 621, no. 7980, pp. 672–675, Sep. 2023, doi: 10.1038/d41586-023-02980-0.
[226]
V. N. Vapnik, “An overview of statistical learning theory,” IEEE transactions on neural networks, vol. 10, no. 5, pp. 988–999, 1999, doi: 10.1109/72.788640.
[227]
C. Vens, J. Struyf, L. Schietgat, S. Džeroski, and H. Blockeel, “Decision trees for hierarchical multi-label classification,” Machine learning, vol. 73, pp. 185–214, 2008, doi: 10.1007/s10994-008-5077-3.
[228]
T. Vigen, Spurious correlations. Hachette UK, 2015.
[229]
V. Vovk, G. Shafer, and I. Nouretdinov, “Self-calibrating probability forecasting,” Advances in neural information processing systems, vol. 16, 2003.
[230]
S. Wachter, B. Mittelstadt, and C. Russell, “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR,” SSRN Electronic Journal, 2017, doi: 10.2139/ssrn.3063289.
[231]
S. Wang et al., “Defensive dropout for hardening deep neural networks under adversarial attacks,” in 2018 IEEE/ACM international conference on computer-aided design (ICCAD), IEEE, 2018, pp. 1–8. doi: 10.1145/3240765.3264699.
[232]
D. S. Watson and M. N. Wright, “Testing conditional independence in supervised learning algorithms,” Machine Learning, vol. 110, no. 8, pp. 2107–2129, Aug. 2021, doi: 10.1007/s10994-021-06030-6.
[233]
D. H. Wolpert, “The lack of a priori distinctions between learning algorithms,” Neural computation, vol. 8, no. 7, pp. 1341–1390, 1996, doi: 10.1162/neco.1996.8.7.1341.
[234]
Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “A comprehensive survey on graph neural networks,” IEEE transactions on neural networks and learning systems, vol. 32, no. 1, pp. 4–24, 2020, doi: 10.1109/TNNLS.2020.2978386.
[235]
L. Wynants et al., “Prediction models for diagnosis and prognosis of COVID-19: Systematic review and critical appraisal,” BMJ (Clinical research ed.), vol. 369, p. m1328, Apr. 2020, doi: 10.1136/bmj.m1328.
[236]
J. Yang, K. Zhou, Y. Li, and Z. Liu, “Generalized Out-of-Distribution Detection: A Survey,” International Journal of Computer Vision, Jun. 2024, doi: 10.1007/s11263-024-02117-4.
[237]
J. Yoon, J. Jordon, and M. Van Der Schaar, “GANITE: Estimation of individualized treatment effects using generative adversarial nets,” in International conference on learning representations, 2018.
[238]
K. Yu et al., “Causality-based feature selection: Methods and evaluations,” ACM Computing Surveys (CSUR), vol. 53, no. 5, pp. 1–36, 2020, doi: 10.1145/3409382.
[239]
A. Zeileis, T. Hothorn, and K. Hornik, “Model-Based Recursive Partitioning,” Journal of Computational and Graphical Statistics, vol. 17, no. 2, pp. 492–514, Jun. 2008, doi: 10.1198/106186008X319331.
[240]
Z. Zhang, Y. Jin, B. Chen, and P. Brown, “California almond yield prediction at the orchard level with a machine learning approach,” Frontiers in plant science, vol. 10, p. 809, 2019, doi: 10.3389/fpls.2019.00809.
[241]
H. Zhang, H. Chen, Z. Song, D. Boning, I. S. Dhillon, and C.-J. Hsieh, “The limitations of adversarial training and the blind-spot attack,” arXiv preprint arXiv:1901.04684, 2019, doi: 10.48550/arXiv.1901.04684.