14 Reporting

If you don’t publish the results of your experiments, can it still be considered science? If you don’t share your results, others can’t verify or build on your work. That’s why publishing is so important in science. When you use machine learning in your research, there are additional needs for reporting. You may publish a paper but also report on the model or dataset you created. To make reporting easier, you can use checklists. Rather than just copying these checklists here – which would make for a terrible read – this chapter points you to the best checklists, their scope, and the ideas behind them.

After some years, other birds became interested in Rattle and Krarah’s tornado prediction model. The pigeons, the magpies, and, of course, the parrots wanted to repeat the Ravens’ success. But they had so many questions, such as: What data was used in training? Krarah’s disciples were tired of answering the same questions over and over again. They decided to compile all the information about the model into one document and publish it.

14.1 Checklist for papers

If you use machine learning in your research project, what details should you include in your paper? One checklist you might want to use is REFORMS, which stands for “Reporting standards for machine learning-based science” [1]. The authors developed REFORMS based on literature reviews and a consensus of 19 researchers from different fields. REFORMS also includes guidelines on completing the checklist and identifying potential problems in your modeling approach. REFORMS is field agnostic – this means the checklist items are broadly applicable, but items specific to your field may be missing.

The REFORMS checklist consists of 8 modules (categories) and a total of 32 items:

Study goals. Example: Motivation for the use of machine learning methods in the study
Computational reproducibility. Example: README file which contains instructions for generating the results using the provided dataset and code
Data quality. Example: Sample size and outcome frequencies
Data preprocessing. Example: Identification of whether any samples are excluded with a rationale for why they are excluded
Modeling. Example: Method for selecting the model(s) reported in the paper
Data leakage. Example: Justification that each feature or input used in the model is legitimate for the task at hand and does not lead to leakage
Metrics and uncertainty. Example: Justification for the choice of statistical tests (if used) and a check for the assumptions of the statistical test
Generalizability and limitations. Example: Evidence of external validity
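
One lightweight way to work with such a checklist during a project is to track which modules you have already addressed. The following is a minimal sketch, not part of REFORMS itself: the module names come from the list above, while the function name `missing_modules` and the draft notes are hypothetical.

```python
# Module names as listed above; everything else is illustrative.
REFORMS_MODULES = [
    "Study goals",
    "Computational reproducibility",
    "Data quality",
    "Data preprocessing",
    "Modeling",
    "Data leakage",
    "Metrics and uncertainty",
    "Generalizability and limitations",
]


def missing_modules(notes):
    """Return the modules that have no non-empty note yet, in checklist order."""
    return [m for m in REFORMS_MODULES if not notes.get(m, "").strip()]


# Hypothetical draft notes for a paper in progress.
draft_notes = {
    "Study goals": "Section 1 motivates the use of machine learning.",
    "Data quality": "Table 2 reports sample size and outcome frequencies.",
    "Modeling": "",  # placeholder, not yet written
}

todo = missing_modules(draft_notes)
print(f"{len(todo)} of {len(REFORMS_MODULES)} modules still open:")
for module in todo:
    print(" -", module)
```

A module with an empty note counts as open, so placeholders do not hide gaps. A real workflow would track the individual checklist items rather than whole modules.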

Further resources:

Some papers, especially in the medical field, may require additional attention:

TRIPOD+AI [2] is a reporting guideline for clinical prediction models. It comes with a checklist.
CLAIM, the Checklist for Artificial Intelligence in Medical Imaging, covers AI papers about medical imaging [3].

14.2 Model Cards

Besides publishing a paper, you may want to share the model. You could just upload the model somewhere (like GitHub) and call it a day. The problem: Other researchers and interested parties will have trouble working with your model. A standard README file might be too short to describe the model, and a research paper might be too long and in the wrong format. The solution: Publish a dedicated document alongside your model, such as a Model Card. “Model Cards” by Google Research [4] refers to both the approach and the resulting documents. Model Cards is a checklist approach used by many institutions, including Hugging Face, a large model hub and community for machine learning.

The original Model Cards paper introduces 9 categories, each with multiple detailed prompts:

Model Details. Example: Model date
Intended Use. Example: Primary intended use
Factors. Example: Evaluation Factors
Metrics. Example: Decision Thresholds
Evaluation Data. Example: Datasets
Training Data
Quantitative Analyses
Ethical Considerations
Caveats and Recommendations

The Model Cards approach is constantly evolving. For example, the Model Cards Guidebook [5] and template (see below) contain new and renamed categories. Model Cards is a great starting point for model documentation, but you should always customize it to your needs.
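
To make the structure concrete, here is a minimal sketch that renders a Model Card skeleton from the nine categories of the original paper. It is not an official template: the function `render_model_card`, the choice of Markdown as output format, and the example entries are assumptions made for illustration.

```python
# Category names from the original Model Cards paper, as listed above.
MODEL_CARD_SECTIONS = [
    "Model Details",
    "Intended Use",
    "Factors",
    "Metrics",
    "Evaluation Data",
    "Training Data",
    "Quantitative Analyses",
    "Ethical Considerations",
    "Caveats and Recommendations",
]


def render_model_card(title, content):
    """Render a Markdown model card; unfilled sections are marked as TODO."""
    lines = [f"# Model Card: {title}", ""]
    for section in MODEL_CARD_SECTIONS:
        lines.append(f"## {section}")
        lines.append(content.get(section, "TODO"))
        lines.append("")
    return "\n".join(lines)


# Hypothetical, partially filled card.
card = render_model_card(
    "Tornado prediction model",
    {
        "Model Details": "Model date: (fill in)",
        "Intended Use": "Primary intended use: (fill in)",
    },
)
print(card)
```

Keeping the section list in one place makes it easy to add or rename categories when you customize the card for your own project.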

Further resources:

Again, you may have more specific documentation requirements. For example, medical devices using machine learning have special reporting requirements; see this guideline (focus on Germany).

14.3 Datasheets for Datasets

Carefully built and curated datasets are at the heart of machine learning. A lot of time and effort can go into collecting and cleaning data. Sometimes years. To maximize your return on investment, you might publish the dataset so that others can use it and cite your work. But just like the model, the dataset will raise a lot of questions if it doesn’t come with the right information. When it comes to documentation, it is wiser to stand on the shoulders of giants than to reinvent the wheel. In this case, “Datasheets for Datasets” [6] is a popular approach to “facilitate communication between dataset creators and consumers.”

Datasheets for Datasets lists 57 questions in 7 categories:

Motivation. Example: For what purpose was the dataset created?
Composition. Example: What data does each instance consist of?
Collection Process. Example: How was the data associated with each instance acquired?
Preprocessing/cleaning/labeling. Example: Is the software that was used to preprocess/clean/label the data available?
Uses. Example: Has the dataset been used for any tasks already?
Distribution. Example: How will the dataset be distributed (e.g., tarball on website, API, GitHub)?
Maintenance. Example: How can the owner/curator/manager of the dataset be contacted (e.g., email address)?
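
A datasheet is most useful when it travels with the data. The sketch below shows one possible way to do that, assuming a JSON file stored next to the dataset; the paper does not prescribe a file format, and the helper `new_datasheet` and the answers are hypothetical. The category names and example questions are taken from the list above.

```python
import json

DATASHEET_CATEGORIES = [
    "Motivation",
    "Composition",
    "Collection Process",
    "Preprocessing/cleaning/labeling",
    "Uses",
    "Distribution",
    "Maintenance",
]


def new_datasheet():
    """Create an empty datasheet: one question-to-answer dict per category."""
    return {category: {} for category in DATASHEET_CATEGORIES}


datasheet = new_datasheet()
# Answers are hypothetical placeholders.
datasheet["Motivation"]["For what purpose was the dataset created?"] = (
    "To train and evaluate a tornado prediction model."
)
datasheet["Uses"]["Has the dataset been used for any tasks already?"] = (
    "Yes, for the model described in the accompanying paper."
)

# Serialize so the datasheet can be stored next to the data files.
serialized = json.dumps(datasheet, indent=2)
restored = json.loads(serialized)

# Categories that still have no answered question.
unanswered = [c for c in DATASHEET_CATEGORIES if not restored[c]]
print("Categories without answers:", unanswered)
```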

14.4 Reporting relies on good practice

If you want to document your research project well, you need to follow good research practices that go hand in hand with other topics in this book:

Documenting the research project, model, and dataset forces you to think about use cases and limitations, which is closely related to how well your model generalizes to different contexts (Chapter 7).
Publishing results from your model may require model insights, for which you may need interpretability (Chapter 9).
Sharing models and data is also an integral part of reproducibility (Chapter 13).