<aside>
Data provider documentation is the main source of truth about a dataset and we always link to it on our dataset pages. We expect this documentation to be rigorous and up to date, covering methods, dataset validation, and the structure of delivered data.
</aside>
Third party source data
- List all sources of third party data used (e.g. Sentinel-2, NEON LiDAR data), including date and version information if appropriate.
- Wherever possible, include citations, links, and/or license information.
- State any known legal, commercial, or ethical restrictions placed on the data.
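One way to keep these details consistent across sources is to record them in a small structured form. The sketch below is illustrative only: the `SourceRecord` class, its field names, and the placeholder values are assumptions made for this example, not a required schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SourceRecord:
    """Hypothetical record describing one third party data source."""

    name: str
    version: Optional[str] = None
    date_range: Optional[str] = None
    citation: Optional[str] = None
    url: Optional[str] = None
    licence: Optional[str] = None
    # Known legal, commercial, or ethical restrictions, one per entry.
    restrictions: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [key for key, value in asdict(self).items() if not value]


# Placeholder values only; replace them with the real details of the source.
record = SourceRecord(name="Sentinel-2", licence="<licence name>")
print(json.dumps(asdict(record), indent=2))
print("Still to document:", record.missing_fields())
```

An empty `restrictions` list is reported as missing here, so a source with no known restrictions would need an explicit entry saying so.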
First party data collection
- Only applicable where first party data are collected to generate or validate the dataset.
- Describe the approach used, such as:
    - Locations and sizes of field plots, when they were visited, and by whom.
    - Manufacturer, model, and version information of instruments/sensors used.
    - Protocols used to collect data.
- Report compliance with industry norms (e.g. established scientific methodologies), standards (e.g. ISO standards), and regulations (e.g. site access, export/import permits).
- Acknowledge volunteers, partners, site/landowners, and governing bodies as required.
Data processing
- State the purpose of first and third party data (e.g. model training data).
- Outline steps taken to filter, transform, or otherwise pre-process first and third party data.
- Summarise the general architecture of any calculations, statistical models, or machine learning models applied to the data (e.g. LightGBM regression).
- Where applicable, describe how machine learning models are parameterised, trained, and tuned.
Quality assurance and benchmarking
- We expect all nature datasets to be validated by rigorous QA and benchmarking.
- State how low quality or anomalous data are handled, both for input data (e.g. cloudy Sentinel-2 scenes) and output data (e.g. out of bounds predictions).
- Provide links to external benchmarking reports, such as scientific papers or endorsements from third party certifiers.
- Where internal benchmarking is undertaken, describe the source, locations, and details of the ground truth data and the approach taken, and provide graphs and statistical test outputs showing performance (e.g. accuracy, precision, uncertainty).
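
As a minimal sketch of what such reporting could be built on, the example below masks out of bounds predictions and then computes simple error statistics against ground truth. The function names, the valid range, and the sample values are all hypothetical and chosen for illustration; the metrics that suit a real dataset will depend on the variable being predicted.

```python
import math
from typing import Dict, List, Optional, Sequence


def mask_out_of_bounds(
    preds: Sequence[float], lower: float, upper: float
) -> List[Optional[float]]:
    """Replace predictions outside [lower, upper] with None."""
    return [p if lower <= p <= upper else None for p in preds]


def benchmark(
    preds: Sequence[Optional[float]], truth: Sequence[float]
) -> Dict[str, float]:
    """Compute bias, MAE, and RMSE over the predictions that were not masked."""
    if len(preds) != len(truth):
        raise ValueError("preds and truth must have the same length")
    errors = [p - t for p, t in zip(preds, truth) if p is not None]
    if not errors:
        raise ValueError("no valid prediction/ground truth pairs")
    n = len(errors)
    return {
        "n_valid": n,
        "n_masked": len(preds) - n,
        "bias": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


# Illustrative values only, assuming a variable that is valid between 0 and 100.
predictions = [10.0, 12.0, -3.0, 20.0]
ground_truth = [11.0, 12.0, 5.0, 18.0]
report = benchmark(mask_out_of_bounds(predictions, 0.0, 100.0), ground_truth)
print(report)
```

Reporting the number of masked values alongside the error statistics shows how much output was excluded before performance was measured.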