The objective of this project is to apply FAIR data principles to spectroscopic data in the field of chemistry, building on IUPAC’s extensive expertise in this area. The project will develop standards for the production and dissemination of digital data objects that contain enough spectral data and metadata to be (a) findable through semantic searches on the web, (b) available through standard interfaces, (c) interoperable and transferable between systems, and (d) readable and reusable over time, by both humans and machines.
As a key data class for characterizing chemical substances, spectroscopic data are increasingly required for reporting. To facilitate accurate dissemination and analysis of these data in the online environment, it is necessary to develop interoperable representations that are readable by both humans and machines. In 2016, guidelines were proposed that establish FAIR principles for research data management, ensuring that data are findable, accessible, interoperable, and reusable in the digital environment.
This project will consider metadata elements that are critical for the FAIR management of spectroscopic data, including those that are general to all experimental techniques (such as ORCID and InChI) as well as specialized elements for specific fields, such as NMR spectroscopy. To encourage adoption and facilitate widespread use, a minimum implementation will be proposed based on already established metadata efforts, including those of the Allotrope Data Framework and nmrML. These elements have been partially reviewed in IUPAC CPCDS-sponsored workshops (Amsterdam, 2018 and Orlando, 2019) and in conference calls during early 2019.
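As a rough illustration of how the general identifiers mentioned above could be checked by machine, the following Python sketch verifies the check digit of an ORCID iD (ORCID uses the ISO 7064 MOD 11-2 scheme) and tests whether a string looks like a standard InChI. The function names are hypothetical and are not part of any project recommendation, and the InChI test is only a prefix check, not a validation of the chemical structure.

```python
import re

# Layout of an ORCID iD: four hyphen-separated groups of four characters,
# where the final character is a check digit (0-9 or X).
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string has the ORCID layout and a correct check digit."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def looks_like_standard_inchi(inchi: str) -> bool:
    """Prefix check only: standard InChI strings begin with 'InChI=1S/'."""
    prefix = "InChI=1S/"
    return inchi.startswith(prefix) and len(inchi) > len(prefix)


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))           # ORCID's sample iD
    print(looks_like_standard_inchi("InChI=1S/CH4/h1H4"))  # methane
```

Checks of this kind only confirm that an identifier is well formed; whether it points to the right person or compound still has to be established by the depositor or the registering agency.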
The tasks of this project include:
1) development of clear recommendations for metadata that allow the registering of spectroscopic data with a registering agency such as DataCite or CrossRef;
2) specification of a standard format for the metadata that will be associated with the actual data, whatever form those data take (JCAMP-DX files, vendor-specific raw data formats, etc.); and
3) validation criteria for checking that files represent data and metadata in a readable and interoperable way according to the standard, following, for example, the CheckCIF model used in managing crystallography data.
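The three tasks above can be pictured with a small Python sketch: a metadata record kept alongside the data file, and a checker that reports problems in the spirit of CheckCIF. Every field name, the list of required fields, and the set of recognized formats below are assumptions made purely for illustration; none of them is defined by the project text.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional

# Hypothetical minimum set of fields; the real minimum implementation
# is what the project intends to propose.
REQUIRED_FIELDS = ("title", "creator_orcid", "compound_inchi",
                   "technique", "data_format")

# Hypothetical labels for the form of the underlying data file.
KNOWN_FORMATS = {"JCAMP-DX", "vendor-raw"}


@dataclass
class SpectrumMetadata:
    """Illustrative metadata record stored next to a spectral data file."""
    title: str = ""
    creator_orcid: str = ""
    compound_inchi: str = ""
    technique: str = ""
    data_format: str = ""
    doi: Optional[str] = None  # assigned later by a registering agency


def check_metadata(record: SpectrumMetadata) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    values = asdict(record)
    for name in REQUIRED_FIELDS:
        if not values[name].strip():
            problems.append(f"missing required field: {name}")
    if record.data_format and record.data_format not in KNOWN_FORMATS:
        problems.append(f"unrecognized data format: {record.data_format}")
    return problems


if __name__ == "__main__":
    record = SpectrumMetadata(
        title="1H NMR spectrum of methane",
        creator_orcid="0000-0002-1825-0097",
        compound_inchi="InChI=1S/CH4/h1H4",
        technique="NMR",
        data_format="JCAMP-DX",
    )
    for problem in check_metadata(record) or ["no problems found"]:
        print(problem)
```

Returning a list of messages rather than raising on the first error mirrors how a report-style validator lets a depositor fix all issues in one pass.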
This project will be carried out in coordination with other efforts in this area, including:
a) IUPAC Project 2016-023-2-300, which has an interest in the de facto JCAMP-DX 6.0 specification and in changes we might propose to it that would add metadata-related values such as ORCID persistent identifiers and InChI compound identifiers.
b) data publishing pilots involving chemistry journal publishers proposed at the recent FAIR Chemical Data workshop in Orlando, who will be developing workflow methods around our proposed standards.
c) DOI registering agencies such as DataCite and CrossRef, to effectively integrate the metadata we propose into their schemas and ensure that the data they describe are discoverable via these platforms and via services that use this metadata.
d) broad engagement of the stakeholder community in testing the recommendations and other efforts underway internationally.
We expect this work to provide a general basis for other areas of spectroscopy, not just NMR, which future projects can use as a starting point for creating FAIR data management standards in those areas.
Project announcement published in Chem Int July 2020, p. 28
Page last updated 16 July 2020