Project Details

Development of a Standard for FAIR Data Management of Spectroscopic Data

Project No.:
2019-031-1-024
Start Date:
18 March 2020
End Date:

Objective

The objective of this project is to apply FAIR data principles to spectroscopic data in the field of chemistry, building on IUPAC’s extensive expertise in this area. The project will develop standards for the production and dissemination of digital data objects that contain enough spectral data and metadata to be (a) findable through semantic searches on the web, (b) available through standard interfaces, (c) interoperable and transferable between systems, and (d) readable and reusable over time, for both humans and machines.

Description

As a key data class for characterizing chemical substances, spectroscopic data are increasingly required for reporting. To facilitate accurate dissemination and analysis of these data in the online environment, it is necessary to develop interoperable representations that are readable by both humans and machines. In 2016, guidelines were proposed that establish FAIR principles for research data management, ensuring that data are findable, accessible, interoperable, and reusable in the digital environment [1].

This project will consider metadata elements that are critical for the FAIR management of spectroscopic data, including those that are general to all experimental techniques (such as ORCID and InChI) as well as specialized elements for specific fields, such as NMR spectroscopy. To encourage adoption and facilitate widespread use, a minimum implementation will be proposed based on already established metadata efforts, including those of the Allotrope Data Framework and nmrML. These elements have been partially reviewed in IUPAC CPCDS-sponsored workshops (Amsterdam, 2018 and Orlando, 2019) and in conference calls during early 2019.

The tasks of this project include:

1) development of clear recommendations for metadata that allow the registering of spectroscopic data with a registering agency such as DataCite or CrossRef;

2) specification of a standard format for the metadata that will be associated with the actual data, whatever that data’s actual form (JCAMP-DX files, vendor-specific raw data formats, etc.); and

3) development of validation criteria to check files for readable and interoperable representation of data and metadata based on the standard, following, for example, the CheckCIF model used in managing crystallography data.
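
As an illustration of what task 3 could look like in practice, the following is a minimal sketch of a CheckCIF-style validator for a metadata record. It is not the project's standard: the field names (creator_orcid, compound_inchi, technique, data_format) and the function names are hypothetical, chosen only for this example. The ORCID check uses the published ISO 7064 MOD 11-2 check digit; the InChI check is only a prefix test, not a structural validation.

```python
import re
from typing import Dict, List

# Hypothetical field names, for illustration only; not part of any IUPAC standard.
REQUIRED_FIELDS = ("creator_orcid", "compound_inchi", "technique", "data_format")

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Check the ORCID layout and its ISO 7064 MOD 11-2 check digit."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def validate_record(record: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problems found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append("missing field: " + field)
    orcid = record.get("creator_orcid")
    if orcid and not orcid_checksum_ok(orcid):
        problems.append("creator_orcid fails ORCID format or checksum")
    inchi = record.get("compound_inchi")
    # Prefix test only: a real validator would parse the InChI layers.
    if inchi and not inchi.startswith("InChI=1"):
        problems.append("compound_inchi does not look like an InChI string")
    return problems


# Example record: the ORCID is the documented ORCID test identifier,
# and the InChI is that of methane.
example_record = {
    "creator_orcid": "0000-0002-1825-0097",
    "compound_inchi": "InChI=1S/CH4/h1H4",
    "technique": "NMR",
    "data_format": "JCAMP-DX",
}

if __name__ == "__main__":
    for problem in validate_record(example_record) or ["no problems found"]:
        print(problem)
```

Returning a list of problems rather than raising on the first error mirrors the report-style output of a checker, so that a depositor could see every issue with a file in one pass.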

This project will be carried out in coordination with other efforts in this area, including:

a) IUPAC Project 2016-023-2-300, which concerns the de facto JCAMP-DX 6.0 specification; we may propose changes to that specification to add metadata-related values such as ORCID persistent identifiers and InChI compound identifiers.

b) data publishing pilots involving chemistry journal publishers proposed at the recent FAIR Chemical Data workshop in Orlando, who will be developing workflow methods around our proposed standards.

c) DOI registering agencies such as DataCite and CrossRef, to effectively integrate the metadata we propose into their schemas and ensure that registered data are discoverable via these platforms and those that leverage this metadata.

d) broad engagement of the stakeholder community in testing the recommendations and other efforts underway internationally.

We expect this work to provide a general basis for other areas of spectroscopy, not just NMR, that future projects can use as a starting point for creating FAIR data management standards in those areas.

[1] https://www.go-fair.org/fair-principles/

Progress

Project announcement published in Chem Int July 2020, p. 28

Relevant background
2019 NSF workshop: “FAIR Publishing Guidelines for Spectral Data and Chemical Structures” (materials, report)

Update August 2021 – See report presented at the virtual GA (9 Aug 2021)
Demonstration using IUPAC FAIRSpec Finding Aids created by a test IFSExtractor on our GitHub site. This is only a very minimal test involving 13 supporting information data sets from the ACS FAIRData pilot.

Update Dec 2021 – We have submitted for publication a list of guiding principles that form the basis of our work. Along with those principles, we have created a set of working definitions of terms relevant to the project. We have defined the scope of the project and have worked out a data/metadata model for what we are calling an “IUPAC FAIRSpec Finding Aid” that will be associated with an “IUPAC FAIRSpec Data Collection.”

A valuable exercise we carried out during the summer of 2021 was the analysis of 13 datasets provided by the ACS FAIR Data pilot involving authors of articles in J. Org. Chem. and Organic Letters. This work was instrumental in developing our models and our understanding of the overall task. In connection with that analysis, we created a working “data and metadata extraction” utility that successfully extracted multiple representations of spectra and structures from these datasets, creating a prototypical IUPAC FAIRSpec Data Collection with associated Finding Aid.

We are currently fleshing out the details of the data and metadata models, which will be presented at the spring 2022 ACS National Meeting. In addition, we have identified a number of stakeholders and have been contacting them to introduce our ideas and, we hope, to align our recommendations with their emerging practices.

The full progress report is available at our GitHub site, https://github.com/IUPAC/IUPAC-FAIRSpec/tree/main/documents/reports

Page last updated 16 Dec 2021

Partners