Integrity & Accuracy - Guiding Principles of Responsible Chemistry


Use and interpret data, models, and theories with integrity, completeness, and accuracy, and make use of the latest technological innovations ethically, responsibly, and fairly. 

Overview

In the chemical sciences, data, standards, nomenclature, terminology, and symbols should be original, authentic, accessible, and accurate; models and theories should not distort findings; and the source of data and information should be easy to trace through clear and accurate metadata (i.e., a description of what the data is, where it comes from, and how it is organised). The growing number of large datasets, and the need to handle them automatically, make data accuracy and integrity increasingly important.

Integrity and accuracy are critical to all aspects of the chemical sciences, from conceiving an idea, planning the study, collecting and analysing data, and reporting and disseminating the findings, to archiving the data and making it available and useful to others. Documenting these workflows and keeping track of where the results come from are essential for ensuring the integrity and accuracy of data, especially when it is shared or reused.
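One way to make "keeping track of where the results come from" concrete is to fingerprint each version of a data file with a cryptographic checksum and to record every processing step against the checksums of its input and output. The Python sketch below is a minimal illustration, assuming in-memory byte strings and hypothetical class names (`ProvenanceStep`, `ProvenanceLog`); it is not a substitute for an electronic laboratory notebook or a formal provenance standard.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest used as a fingerprint of one version of the data."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceStep:
    """Hypothetical record of one workflow step: what was done, to what, producing what."""
    action: str
    input_sha256: str
    output_sha256: str


@dataclass
class ProvenanceLog:
    """Hypothetical ordered log of processing steps applied to a dataset."""
    steps: list = field(default_factory=list)

    def record(self, action: str, before: bytes, after: bytes) -> None:
        # Store fingerprints rather than the data itself.
        self.steps.append(
            ProvenanceStep(action, sha256_of_bytes(before), sha256_of_bytes(after))
        )

    def is_unbroken(self) -> bool:
        """True if every step's input is exactly the previous step's output."""
        return all(
            prev.output_sha256 == nxt.input_sha256
            for prev, nxt in zip(self.steps, self.steps[1:])
        )

    def to_json(self) -> str:
        """Serialise the log so it can be archived alongside the data."""
        return json.dumps([asdict(step) for step in self.steps], indent=2)


if __name__ == "__main__":
    # Illustrative bytes only, not real measurements.
    raw = b"time_s,signal\n0,0.105\n60,0.342\n"
    corrected = b"time_s,signal\n0,0.000\n60,0.237\n"
    log = ProvenanceLog()
    log.record("subtract blank", raw, corrected)
    print(log.is_unbroken())  # True: a single step is trivially unbroken
    print(log.to_json())
```

A gap in the chain (a step whose input does not match the previous output) signals an undocumented change to the data, which is exactly what someone checking the integrity of shared or reused data would want to detect.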

Examples

The FAIR Principles

The interdisciplinary and transdisciplinary research that will be needed to address major global challenges will be critically dependent on scientific datasets, often large ones, that can be widely accessible and reused in ways that may not have been recognised at the time the data was collected. Because generating such large datasets is costly in terms of time and resources, chemistry researchers should design studies in ways that maximise the potential for data to be reused. The FAIR Principles¹ call for data to be Findable, Accessible, Interoperable, and Reusable for machines as well as humans. These principles were developed to improve the management, sharing, and reuse of scientific data, both within and between disciplines, and across systems and platforms. The FAIR Principles are designed to guide researchers in the creation of data that is compatible with digital tools and systems. Automation is essential when dealing with large datasets, and well-documented data that are “FAIR” can be more readily validated to ensure integrity.

The criterion of Data Findability is met when datasets have unique and persistent (unchanging) identifiers, are available for searching, and provide clear information on where the data can be accessed.

Data Accessibility relies on metadata that specify protocols for automated access (including appropriate security and authorisation procedures—see the TRUST Principles and CARE Principles in the Open Data example below).

Data Interoperability and Reusability criteria are met when data can be fully integrated with other data, including data from other disciplines, to gain deeper insights than may be apparent from looking at the datasets in isolation. Metadata that completely defines the format, origin, and structure of the data is particularly important when data is being used or applied in the context of other disciplines.
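The Findability, Accessibility, Interoperability, and Reusability criteria above can be partly turned into simple machine checks on a metadata record. The sketch below is a minimal illustration, assuming a hypothetical `DatasetMetadata` record whose field names are invented for this example (they are not taken from DataCite or any other metadata schema), a simplified bare-DOI pattern as the persistent identifier, and HTTPS as the access protocol. Passing these checks does not make a dataset FAIR; it only catches obvious gaps.

```python
import re
from dataclasses import dataclass

# Simplified DOI syntax: "10." + registrant code + "/" + suffix (bare DOI, not a URL).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record; field names are illustrative only."""
    identifier: str   # persistent identifier (assumed here to be a DOI)
    title: str
    access_url: str   # where the data can be retrieved
    licence: str      # terms for reuse
    data_format: str  # e.g. a media type such as "text/csv"
    units: dict       # column name -> unit symbol


def fair_gaps(meta: DatasetMetadata, columns: list) -> list:
    """Return human-readable gaps; an empty list means none of these checks failed."""
    gaps = []
    if not DOI_PATTERN.fullmatch(meta.identifier.strip()):
        gaps.append("Findable: identifier is not a DOI-style persistent identifier")
    if not meta.access_url.startswith("https://"):
        gaps.append("Accessible: no HTTPS access URL")
    if not meta.data_format:
        gaps.append("Interoperable: data format not declared")
    missing = [c for c in columns if c not in meta.units]
    if missing:
        gaps.append("Interoperable: no unit for " + ", ".join(missing))
    if not meta.licence:
        gaps.append("Reusable: no licence stated")
    return gaps


if __name__ == "__main__":
    record = DatasetMetadata(
        identifier="10.1038/sdata.2016.18",  # DOI of the FAIR paper, used only as a well-formed example
        title="Example dataset",
        access_url="https://repository.example.org/records/123",  # hypothetical URL
        licence="CC-BY-4.0",
        data_format="text/csv",
        units={"T": "K", "p": "Pa"},
    )
    print(fair_gaps(record, ["T", "p"]))  # []
```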

Open Data

Many government-funded research projects require that research findings, including data and interpretations, be made as open as possible, while protecting sensitive information (e.g., personal health information or when Indigenous data sovereignty may be relevant—see the CARE Principles below). Data from such studies, along with appropriate and complete metadata, should be archived or stored in an appropriate data facility and managed in a way that follows the FAIR Principles, as well as the TRUST and CARE Principles, which are outlined below:

TRUST Principles²

Transparency: disclose specific repository services and data holdings that are verifiable by publicly accessible evidence
Responsibility: ensure the authenticity, integrity, security, availability, and reliability of stored data
User Focus: uphold the data management norms and expectations of target-user communities
Sustainability: maintain access to data holdings for the long term
Technology: provide infrastructure and capabilities to support secure, persistent, and reliable services

CARE Principles³

Collective Benefit: Data ecosystems shall be designed and function in ways that enable Indigenous Peoples to derive benefit from the data.
Authority to Control: Indigenous Peoples’ rights and interests in Indigenous data must be recognised and their authority to control such data be empowered.
Responsibility: Those working with Indigenous data have a responsibility to share how that data is used to support Indigenous Peoples’ self-determination and collective benefit.
Ethics: Indigenous Peoples’ rights and well-being should be the primary concern at all stages of the data lifecycle and across the data ecosystem.

Data Standards

IUPAC has a long history of establishing data standards in chemistry. IUPAC is recognised for defining chemical symbols, standardising technical terminology, developing systematic chemical nomenclature, and critically evaluating chemical property data, including maintaining the official names, atomic numbers, and atomic weights of the elements in the periodic table. IUPAC’s “Colour Books”⁴ provide detail on many of these topics, and professional chemists make use of these standards and conventions throughout their work. Failure to do so can lead to ambiguity and misunderstanding about the identities and properties of chemicals, which can have serious consequences.

Advances in digital technologies present significant additional challenges for maintaining data standards in chemistry, because the automated collection, storage, interpretation, analysis, and reporting of data depend strongly on the rigour and comprehensiveness of such standards, especially when aiming to adhere to the FAIR Principles.
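Units are a small but telling example of why automated handling depends on standards: two datasets that report temperature in °C and in K cannot be merged safely unless every value carries an explicit, standard unit symbol. The sketch below assumes a hypothetical, hand-written conversion table covering only a few temperature and pressure units (`TO_SI`, `to_si`, and `merge_columns` are illustrative names); a production pipeline would draw on a curated unit vocabulary instead.

```python
# Hypothetical conversion table: unit symbol -> (SI unit, conversion of a value to SI).
TO_SI = {
    "K": ("K", lambda v: v),
    "°C": ("K", lambda v: v + 273.15),
    "Pa": ("Pa", lambda v: v),
    "kPa": ("Pa", lambda v: v * 1e3),
    "bar": ("Pa", lambda v: v * 1e5),
}


def to_si(value: float, unit: str) -> tuple:
    """Convert a value to its SI unit; refuse to guess if the unit is unknown or missing."""
    if unit not in TO_SI:
        raise ValueError(f"unrecognised or missing unit: {unit!r}")
    si_unit, convert = TO_SI[unit]
    return convert(value), si_unit


def merge_columns(*columns):
    """Each column is a (values, unit) pair; return all values in one SI unit."""
    merged, target = [], None
    for values, unit in columns:
        for v in values:
            si_value, si_unit = to_si(v, unit)
            if target is None:
                target = si_unit
            elif si_unit != target:
                # Different quantities (e.g. pressure and temperature) must not be mixed.
                raise ValueError(f"cannot merge {si_unit} with {target}")
            merged.append(si_value)
    return merged, target
```

The key design choice is that an unknown, missing, or incompatible unit raises an error instead of being silently guessed, so ambiguity surfaces before the data is combined.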

IUPAC has been collaborating with the Committee on Data of the International Science Council (CODATA), the Research Data Alliance, and numerous other organisations to facilitate the use of IUPAC standards in digital and automated applications. This is just one part of a major initiative called the WorldFAIR Project (worldfair-project.eu), which will “produce recommendations, interoperability frameworks, and guidelines for FAIR data assessment.” The IUPAC International Chemical Identifier (InChI) standard provides another model for how to use FAIR methods to verify the digital representations of chemicals. IUPAC is revising the Gold Book Compendium of Chemical Terminology to bring authoritative IUPAC concepts into the set of rules used to define, organise, and describe chemical data.
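The hashed form of InChI, the InChIKey, is often what travels between databases, so even a purely syntactic check can catch corrupted identifiers before they propagate. A standard InChIKey has 27 characters in three hyphen-separated blocks: 14 letters encoding the connectivity, 8 letters encoding the remaining layers followed by `S` (standard) and `A` (version 1), and a single protonation letter. The sketch below checks only that layout, plus one hypothetical curation rule (a compound name linked to several different keys is flagged for review); confirming that a key actually matches a structure requires the InChI software itself.

```python
import re
from collections import defaultdict

# Standard InChIKey: 14 letters, hyphen, 8 letters + "S" (standard) + "A" (version 1),
# hyphen, 1 protonation letter; 27 characters in total.
STANDARD_INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{8}SA-[A-Z]")


def is_standard_inchikey(key: str) -> bool:
    """Syntactic check only: it cannot confirm that the key belongs to a given structure."""
    return STANDARD_INCHIKEY.fullmatch(key.strip()) is not None


def names_with_multiple_keys(records):
    """records: iterable of (name, inchikey) pairs.

    Hypothetical curation rule: a name linked to several keys is flagged for review,
    since it may indicate a transcription error (or a genuinely ambiguous name).
    """
    keys_by_name = defaultdict(set)
    for name, key in records:
        keys_by_name[name.strip().lower()].add(key.strip())
    return {name: keys for name, keys in keys_by_name.items() if len(keys) > 1}


if __name__ == "__main__":
    print(is_standard_inchikey("QTBSBXVTEAMEQO-UHFFFAOYSA-N"))  # True (acetic acid)
```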

Research Data Management Planning

FAIR data can be regarded as the outcome of the proper curation of data, while the TRUST and CARE Principles provide some of the means to get there. Proper research data management plans provide an excellent framework through which to implement the FAIR, CARE, and TRUST Principles and establish robust data governance to back up the integrity and accuracy of the work. Data management plans should provide guidance for data:

creation, capture, and/or collection
documentation, organisation, and storage
processing, usage, and analysis
sharing, including whether data should be open or restricted
maintenance and terms for reuse
archiving or destruction, especially for sensitive or private information
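Where a data management plan is kept in a machine-readable form, its completeness against the list above can be checked automatically. In this sketch the section keys, the `access` field, and the function name are hypothetical and simply mirror the six items listed; they do not follow any particular funder template or formal DMP schema.

```python
# Hypothetical section keys mirroring the list above; not a formal DMP schema.
REQUIRED_SECTIONS = (
    "creation",       # creation, capture, and/or collection
    "documentation",  # documentation, organisation, and storage
    "processing",     # processing, usage, and analysis
    "sharing",        # sharing, including open or restricted access
    "maintenance",    # maintenance and terms for reuse
    "archiving",      # archiving or destruction
)
ACCESS_LEVELS = {"open", "restricted"}


def review_plan(plan: dict) -> list:
    """Return a list of problems; an empty list means every section is filled in."""
    problems = [
        f"missing: {key}"
        for key in REQUIRED_SECTIONS
        if not str(plan.get(key, "")).strip()  # absent or whitespace-only counts as missing
    ]
    if plan.get("access") not in ACCESS_LEVELS:
        problems.append("access level must be 'open' or 'restricted'")
    return problems


if __name__ == "__main__":
    draft = {"creation": "instrument exports, one file per run", "access": "open"}
    for problem in review_plan(draft):
        print(problem)
```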

Guiding Future Action

To ensure the integrity and accuracy of scientific research in chemistry and chemistry-adjacent areas, all chemists—regardless of their level of experience—should consider doing the following:

develop a thorough knowledge of current nomenclature and terminology conventions in chemistry, along with the ways that chemical compounds and their data are represented and transmitted
help establish digital data standards in chemistry in conjunction with the greater community
commit to testing and demonstrating the reproducibility of their chemical data and its interpretation
help address the challenges of automated processing and analysis of large data sets
contribute to the development of standards and methods for the validation of chemical data
learn how chemical compounds and their data can be made more FAIR
understand how the CARE and TRUST Principles could be applied to their research

Research data management plans should be living documents that are revisited throughout the lifecycle of a research project. These plans will depend on the nature and types of data repositories that are available or chosen, and on their security characteristics. The CoreTrustSeal, a certification that a repository meets international standards for trustworthiness, may be useful to researchers and organisations when considering these issues.

In chemistry, the accurate and complete documentation of measurements and results is critical for determining whether research findings can be used for further analysis and modelling. Users of chemistry information need to determine whether the variance observed in reported results is of potential scientific interest or an artefact of measurement uncertainty. Multiple sources of random or systematic error can affect measurement results, and the extent to which these sources can be identified and quantified determines the level of confidence that can be placed in the results, a level of confidence that many further applications require. Formal and robust methods for the critical evaluation of reported data are developed and applied by many expert organisations, including IUPAC in conjunction with BIPM (Bureau International des Poids et Mesures, the International Bureau of Weights and Measures) and others, to provide the global community with high-quality property values for practical use.⁵
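Deciding whether a difference between two reported values is of scientific interest or an artefact of measurement uncertainty can be sketched numerically. The example below assumes uncorrelated uncertainty components already expressed in the unit of the result (so they combine as a root sum of squares), expanded uncertainties with a coverage factor k = 2, and the commonly used E_n criterion, where |E_n| ≤ 1 indicates agreement. Real critical evaluations involve much more, including the treatment of correlated and systematic effects.

```python
import math


def combined_standard_uncertainty(components):
    """Root sum of squares of standard uncertainties (uncorrelated inputs, unit sensitivity)."""
    return math.sqrt(sum(u * u for u in components))


def expanded_uncertainty(u_c, k=2.0):
    """Expanded uncertainty U = k * u_c; k = 2 gives about 95 % coverage for a normal distribution."""
    return k * u_c


def en_number(x1, U1, x2, U2):
    """E_n = |x1 - x2| / sqrt(U1^2 + U2^2), using expanded uncertainties."""
    return abs(x1 - x2) / math.sqrt(U1 ** 2 + U2 ** 2)


def agree_within_uncertainty(x1, U1, x2, U2):
    """|E_n| <= 1 is the usual criterion for agreement within stated uncertainties."""
    return en_number(x1, U1, x2, U2) <= 1.0


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    u_c = combined_standard_uncertainty([0.03, 0.04])  # about 0.05
    U = expanded_uncertainty(u_c)                      # about 0.1
    print(agree_within_uncertainty(10.00, U, 10.05, U))  # True
```

With the same two values but smaller stated uncertainties, the same difference can become significant, which is why complete uncertainty documentation matters as much as the values themselves.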

Questions to Guide Discussion

How and when is the detail of formal chemical nomenclature best acquired? What are the new challenges that arise during digital manipulation and transmission of chemical structure and identity?
How do the conventions cope with automated manipulations of large data sets containing chemical structures and other data (and their units)? How might your representations be misunderstood?
Which standards are currently missing and how are they connected to existing ones?
What does reproducibility actually mean? How should it be reported: as the best, the average, or a typical result?
How do computers and software programs “know” how to interpret data? How much do we rely on context when interpreting a set of data? What does context mean for a computer?
How can you tell if a data set is faulty or corrupted in some way? Will a computer be able to do that? What are the implications for artificial intelligence of having such data sets?
How could you envisage your data being used by other chemists? What about nonchemists, scientists from other fields, or even nonscientists?
How can we determine the need for sensitive data to remain closed? Who would make that distinction? How does data restriction factor into transparency?
How will the CARE and TRUST Principles impact you personally? Do they apply similarly in all countries/regions?

References

Wilkinson, M. D.; Dumontier, M.; Aalbersberg, I. J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.-W.; Bonino da Silva Santos, L.; Bourne, P. E.; Bouwman, J.; Brookes, A. J.; Clark, T.; Crosas, M.; Dillo, I.; Dumon, O.; Edmunds, S.; Evelo, C. T.; Finkers, R.; Gonzalez-Beltran, A.; Gray, A. J. G.; Groth, P.; Goble, C.; Grethe, J. S.; Heringa, J.; ’t Hoen, P. A. C.; Hooft, R.; Kuhn, T.; Kok, R.; Kok, J.; Lusher, S. J.; Martone, M. E.; Mons, A.; Packer, A. L.; Persson, B.; Rocca-Serra, P.; Roos, M.; van Schaik, R.; Sansone, S.-A.; Schultes, E.; Sengstag, T.; Slater, T.; Strawn, G.; Swertz, M. A.; Thompson, M.; van der Lei, J.; van Mulligen, E.; Velterop, J.; Waagmeester, A.; Wittenburg, P.; Wolstencroft, K.; Zhao, J.; Mons, B. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Sci. Data 2016, 3, 160018. https://doi.org/10.1038/sdata.2016.18.
Lin, D.; Crabtree, J.; Dillo, I.; Downs, R. R.; Edmunds, R.; Giaretta, D.; De Giusti, M.; L’Hours, H.; Hugo, W.; Jenkyns, R.; Khodiyar, V.; Martone, M. E.; Mokrane, M.; Navale, V.; Petters, J.; Sierman, B.; Sokolova, D. V.; Stockhause, M.; Westbrook, J. The TRUST Principles for Digital Repositories. Sci. Data 2020, 7 (1), 144. https://doi.org/10.1038/s41597-020-0486-7.
Carroll, S. R.; Garba, I.; Figueroa-Rodríguez, O. L.; Holbrook, J.; Lovett, R.; Materechera, S.; Parsons, M.; Raseroka, K.; Rodriguez-Lonebear, D.; Rowe, R.; Sara, R.; Walker, J. D.; Anderson, J.; Hudson, M. The CARE Principles for Indigenous Data Governance. Data Sci. J. 2020, 19 (1). https://doi.org/10.5334/dsj-2020-043.
Cohen, E. R.; Cvitaš, T.; Frey, J. G.; Holmström, B.; Kuchitsu, K.; Marquardt, R.; Mills, I.; Pavese, F.; Quack, M.; Stohner, J.; Strauss, H.; Takami, M.; Thor, A. J. Quantities, Units, and Symbols in Physical Chemistry, 3rd ed.; IUPAC Green Book; RSC Publishing: Cambridge, UK, 2007. ISBN 978-0-85404-433-7. https://doi.org/10.1039/9781847558039.
Connelly, N. G.; Damhus, T.; Hartshorn, R. M.; Hutton, A. T. Nomenclature of Inorganic Chemistry: IUPAC Recommendations 2005; IUPAC Red Book; RSC Publishing: Cambridge, UK, 2005. ISBN 0-85404-438-8.
Favre, H. A.; Powell, W. H. Nomenclature of Organic Chemistry: IUPAC Recommendations and Preferred Names 2013; IUPAC Blue Book; RSC Publishing: Cambridge, UK, 2014. ISBN 978-0-85404-182-4. https://doi.org/10.1039/9781849733069.
Jones, R. G.; Wilks, E. S.; Metanomski, W. V.; Kahovec, J.; Hess, M.; Stepto, R.; Kitayama, T. Compendium of Polymer Terminology and Nomenclature: IUPAC Recommendations 2008, 2nd ed.; IUPAC Purple Book; RSC Publishing: Cambridge, UK, 2009. ISBN 978-0-85404-491-7. https://doi.org/10.1039/9781847559425.
Hibbert, D. B. Compendium of Terminology in Analytical Chemistry; IUPAC Orange Book; The Royal Society of Chemistry: Cambridge, UK, 2023. ISBN 978-1-78262-947-4. https://doi.org/10.1039/9781788012881.
Férard, G.; Dybkaer, R.; Fuentes-Arderiu, X. Compendium of Terminology and Nomenclature of Properties in Clinical Laboratory Sciences: Recommendations 2016, 2nd ed.; IFCC/IUPAC Silver Book; RSC Publishing: Cambridge, UK, 2017. https://doi.org/10.1039/9781782622451.
McNaught, A. D.; Wilkinson, A. IUPAC Compendium of Chemical Terminology: Gold Book, 2nd ed.; IUPAC: Research Triangle Park, NC, 1997. https://doi.org/10.1351/goldbook.
Shaw, D. G.; Bruno, I. J.; Chalk, S. J.; Hefter, G. T.; Hibbert, D. B.; Hutchinson, R. A.; Magalhães, M. C. F.; Magee, J. W.; McEwen, L. R.; Rumble, J. R.; Russell, G. T.; Waghorne, W. E.; Walczyk, T.; Wallington, T. J. Chemical Data Evaluation: General Considerations and Approaches for IUPAC Projects and the Chemistry Community (IUPAC Technical Report). Pure Appl. Chem. 2023, 95 (10), 1107–1120. https://doi.org/10.1515/pac-2022-0802.