Project Details

XML-based IUPAC Standard for Experimental and Critically Evaluated Thermodynamic Property Data Storage and Capture

Project No.: 2002-055-3-024
Start Date: 01 September 2003
End Date: 14 March 2006
Division Name: Committee on Publications and Cheminformatics Data Standards
Division No.: 024

Objective

It is intended to create an XML-based dictionary for storage and exchange of thermophysical and thermochemical data based on fundamental principles of phenomenological thermodynamics covering a wide variety of systems such as pure chemical compounds, multicomponent mixtures, and chemical reactions. Upon completion of the project, the developed dictionary and corresponding XML schema could become internationally accepted as a standard for thermodynamic data storage and exchange.

Such a standard is urgently required, since thermodynamic data are commonly used in a great variety of engineering applications as well as in numerous fundamental research projects as an information element for knowledge discovery.

Description

Thermophysical and thermochemical property data represent a key foundation for the development and improvement of all chemical process technologies. These data are also critical for the support of fundamental research in Physics, Chemistry, Biology, and Materials Science. An unprecedented growth in the number of custom-designed software tools for various engineering applications has created an interoperability problem between the formats and structure of thermodynamic data files and the input/output structures required by various application software products. This problem is reflected in the extremely time- and resource-consuming efforts needed to collect data within a particular data management environment from numerous data sources of differing nature. The development of standards for thermodynamic data storage and exchange is the only fundamental solution to this problem.

Within the last 20 years this problem has become a major obstacle to the development of efficient process design software tools, which require the generation of extensive thermophysical and thermochemical property data packages. The major objective of this project is to establish an international standard for thermophysical/thermochemical data storage and exchange that provides a practical solution to this problem.

The development of a standardized XML-based dictionary is the most powerful instrument for providing an interoperability solution for the interpretation and use of thermodynamic data. This dictionary has to be able to describe the complete set of thermophysical and thermochemical properties (more than 120), their uncertainties, and related metadata. XML (Extensible Markup Language) avoids common pitfalls in language design: it is extensible and platform-independent. Since XML files are essentially text files, they can be analyzed without specific customized software products and can be read with a variety of text editors.
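As a minimal sketch of the point that XML-formatted data can be read with general-purpose tools, the following Python fragment parses a small record using only the standard library. The tag and attribute names (`DataSet`, `Compound`, `Property`, `Point`) and the numerical values are hypothetical placeholders chosen for illustration; they are not taken from the ThermoML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: tag names, attribute names, and numbers are
# illustrative only and are NOT taken from the ThermoML schema.
SAMPLE = """\
<DataSet>
  <Compound>water</Compound>
  <Property name="Vapor pressure, kPa">
    <Point temperature_K="298.15" value="3.17" uncertainty="0.01"/>
    <Point temperature_K="323.15" value="12.35" uncertainty="0.02"/>
  </Property>
</DataSet>
"""


def read_points(xml_text):
    """Return (compound, property name, [(T, value, uncertainty), ...])."""
    root = ET.fromstring(xml_text)
    compound = root.findtext("Compound")
    prop = root.find("Property")
    points = [
        (
            float(p.get("temperature_K")),
            float(p.get("value")),
            float(p.get("uncertainty")),
        )
        for p in prop.findall("Point")
    ]
    return compound, prop.get("name"), points


compound, name, points = read_points(SAMPLE)
print(compound, name, points)
```

Because the metadata (compound, property name) precede the numerical points in the record, a reader can build a table header before consuming any values.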

The developed XML-based structure will represent a balanced combination of hierarchical and relational elements. It will explicitly incorporate structural elements related to basic principles of phenomenological thermodynamics: thermochemical and thermophysical (equilibrium and transport) properties, state variables, system constraints, phases, and units. Meta- and numerical data records will be grouped into ‘nested blocks’ of information corresponding to data sets. The metadata records will precede numerical data information, providing a robust foundation for generating ‘header’ records for any relational database where XML-formatted files could be incorporated. The structural features of the metadata records will ensure unambiguous interpretation of numerical data as well as data-quality control based on the Gibbs Phase Rule. Implementation of the Gibbs Phase Rule would provide users with an indication of inconsistencies in thermodynamic data before the data are deposited into a data-storage facility. Moreover, some detailed information included in the metadata records could serve as a background for independent assessment of uncertainties, which could be propagated into uncertainties of physical parameters for reaction streams, and consequently, provide an opportunity for quantitative characterization of the quality of a chemical process design.
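The kind of consistency check that the Gibbs Phase Rule makes possible can be sketched as follows. This is an illustration under stated assumptions, not the validation logic of the project: it assumes the rule in the form F = C - P + 2 (with C reduced by the number of independent reactions), and it assumes that a data set is consistent when the number of state variables plus the number of system constraints equals F. The function names are hypothetical.

```python
def degrees_of_freedom(components, phases, reactions=0):
    """Gibbs Phase Rule: F = (C - R) - P + 2."""
    f = components - reactions - phases + 2
    if f < 0:
        raise ValueError("more coexisting phases than the phase rule allows")
    return f


def check_data_set(components, phases, n_variables, n_constraints):
    """Return None if the data set is consistent, else a diagnostic message.

    Assumption: each state variable and each system constraint fixes
    exactly one degree of freedom.
    """
    expected = degrees_of_freedom(components, phases)
    specified = n_variables + n_constraints
    if specified == expected:
        return None
    return f"expected {expected} variables + constraints, found {specified}"


# Pure compound, vapor-liquid equilibrium: F = 1, so one variable
# (for example, temperature) fully specifies the state.
print(check_data_set(components=1, phases=2, n_variables=1, n_constraints=0))

# Binary mixture, vapor-liquid equilibrium: F = 2, so a single variable
# with no constraint is under-specified.
print(check_data_set(components=2, phases=2, n_variables=1, n_constraints=0))
```

A check of this type can be run before a record is deposited, which is the point at which the text above expects inconsistencies to be flagged.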

Commonly accepted IUPAC-based terminology will be used as the foundation for metadata and numerical data tagging. In addition, the self-explanatory approach and very limited use of abbreviations will minimize the time necessary for users to understand the schema and to convert the XML-formatted data with customized software or commercial XML parsers.

The dictionary will be designed to take advantage of the modular nature of XML schemas. In particular, emphasis will be placed on ensuring compatibility with the schema currently being developed under the scope of the IUPAC project 2002-022-1-024 “Standard XML data dictionaries for chemistry”. By design, only one unit will be selected for each property covered by the dictionary. These units will be SI-based; however, for a number of properties the selected units might be multiples of SI units to ease interpretation of numerical values. Unit tagging will be explicitly propagated to every numerical data point as a part of each property name, thus minimizing the possibility of unit misinterpretation.
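The idea of carrying the unit as part of the property name can be illustrated with a short sketch. It assumes, purely for illustration, that a tagged name has the form "name, unit" with the unit after the last comma; this convention and the function name are hypothetical, not a statement of how the dictionary encodes names.

```python
def split_property_name(tagged_name):
    """Split a 'name, unit' string into (name, unit).

    Assumption: the unit follows the last ', ' in the tagged name.
    """
    name, sep, unit = tagged_name.rpartition(", ")
    if not sep or not name or not unit:
        raise ValueError(f"no unit found in property name: {tagged_name!r}")
    return name, unit


# Every data point carries the tagged name, so the unit travels with it.
point = {"property": "Vapor pressure, kPa", "value": 3.17}
name, unit = split_property_name(point["property"])
print(f"{name} = {point['value']} {unit}")
```

With a single fixed unit per property, a consumer never has to guess or convert: a name without a recognizable unit is rejected rather than interpreted.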

Various methods of numerical data representation commonly used in publication of experimental property data (e.g., direct, difference from values at a reference state, ratio of the value to that at a reference state, etc.) are planned to be incorporated into the schema.
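The three representations named above can be related to one another with a small conversion routine. The sketch below assumes that "difference" means X - X_ref and "ratio" means X / X_ref; the method labels and the function name are hypothetical and chosen only for illustration.

```python
def to_direct(value, method, reference=None):
    """Convert a reported value to the direct property value X.

    method: 'direct'     -> value is X itself
            'difference' -> value is X - X_ref
            'ratio'      -> value is X / X_ref
    """
    if method == "direct":
        return value
    if reference is None:
        raise ValueError(f"method {method!r} requires a reference-state value")
    if method == "difference":
        return reference + value
    if method == "ratio":
        return reference * value
    raise ValueError(f"unknown representation method: {method!r}")


print(to_direct(5.0, "direct"))
print(to_direct(1.5, "difference", reference=100.0))
print(to_direct(0.5, "ratio", reference=200.0))
```

Storing the representation method alongside each value, rather than converting on capture, keeps the record faithful to what was published.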

The developed dictionary will provide elements for storage and exchange of experimental, critically evaluated, and predicted data. The schema will have provisions for expressing various measures of thermodynamic data uncertainty, such as standard uncertainty, combined standard uncertainty, combined expanded uncertainty, and different types of precision (repeatability, deviation from the fitted curve, device specifications). Definitions and descriptions of all quantities related to the expression of uncertainty in the dictionary will conform to the Guide to the Expression of Uncertainty in Measurement, ISO (International Organization for Standardization), October 1993.
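The relationship between the uncertainty measures listed above can be sketched in a few lines. The example assumes uncorrelated input quantities with unit sensitivity coefficients, in which case the combined standard uncertainty is the root sum of squares of the standard uncertainties, and the expanded uncertainty is that value multiplied by a coverage factor k (k = 2 is a common choice). The function names are hypothetical.

```python
import math


def combined_standard_uncertainty(standard_uncertainties):
    """Root-sum-of-squares combination.

    Assumption: inputs are uncorrelated and each has a sensitivity
    coefficient of 1.
    """
    return math.sqrt(sum(u * u for u in standard_uncertainties))


def expanded_uncertainty(u_combined, k=2.0):
    """Expanded uncertainty U = k * u_c for coverage factor k."""
    return k * u_combined


u_c = combined_standard_uncertainty([3.0, 4.0])
U = expanded_uncertainty(u_c)
print(u_c, U)
```

When inputs are correlated or enter the measurement model nonlinearly, covariance terms and sensitivity coefficients have to be included, which this sketch deliberately omits.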

The developed schema will be validated extensively with data records managed by the SOURCE Data System, the largest experimental thermodynamic data storage facility in the world. In addition, validation will include data files corresponding to publications of experimental and critically evaluated data in major journals in the field such as the Journal of Chemical and Engineering Data and the Journal of Chemical Thermodynamics. The necessary arrangements with the Editors and Publishers of the journals have been made. To expedite and automate the process of schema validation, software tools guiding the process of data capture and generation of XML-formatted files will be developed on the basis of the formulated dictionary.

Establishment of the XML-based standard for storage and exchange of thermodynamic data will provide an easy-to-use and extremely efficient pipeline for transporting data from data producers to data users, serving as a hub tool and assuring interoperability between various data management systems and operating platforms.

The project will be conducted in close cooperation with industry (Design Institute for Physical Properties, DIPPR, combining thermophysical data activities for more than 40 major companies worldwide).

Progress

“ThermoML” is the reserved namespace for the XML-based IUPAC standard for experimental and critically evaluated thermodynamic property data storage and capture.
> www.iupac.org/namespaces/ThermoML/

On 29 January 2004, the project task group met at ESDU International plc, London, U.K.
> See report published in Chem. Int. July-Aug 2004

A manuscript is being prepared for publication in Pure Appl. Chem. A final document was open for public review comments until 31 January 2006. A one-day symposium on ThermoML: Purpose, Structure, and Applications will be held on Monday, 27 March 2006 in Atlanta, GA.

Project completed – IUPAC Recommendations published in Pure Appl. Chem. 78(3), 541-612, 2006
+ Supporting Information (zip file – 38KB)

> update 18 Dec 2006: In case of minor differences between the text describing ThermoML (Pure Appl. Chem., 2006, 78, 541-612) and the ThermoML schema (ThermoML.xsd), the schema should always be considered normative.