The Process Industries and the ISO 15926 Semantic Web

Abstract

Introduction

A goal of the Process Industries is to integrate the lifecycle information of their facilities. This entails setting up, in parallel with the Supply Chain, an Information Chain.

The often-quoted NIST (US National Institute of Standards and Technology) study "Cost Analysis of Inadequate Interoperability in the U.S. Capital Facilities Industry" states in its abstract:

"........This study includes design, engineering, facilities management and business processes software systems, and redundant paper records management across all facility life-cycle phases. Based on interviews and survey responses, $15.8 billion in annual interoperability costs were quantified for the capital facilities industry in 2002. Of these costs, two-thirds are borne by owners and operators, which incur most of these costs during ongoing facility operation and maintenance (O&M). In addition to the costs quantified, respondents indicated that there are additional significant inefficiency and lost opportunity costs associated with interoperability problems that were beyond the scope of our analysis. Thus, the $15.8 billion cost estimate developed in this study is likely to be a conservative figure."

A significant barrier to such an Information Chain is the lack of uniformly structured data across all lifecycle domains of a facility.

ISO 15926, whose development started in the late 1980s, was designed with the integration of facility lifecycle information in mind.

The Semantic Web is an extension of the current Web (a "Page Web") that serves as a "Data Web". It is based on common formats that support aggregation and integration of data drawn from diverse sources.

Parts 7, 8 and 9 of ISO 15926 have been designed to amalgamate the rigor of the ISO 15926-2 data model and the richness of the reference data of ISO 15926-4 with the power of the Semantic Web.

Industry consortia such as FIATECH and PoscCaesar, together with the iRingUserGroup, are building the tools required for this combination. These tools, in their present state, are available in the public domain, and the software industry is invited to use their principles to build ISO 15926-compliant software.

Results

We present a scenario that shows the value of the information environment with which the Semantic Web can support the software used in all phases of the lifecycle of a process plant.

Conclusion

Semantic Web technologies present both promise and challenges. Current tools and standards are already adequate to implement components of the vision. On the other hand, these technologies are young. Gaps in standards and implementations still exist and adoption is limited by typical problems with early technology, such as the need for a critical mass of practitioners and installed base, and growing pains as the technology is scaled up. Still, the potential of interoperable information sources, at the scale of the World Wide Web, merits continued work. This will require proper funding, but the alternative of maintaining the status quo is more costly.

Background

Cradle-to-grave information integration and the information ecosystem

The world today is a "global village", in which business is highly integrated.

Information exchange is the "blood stream" of such integrated business, and one might expect such exchange to be trouble-free by now.

This, however, is not the case, for many reasons, such as:

  • There are approx. 5000 natural languages used in this world (not including the dialects);
  • Each discipline has its own jargon, which again exists in different natural languages, with no cross-connections possible;
  • Information is stored in a proprietary format, and must be exchanged with a system that uses another proprietary format;
  • Systems (and people) work with 'implicit' information (details are left out, because 'you know');
  • Present standards, such as for e-business, are too limited in scope;
  • Standards change every 3 to 4 years, whereas assets (like a plant or an oil field) have a lifetime of 10 to 100 years.

The Information Chain

The majority of the technical information about a process plant (or any other facility, for that matter) originates from a large number of equipment manufacturers and from the EPC contractors (EPC = Engineering, Procurement, Construction).

Many projects in the process industries are large to very large in size, and require joint venturing. Also, many EPC contractors outsource work to countries that match a good education system with (still) moderate wages. This results in situations where much information must be shared and exchanged between organizations.

Much of the ability of these engineering, procurement, and construction specialists to work together – exchanging ideas, information, and knowledge across organizational boundaries – is mediated by the Internet and its ever-increasing digital resources. Despite the revolution of the Web, the structure of this information, as evidenced by a large number of heterogeneous data formats, continues to reflect a high degree of idiosyncratic domain specialization, lack of schematization, and schema mismatch.

The lack of uniformly structured data affects the costs involved in these EPC activities, all of which rely heavily on integrating and interpreting data sets produced by different organizations at different levels of granularity. This data has been provided in numerous disconnected databases – sometimes referred to as data silos. It has become increasingly difficult to even discover these databases, let alone characterize them.

These observations lead us to a variety of desiderata for the information environment that can support our Information Chain. It should take advantage of the Web's ability to enable access to vast amounts of information. Queries need to be made across data regardless of the community in which it originates, whilst enforcing adequate security to prevent information theft.

Once a facility has been engineered and constructed, the daily life of Operations & Maintenance starts for the Owner/Operator of that facility. The integration of the data resulting from these activities is of utmost importance, because it provides reliable and explicit information from which knowledge can be discovered.

There are many players in the process industries, and a high degree of "promiscuity". No one party can dictate any data format, and for that reason internationally accepted standardization is necessary.

The Semantic Web

The Semantic Web is an extension of the current Web that enables navigation and meaningful use of digital resources by automated processes. It is based on common formats that support aggregation and integration of data drawn from diverse sources.

Currently, links on Web pages are uncharacterized. There is no explicit information that can be handled by a computer intelligently. By contrast, on the Semantic Web, any relationship between two things would be captured in a statement that identifies those two entities and the type of the relationship between them. Such statements are called "triples" because they consist of three parts – subject, predicate, and object.

We might say, for example, that the subject is pump P-101, the predicate (or relationship) rdf:type, and the object the class RDS416834 that stands for "centrifugal pump". Just as the subject and object are identified by globally unique Uniform Resource Identifiers (URIs), such as http://www.xyz-corp.com/lifecycledata#C4bc85c80-79c0-11e2-b92a-0800200c9a66 for the pump and http://posccaesar.org/rdl/page/RDS416834 for the pump class, so too is the typing relationship, the full name of which is, in this case, http://www.w3.org/1999/02/22-rdf-syntax-ns#type. A Web browser viewing that triple might show the human-readable definition of the relationship.
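The pump statement above can be written down as a small data structure. The following Python sketch is only an illustration: the `Triple` type and the `to_ntriples` helper are names we introduce here, not part of ISO 15926 or of any RDF library. It uses the three URIs quoted above and prints the statement in the W3C N-Triples line format.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One Semantic Web statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# The three URIs quoted in the text.
PUMP_P101 = "http://www.xyz-corp.com/lifecycledata#C4bc85c80-79c0-11e2-b92a-0800200c9a66"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CENTRIFUGAL_PUMP = "http://posccaesar.org/rdl/page/RDS416834"

statement = Triple(PUMP_P101, RDF_TYPE, CENTRIFUGAL_PUMP)


def to_ntriples(triple: Triple) -> str:
    """Write a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


print(to_ntriples(statement))
```

Every part of the statement is a URI, so a program that has never seen pump P-101 can still look up what the predicate and the class mean.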

Since URIs can be used as names, all information accessible on the Web today can be part of statements in the Semantic Web. If two statements refer to identical URIs, their subjects of discourse are identical, which makes it possible to merge the data that refer to them. This process is the basis of information integration on the Semantic Web.
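A minimal sketch of this merging step, assuming two organizations that each hold a few statements about the same pump. The property URIs under http://example.org/ and the literal values are hypothetical and serve only as illustration.

```python
from collections import defaultdict

PUMP = "http://www.xyz-corp.com/lifecycledata#C4bc85c80-79c0-11e2-b92a-0800200c9a66"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PUMP_CLASS = "http://posccaesar.org/rdl/page/RDS416834"
# Hypothetical property URIs, invented for this example.
HAS_TAG = "http://example.org/props#tag"
HAS_SUPPLIER = "http://example.org/props#supplier"

# Two independently produced data sets that use the same URI for the pump.
engineering_data = {
    (PUMP, RDF_TYPE, PUMP_CLASS),
    (PUMP, HAS_TAG, "P-101"),
}
procurement_data = {
    (PUMP, RDF_TYPE, PUMP_CLASS),
    (PUMP, HAS_SUPPLIER, "Acme Pumps"),
}


def merge(*graphs):
    """Merging is a set union: identical statements collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect everything the graph states about one subject."""
    description = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            description[p].add(o)
    return dict(description)


merged = merge(engineering_data, procurement_data)
for predicate, values in sorted(describe(merged, PUMP).items()):
    print(predicate, sorted(values))
```

No mapping table between the two data sets is needed here, because both already name the pump with the same URI.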

With this as a foundation, a number of existing approaches for organizing knowledge are being adapted for use on the Semantic Web. Among these are thesauri, ontologies, rule systems, etc. Together, the uniform naming of elements of discourse by URIs, the shared standards and technologies around these methods of organization, and the growing set of shared practices in using those, are known as Semantic Web technologies.

The formal definition of relations among Web resources is at the basis of the Semantic Web. Resource Description Framework (RDF) is one of the fundamental building blocks of the Semantic Web, and gives a formal specification for the syntax and semantics of statements (triples). Beyond RDF, a number of additional building blocks are necessary to achieve the Semantic Web vision:

  • Languages to define the controlled vocabularies and ontologies that aid validation and interoperability: the RDF Schema (RDFS) and the Web Ontology Language (OWL);
  • A query language, SPARQL, by which one can retrieve answers from a body of statements.
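To give an impression of what a query over a body of statements does, the sketch below imitates the simplest kind of SPARQL query, a single triple pattern, in plain Python. It is not a SPARQL implementation; the `match` function and the item URIs under http://example.org/ are our own illustrative assumptions. The corresponding SPARQL text is shown in the comment.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CENTRIFUGAL_PUMP = "http://posccaesar.org/rdl/page/RDS416834"

# Hypothetical plant items and a hypothetical second class.
graph = [
    ("http://example.org/plant#P-101", RDF_TYPE, CENTRIFUGAL_PUMP),
    ("http://example.org/plant#P-102", RDF_TYPE, CENTRIFUGAL_PUMP),
    ("http://example.org/plant#V-201", RDF_TYPE, "http://example.org/classes#Vessel"),
]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples that fit the pattern; None acts as a variable."""
    for triple in triples:
        if all(want is None or want == have
               for want, have in zip((s, p, o), triple)):
            yield triple


# Roughly equivalent SPARQL:
#   SELECT ?item WHERE {
#     ?item <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
#           <http://posccaesar.org/rdl/page/RDS416834> .
#   }
pumps = sorted(s for s, _, _ in match(graph, p=RDF_TYPE, o=CENTRIFUGAL_PUMP))
print(pumps)
```

A real SPARQL engine combines many such patterns, joins them on shared variables, and adds filters and ordering.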

Specifications of some of these technologies have been published and are stable, while others are still under development. RDF(S) and OWL became W3C Recommendations in 2004, and SPARQL followed in 2008. That is a long time ago on the Web scale, but not such a long time for the development of good tools and general acceptance by the technical community.

Despite the youth of these technologies, active developer and scientific communities have grown up around them. Today a large number of tools, programming environments, and specialized databases are available, both from the open source community and as products of small businesses and large corporations. Anybody can now start developing applications for the Semantic Web, because the necessary development tools are at our disposal.

How can the Semantic Web help the Process Industries?

We have come to believe the judicious application of Semantic Web technologies can lead to faster and cheaper project execution. The Semantic Web approach offers an expanding mix of standards and technologies layered on top of the most successful information dissemination and sharing apparatus in existence – the World Wide Web. Some of the elements of the technology most relevant to the Process Industries include:

  • The global scope of identifiers that follows from the use of URIs offers a path out of the complexities caused by the proliferation of local identifiers.
  • The Semantic Web schema languages, RDFS and OWL, offer the potential to simplify the management and comprehension of the complicated and rapidly evolving set of relationships that we need to record among the data describing products. Alongside their benefits, the technologies that underlie our current data stores have a number of significant disadvantages that the Web schema languages remedy.
  • RDFS and OWL are self-descriptive. Engineers who integrate different types of data need to understand both what the data means at the domain level and the details of its form as described in the associated data schemas. Because these schemas tend to be technology- and vendor-specific, it is a significant burden to understand and work with them. While the need to integrate more types of data will continue, RDFS and OWL offer some relief from the burden of understanding data schemas. On the Semantic Web, classes, instances, and relationships are represented in the same way.
  • RDFS and OWL are flexible, extendable, and decentralized because they are designed for use in the dynamic, global environment of the Web. RDFS and OWL support hierarchical relationships at their core, allowing for easy incorporation of subclass and subproperty relationships that are essential for managing and integrating complex data. New schemas can easily incorporate previously defined classes and properties that refer to data elsewhere on the Web.
  • The ability to easily extend the work of others makes worthwhile the development of ontologies that can be shared across different domains. Data from projects that build upon them will be easier to link together than those that use ad-hoc solutions or choose from a variety of disparate and proprietary systems.
  • Reasoners for the Semantic Web schema languages introduce capabilities previously not widely available by offering the ability to do inference, classification, and consistency checking. Each of these capabilities has benefits. For example, the powerful consistency checking offered by OWL reasoners can help ensure that schemas, ontologies, and data sets do not contain contradictory or malformed statements. These erroneous statements are unfortunately quite common.
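The subclass support and the inference capability mentioned in the list above can be illustrated with a very small reasoner. The sketch below applies two standard RDFS entailment rules (subclass transitivity, and propagation of rdf:type along subclass links) until nothing new can be derived. The class and item names with the `ex:` prefix are hypothetical, and real reasoners are far more capable and efficient than this loop.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical class hierarchy and one plant item.
facts = {
    ("ex:CentrifugalPump", SUBCLASS_OF, "ex:Pump"),
    ("ex:Pump", SUBCLASS_OF, "ex:RotatingEquipment"),
    ("ex:P-101", RDF_TYPE, "ex:CentrifugalPump"),
}


def infer(triples):
    """Return the triples plus everything the two rules derive from them.

    Rule 1: (A subClassOf B) and (B subClassOf C) give (A subClassOf C).
    Rule 2: (x type B)       and (B subClassOf C) give (x type C).
    """
    closure = set(triples)
    while True:
        derived = {
            (a, p, d)
            for (a, p, b) in closure
            if p in (RDF_TYPE, SUBCLASS_OF)
            for (c, q, d) in closure
            if q == SUBCLASS_OF and c == b
        }
        if derived <= closure:  # fixed point reached, nothing new
            return closure
        closure |= derived


inferred = infer(facts)
print(sorted(o for s, p, o in inferred if s == "ex:P-101" and p == RDF_TYPE))
```

Although the source data only states that P-101 is a centrifugal pump, a query for all rotating equipment would now find it as well.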

While all of the above is true, there are drawbacks that need attention. The most important one is that the languages are, as stated above, "flexible, extendable, and decentralized", and hence wide open to user-defined classes and properties. Given that technical people always know better, this would lead to the next level of a babel of tongues. The use of the ISO 15926 standard can help avoid that situation.

In the remainder of this paper we will describe the role of ISO 15926.

ISO 15926 - Integration of lifecycle data for process plants including oil and gas production facilities

ISO 15926 has six main parts:

  • Part 2 - Data model - a fully generic, data-driven, 4D model with 201 entity types
  • Part 3 - Reference data for geometry and topology
  • Part 4 - Initial reference data - core classes, object models, reference individuals (e.g. cities) - now 15,000 classes, expected to grow to 100,000
  • Parts 7, 8 and 9 - Implementation methods (in OWL) - using standardized templates, which are n-ary relations

The properties (attributes) of the templates refer to Part 4 reference data in a standard RDF/OWL manner. The semantics of any template type, that is, its "internals", are modelled in terms of Part 2 entity types.

The purpose of ISO 15926 is to provide a Lingua Franca for computer systems, thereby integrating the information produced by them. Although set up for the process industries with large projects involving many parties, and involving plant operations and maintenance lasting decades, the technology can be used by anyone willing to set up a proper vocabulary of reference data linked with Part 4.

Data are mapped at the source. Each computer program maps its data from its internal format to a standard format defined by ISO 15926-8. Those data are then stored in a System Façade; each system has its own System Façade. A Façade is an RDF triple store, set up to a standard schema and API, as defined in Part 9. Any Façade only stores the data for which the Façade owner is responsible.

Data can be queried by means of SPARQL, and can also be "handed over" from one Façade to another in cases where data custodianship is handed over (e.g. from a contractor to a plant owner, or from a manufacturer to the owners of the manufactured goods). Façades have a standard API for population, handing over, information exchange, and querying.
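The flow described above (map at the source, store in a Façade, query, hand over) can be sketched as a toy in-memory model. This is not the Part 9 API, which is not reproduced in this paper: the class `Facade`, its methods, the `map_record` helper, the record fields and the base URI are all hypothetical names chosen for illustration. The sketch also assumes that a hand-over removes the statements from the sending Façade, in line with the rule that a Façade only stores data for which its owner is responsible.

```python
class Facade:
    """Toy stand-in for a Façade: a set of triples with one owner."""

    def __init__(self, owner):
        self.owner = owner
        self.triples = set()

    def populate(self, triples):
        """Add statements for which this owner is responsible."""
        self.triples.update(triples)

    def query(self, s=None, p=None, o=None):
        """Return matching triples; None matches anything."""
        return {
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        }

    def hand_over(self, subject, receiver):
        """Transfer custodianship of all statements about one subject."""
        moving = self.query(s=subject)
        receiver.populate(moving)
        self.triples -= moving
        return len(moving)


def map_record(record, base_uri):
    """Map one record in a system's internal format to triples."""
    subject = base_uri + record["id"]
    return {
        (subject, base_uri + key, str(value))
        for key, value in record.items()
        if key != "id"
    }


BASE = "http://example.org/contractor#"  # hypothetical namespace
contractor = Facade("EPC contractor")
plant_owner = Facade("plant owner")

contractor.populate(map_record({"id": "P-101", "tag": "P-101", "ratedFlow_m3h": 120}, BASE))
moved = contractor.hand_over(BASE + "P-101", plant_owner)
print(moved, len(contractor.triples), len(plant_owner.triples))
```

In a real deployment the mapping would target the standard format of ISO 15926-8 and the Façade would be an RDF triple store behind the standard API; only the division of responsibilities is shown here.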

Queried information can be presented on screen and on paper, using user-defined Document Types. A "document" is a view on the data that is required for a certain activity. Any user organization can build these in accordance with its own standards. Since they are based on specializations of standard templates, they can be handled in a standard way.

Since the data model is a 4D (space-time) model, it is possible to present the data that was valid at any given point in time, thus providing a true historical record at the finest granularity. It is expected that this will be used for Knowledge Mining.
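The idea of retrieving the data that was valid at a given point in time can be sketched as follows. This is a deliberate simplification and not the Part 2 representation: we simply attach a validity interval to each statement. The `TimedStatement` type, the property name and the values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TimedStatement:
    subject: str
    predicate: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"


def as_of(statements: List[TimedStatement], when: date) -> List[TimedStatement]:
    """Return the statements valid on the given day.

    Intervals are half-open: valid_from is included, valid_to is excluded.
    """
    return [
        s for s in statements
        if s.valid_from <= when and (s.valid_to is None or when < s.valid_to)
    ]


# Hypothetical history: the impeller of P-101 was replaced on 2015-06-01.
history = [
    TimedStatement("P-101", "impellerDiameter_mm", "250", date(2010, 1, 1), date(2015, 6, 1)),
    TimedStatement("P-101", "impellerDiameter_mm", "265", date(2015, 6, 1)),
]

for day in (date(2012, 3, 1), date(2020, 1, 1)):
    print(day, [s.value for s in as_of(history, day)])
```

Because old statements are closed rather than overwritten, the full history remains available for later analysis.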

One can set up Façades for the consolidation of data by handing over data produced by various systems and stored in their System Façades. Examples are a Façade for a project discipline, a project, a plant, or even for a company in a fiscal year. This set-up follows the requirements of the users.

In any implementation a number of Façades can be involved, with different access rights. This is arranged by setting up a CPF server (CPF = Confederation of Participating Façades). Using SOAP and WSDL, an Ontology Browser can access one or more Façades in a given CPF, depending on the access rights.

Integration in the Semantic Web

The data model of Part 2 has been mapped to an OWL-based taxonomy, and any object that is being declared must be typed with the applicable Part 2 entity type.

The reference data of Part 4 have been mapped to an OWL-based taxonomy as well, as instances and subclasses of Part 2 entity types. For Part 4 reference classes so-called Object Information Models are set up by domain experts. These ontologies are fully OWL-based, and used for validation of RDF instances.

The RDF instances are validated against the applicable OWL schemas of Part 2 and Part 4.
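One of the checks implied above, that every declared object must be typed with a Part 2 entity type, can be sketched as a simple closed-world test. The entity type URIs below are placeholders under http://example.org/ and not the official Part 2 identifiers, and full validation against OWL schemas involves much more than this single rule.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Placeholder URIs standing in for Part 2 entity types.
PART2_ENTITY_TYPES = {
    "http://example.org/part2#PhysicalObject",
    "http://example.org/part2#ClassOfInanimatePhysicalObject",
}

# Hypothetical instance data: P-102 lacks a Part 2 typing statement.
instances = {
    ("http://example.org/plant#P-101", RDF_TYPE, "http://example.org/part2#PhysicalObject"),
    ("http://example.org/plant#P-101", "http://example.org/props#tag", "P-101"),
    ("http://example.org/plant#P-102", "http://example.org/props#tag", "P-102"),
}


def untyped_subjects(triples, entity_types):
    """Return the subjects that have no rdf:type among the allowed entity types."""
    subjects = {s for s, _, _ in triples}
    typed = {s for s, p, o in triples if p == RDF_TYPE and o in entity_types}
    return subjects - typed


print(sorted(untyped_subjects(instances, PART2_ENTITY_TYPES)))
```

An exchange could be rejected, or flagged for correction, when this set is not empty.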

All of the above data are expressed in the RDF/XML format for exchanges, and in the triple format for storage in RDF stores.

Data Integration

There is a tacit assumption within the Semantic Web community that every data set and ontology will interoperate. The reality is that different conceptualizations and representations of the same data can exist. While the architecture and basic tools of the Semantic Web remove a set of previous roadblocks to data integration, positive progress towards it requires study, experimentation, and at-scale efforts that exercise proposed solutions.

To date, we have primarily focused on building prototypes that have functioned independently.

In order to integrate data sets, one of two things must happen: either terms for entities and relationships must be shared between the data sets (the data sets must be built using a shared ontology) or concordances must be available that relate terms in one data set to those in another.
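The second option, a concordance, can be sketched as a lookup table that rewrites the terms of one data set into those of another. The `vendor:` terms and the `translate` function are hypothetical; the two target URIs are the ones used earlier in this paper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CENTRIFUGAL_PUMP = "http://posccaesar.org/rdl/page/RDS416834"

# Hypothetical concordance: local vendor terms on the left, shared terms on the right.
concordance = {
    "vendor:isA": RDF_TYPE,
    "vendor:CentPump": CENTRIFUGAL_PUMP,
}

vendor_data = {
    ("vendor:item-7", "vendor:isA", "vendor:CentPump"),
    ("vendor:item-7", "vendor:serialNo", "SN-0042"),
}


def translate(triples, mapping):
    """Replace every term that has an entry in the concordance; keep the rest."""
    return {
        tuple(mapping.get(term, term) for term in triple)
        for triple in triples
    }


shared_view = translate(vendor_data, concordance)
print(sorted(shared_view))
```

Terms without an entry, such as `vendor:serialNo` here, stay local and remain invisible to queries phrased in the shared vocabulary, which is why a shared ontology is preferable wherever agreement can be reached.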

Building ontologies is hard. The efforts are therefore focusing on identifying available knowledge resources (e.g., thesauri, terminologies, ontologies) that cover the basic entities and relations required to formally represent well-defined scenarios. While concepts in areas of engineering may be incomplete, unclear, in transition or under dispute, there are many important entities and relations upon which most engineers will agree. These core classes can be found in the taxonomy of ISO 15926-4.

Much work will have to be done on complete ontologies for these classes, i.e. on what information can possibly be of interest during the lifetime of a member of any such class. These ontologies are dubbed "Object Information Models" (OIMs). They are built from sets of ISO 15926 templates, or more precisely from specializations of the core templates.

Current technical limitations of the Semantic Web

Semantic Web technologies are young. Gaps in standards and implementations still exist and adoption is limited by typical problems with early technology, such as the need for a critical mass of practitioners and installed base, and growing pains as the technology is scaled up. Some issues that have affected the work of the ISO 15926 community are:

  • Scarcity of semantically annotated information sources - Most common sources of data for the process industries are not currently available in RDF or OWL;
  • Performance and scalability - RDF and OWL stores are slower than optimized relational databases, but are improving steadily.

Cross-community interactions

There is an emerging consensus in the process industries community at large for the need to formalize and share data annotation semantics. This is championed by such consortia as FIATECH, PoscCaesar, and USPI.

The Process Industry communities need to further coordinate efforts in areas critical to lifecycle information integration, namely:

  • Formalizing the semantics of the elements of process plants.
  • Engaging "early adopters" of Semantic Web technologies, and using them as a resource for driving use cases.
  • Working with other ontology suppliers to find appropriate ways to translate their extensive vocabularies and knowledge resources into RDF for effective use on the Semantic Web.

Tensions have occurred between the Semantic Web community and other communities like the XML and database communities, as some people believe that the technologies being advocated by these communities cannot coexist with each other. One way to ease such tensions is for the Semantic Web community to develop a complementary rather than competitive relationship with these communities. The Semantic Web should be perceived as a complement to, rather than a replacement for, existing technologies. For example, RDF/OWL can be serialized as XML, and can be used to provide a richer semantic layer for use with other XML technologies. The developers of triple stores and RDF query languages have been greatly inspired by the theoretical and practical work done by the database community. Providers of valuable knowledge would be more willing to make their data accessible to the Semantic Web community if they did not need to abandon their own formats. For example, converters can be provided for translation. At the same time, additional tools can be developed to exploit the new features (e.g., reasoning).

Education

The vision of a Semantic Web streamlining the Information Chain in the Process Industries depends on making data available in a standard format. Often the effort that goes into preparing and serving this data will not directly benefit the provider. Instead, equipment suppliers are measured by their ability to supply equipment at competitive prices and quality, engineers by their ability to produce designs efficiently, etc.

Even if the process industries community decided today to publish data in the proper format, we do not yet have adequate numbers of skilled data modellers. Data modelling is a hard-learned skill, and the challenge is substantially magnified when the intention is to share information in a standard format. We need to establish and populate a new discipline, a mix of interdisciplinary skills that include solid understanding of engineering, computer science, philosophy and the social anthropology of the globalized economy.

Security

There are security risks involved with sharing information. This is being addressed in the forthcoming Part 9.

Conclusion

We have discussed the potential of the Semantic Web to facilitate setting up Information Chains. Although Semantic Web technologies are still evolving, there are already existing standards, technologies, and tools that can be practically applied to a wide range of process industry use cases.

There are challenges to the widespread adoption of the Semantic Web in the process industries. Some parts of the technology are still in development and are untested at large scales. Informaticians need training and support to be able to understand and work with these new technologies. Incentives in the form of proven cost reductions and efficiency improvements are needed to encourage organizations to join the ISO 15926 Semantic Web.

Projects

A flurry of cooperating implementation projects are active:


Acknowledgements

Applicable parts of the paper "Advancing translational research with the Semantic Web", and of the Introduction section of www.InfowebML.ws have been re-used and adapted for this paper.