DOI Handbook - Data Model

This chapter explains the basis for the second main technical component of the DOI® system, the DOI® data model, and its ability to ensure interoperability of DOI name metadata assigned through existing metadata schemes. The chapter gives an overview of the system; separate sections then discuss the aims of DOI data model policy (interoperability and good administration) and the three tools of the Metadata System: kernel metadata, the data dictionary and schemas for metadata interchange. Readers are advised to consult the Glossary of Terms in conjunction with this chapter.

4.1 Overview of the DOI® data model
4.2 Aims of DOI data model policy
      4.2.1 Interoperability
      4.2.2 Administrative capability
4.3 DOI metadata
      4.3.1 DOI® metadata Kernel
      4.3.2 Use of the DOI Kernel
      4.3.3 Procedure for management of DOI metadata schemas
            4.3.3.1 Adding values to the Kernel controlled vocabularies
            4.3.3.2 Update procedure
      4.3.4 DOI Data Dictionary
      4.3.5 Vocabulary Mapping Framework
      4.3.6 Metadata integration: goals of the IDF
4.4 Metadata requirements, ISO 26324
4.5 Underlying ontology

4.1 Overview of the DOI® data model

Without metadata, an identifier is of very little value. Metadata, which may be defined in this context as information about an identified Referent, provides human beings or machines with the data they need to enable them to make use of that identified Referent. Metadata may include names, identifiers, descriptions, types, classifications, locations, times, measurements, relationships and any other kind of information related to a Referent.

There are two ways in which every IDF Registration Agency is bound to deal with metadata. An RA will gather input metadata from Referent providers (typically, descriptions of the Referents and associated rights and policies); and an RA will need to provide some level of output or service metadata to support DOI system services. Input metadata will provide some, but not necessarily all, of the service metadata. In some cases, a metadata declaration will itself be a complete DOI system service (for example, "provide an ONIX Product message for this Referent"). These two flows of metadata declarations are illustrated in figure 1.

DOI system policy places no restrictions on the form and content of an RA's input and service metadata declarations, except insofar as input metadata must support the minimum requirements implicit in the DOI Kernel (see below). RAs may specify their own metadata schemes and messages, or use any existing schemes in whole or part for their input and service metadata declarations.

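DOI data model policy is concerned with the internal management and exchange of metadata between RAs within the "RA network", and is designed to achieve two aims:

to promote interoperability within the network of DOI name users; and
to ensure minimum standards of quality of administration of DOI names by Registration Agencies, and to facilitate the administration of the DOI system as a whole.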

The obligations that follow from these aims are not mandatory for all DOI names: exceptions are discussed in the next section, in terms of the requirement for interoperability.

4.2 Aims of DOI data model policy

4.2.1 Interoperability

The first aim of DOI data model policy is to promote interoperability within the network of DOI name users. It does this by providing the means, described in this chapter, of achieving "semantic compatibility" between different RAs.

Standardization of any kind is driven by a need for interoperability. If an RA is issuing DOI names for Referents for use within a private domain where that RA is able to command all aspects of metadata gathering and output, then it has no need for standardization or conformance with DOI data model obligations. The RA will lay out its schemas and declarations, and its providers and users will, hopefully, conform to them. Such a situation is described as restricted use of the DOI system, and applies typically where an organization becomes an RA for the specific purpose of issuing DOI names for use only within its own private organization.

However, such isolation is unusual. Normally, when a DOI name is issued to a Referent, one fundamental assumption may be made about interoperability: the RA or the Referent provider may wish (now or in the future) that the DOI name should be available for use in services provided by other RAs. For example, where several RAs are issuing DOI names to journal articles from different publishers, it is likely that some RAs and publishers will want their DOI names to be included in journal-related services supported by other RAs.

In a similar way, many RAs will want DOI names issued by other RAs to be available for inclusion in services they themselves are providing. Such interoperability is one of the principal benefits of the DOI system.

As the RA network grows, such requirements are emerging, and where specific opportunities do not yet exist they are anticipated. In such circumstances neither the RA nor the Referent provider wishes to issue a second DOI name for the Referent, nor to provide and capture the input metadata all over again from its source.

In addition, some DOI system services may not, in future, be the direct responsibility of RAs. Any service provider making use of DOI names issued by different RAs under different Application Profiles will be faced with the question of metadata interoperability.

Any DOI name which is intended for interoperability — that is, which has the possibility of use in services outside of the direct control of the issuing RA — is subject to DOI data model policy. The aim of metadata interoperability can therefore be expressed in these two objectives:

The first objective is dealt with by the DOI Kernel, and the second by the interchange provisions of the RMD and the Data Dictionary (DD).

4.2.2 Administrative capability

The second aim of DOI data model policy is "to ensure minimum standards of quality of administration of DOI names by Registration Agencies, and facilitate the administration of the DOI system as a whole". This aim may also be seen as supporting the first aim of interoperability, but it specifically addresses the need to ensure that a prospective RA is competent to issue DOI names responsibly and that ambiguous DOI names do not enter the network.

The policy provides a simple test of an RA's competence: the ability to make a DOI Kernel Declaration, which requires that the RA has an internal system which can support the unambiguous allocation of a DOI name, and is fundamentally sound enough to support interoperability within the network. In addition, data model policy requires that RAs maintain a record of the date of allocation of a DOI name, and the identity of the registrant on whose behalf the DOI name was allocated.
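
As a rough illustration of the administrative requirements just described, the Python sketch below keeps the allocation date and registrant for each DOI name and refuses a second allocation of the same name. The class and field names are hypothetical, storage is a plain in-memory dictionary, and names are compared case-insensitively on the assumption (consistent with DOI syntax rules) that DOI names are not case-sensitive.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class AllocationRecord:
    """Administrative facts kept for each allocated DOI name (hypothetical fields)."""
    doi_name: str
    registrant: str   # identity of the registrant on whose behalf the name was allocated
    issue_date: date  # date of allocation


class AllocationRegister:
    """Hypothetical in-memory register; a real RA would use a persistent database."""

    def __init__(self) -> None:
        self._records: Dict[str, AllocationRecord] = {}

    @staticmethod
    def _key(doi_name: str) -> str:
        # Assumption: DOI names are case-insensitive, so normalise with str.upper()
        # (adequate for ASCII names such as those used here)
        return doi_name.upper()

    def allocate(self, doi_name: str, registrant: str, issue_date: date) -> AllocationRecord:
        """Record a new allocation, refusing any name that is already allocated."""
        key = self._key(doi_name)
        if key in self._records:
            raise ValueError(f"{doi_name} is already allocated")
        record = AllocationRecord(doi_name, registrant, issue_date)
        self._records[key] = record
        return record

    def lookup(self, doi_name: str) -> Optional[AllocationRecord]:
        """Return the allocation record for a DOI name, or None if it is unknown."""
        return self._records.get(self._key(doi_name))


register = AllocationRegister()
register.allocate("10.1000/example.1", "Example Publisher", date(2024, 1, 15))
print(register.lookup("10.1000/EXAMPLE.1"))
```

A production system would persist these records and tie them to the registrant's account; the point here is only that unambiguous allocation reduces to a uniqueness check on a normalised key.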

The DOI data model policy also exists to support the future development of mechanisms for facilitating the administration of the DOI system as a whole. This might be done, for example, through the use of terms registered in the Data Dictionary as types, to classify DOI names, services or application profiles.

4.3 DOI metadata

An identifier such as a DOI name is of no value without some related metadata describing what it is that is being identified. The DOI system's approach to metadata has two aspects: first, the DOI standard mandates a particular minimum set of metadata (the "Kernel" metadata) to describe the referent of a DOI name, supported by an XML Schema; secondly, to promote interoperability and assist RAs in the creation of their own schemas, the IDF provides a Data Dictionary (an ontology of all terms used in the Kernel and of other terms registered by Registration Agencies) and supports a mapping tool called the Vocabulary Mapping Framework. These resources are described in this section.

4.3.1 DOI® metadata Kernel

The "DOI Kernel" is a minimum metadata set with two aims: recognition and interoperability.

"Recognition" in this context means that the Kernel metadata should be sufficient to show clearly what kind of thing the DOI referent is (by various classifications), and to allow a user to identify the particular thing with reasonable accuracy (by various names, identifiers and relationships). These two are complementary, for it is possible to know that something is (for example) a movie or a DVD without knowing that it is "Casablanca", and vice versa. Recognition is required for the discovery of referents, and also to provide information to a user when a referent is discovered, whether by intent or by accident. The user of metadata may be a person or a machine. The structure of the Kernel is often, but not always, sufficient to provide a unique description of a referent ("disambiguation"), and further specialized metadata elements may be required in some cases. A unique description can in fact always be achieved by adding descriptive text to a referentName, but this is not satisfactory if the additional text is being used in place of a formal classification, measurement, identifier, time or other structured contextual metadata, as it undermines the second aim, interoperability.

"Interoperability" in this context means that Kernel metadata from different DOI Registration Agencies may be combined or queried by the same software without requiring semantic mapping or transformation. Interoperability is achieved when data elements or their values are common to diverse metadata schemas. The Kernel provides this directly by mandating a common set of core elements and classifications, but this of course supports only limited interoperability.
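
To illustrate what the Kernel's common elements provide, the Python sketch below pools Kernel-style records from two RAs and queries them with one function, with no mapping step. Records are simplified to one value per element, and the DOI names and values are invented for illustration; element names follow Table 4.1.

```python
from typing import Dict, Iterable, List

# A Kernel-style record: element name -> value (simplified to single values)
Record = Dict[str, str]

# Hypothetical records from two different RAs
ra_one: List[Record] = [
    {"DOI name": "10.1000/one.1", "primaryReferentType": "creation",
     "structuralType": "digital", "referentType": "serial article"},
    {"DOI name": "10.1000/one.2", "primaryReferentType": "creation",
     "structuralType": "abstraction", "referentType": "musical composition"},
]
ra_two: List[Record] = [
    {"DOI name": "10.1000/two.1", "primaryReferentType": "creation",
     "structuralType": "digital", "referentType": "serial article"},
    {"DOI name": "10.1000/two.2", "primaryReferentType": "party",
     "structuralType": "organization", "referentType": "university"},
]


def query(records: Iterable[Record], **criteria: str) -> List[str]:
    """Return the DOI names of records whose elements match every criterion exactly."""
    return [r["DOI name"] for r in records
            if all(r.get(element) == value for element, value in criteria.items())]


# Because both RAs use the same Kernel elements and values, one query spans both
combined = ra_one + ra_two
print(query(combined, structuralType="digital", referentType="serial article"))
# ['10.1000/one.1', '10.1000/two.1']
```

The limitation noted above shows here too: the query works only because both RAs use identical values. An RA-specific term would need mapping, for example through the Data Dictionary.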

The assignment of a DOI name requires that the registrant provide metadata describing the object to which the DOI name is being assigned. At minimum, this metadata shall consist of a DOI Kernel Metadata Declaration (also known as the DOI Kernel). A specification of data elements (with sub-elements, cardinality, etc.), current allowed values and XML expression is maintained by the IDF (the ISO 26324 Registration Authority).

The elements of the DOI Kernel are described in Tables 4.1 and 4.2, which are based on Tables B.1 and B.2 of ISO 26324. Note: the tables below may contain additional terms beyond those stated in ISO 26324, but only in open lists to which new terms may be registered; the tables below are therefore fully compatible with ISO 26324. An XSD (XML schema) for the DOI Kernel is maintained by the IDF. (See the DOI Kernel XML Schema page for the current version's Release Notes and a link to the schema, and also the DOI Kernel XML Schema Policy Notes.) This schema contains some additional sub-elements beyond those listed in the tables.

Table 4.2 shows the basic administrative elements in a DOI Kernel Metadata Declaration. These elements relate to the issuance of the DOI name and to the registration record itself.

For other elements and sub-elements beyond the DOI Kernel, value sets may be developed as needed. Such value sets shall be registered in the data dictionary under the responsibility of the IDF (the ISO 26324 Registration Authority) in order to facilitate the integration of DOI data from different sources by a common application.

Table 4.1: Elements of the DOI Kernel Metadata Declaration

Kernel element(s)	occurs	Description
DOI name	1	Specific DOI name allocated to the identified referent.
referentIdentifier(s)	0-n	Other identifier(s) referencing the same referent (e.g. ISAN, ISBN, ISRC, ISSN, ISTC, ISNI). This element contains a type element appropriate to the primaryReferentType. The schema at present recognises a creationIdentifierType and partyIdentifierType, which are open lists for which new allowed values may be registered.
referentName(s)	0-n	Name(s) by which the referent is usually known (e.g. title). This element contains a type element appropriate to the primaryReferentType. The schema at present recognises a creationNameType and partyNameType, which are open lists for which new allowed values may be registered. This element also contains a language element, for which the allowed value list is the ISO 639-2 code list.
primaryReferentType	1	The primary type of the referent (e.g. creation, party, event). This is an open list; new primaryReferentTypes may be registered.
structuralType	1	The primary structuralType of a referent. For creations, there are four mutually exclusive creationStructuralTypes (physical, digital, performance, abstraction) that allow classification according to overall form. Where structuralTypes may be contained within one another, the referent's structuralType is defined by the overall form [e.g. a CD (physical) may contain files (digital) which contain recordings of performances of songs (abstractions)], and elements of content can be further classified if necessary under referentType. For parties there are three mutually exclusive partyStructuralTypes (person, animal, organization). These lists are closed.
mode	0-n	For creations only, the principal sensory mode(s) by which a referent is intended to be perceived (audio, visual, tangible, olfactory, tasteable, none). Mode identifies only the principal intended modes of perception; most physical resources are perceivable with all five senses, but some of these perceptions may be trivial. For example, a printed book may be touched or smelled, but these are supplementary or incidental to visual mode, the intended function as a content carrier. For a Braille book, however, tangible would be a principal mode. This list is closed.
character	0-n	For creations only, a fundamental form of communication in which the content of a referent is expressed. There are four values: music, language, image, other. This list is closed.
referentType	0-n	Specification of the type(s) of the referent. For creations, the abstract nature of the content of a referent, irrespective of its creationStructuralType, is typically described by creationType, which may be extended as needed to include format and genre elements (for example: audio file, scientific journal, musical composition, dataset, serial article, eBook, PDF). For parties, referentType is a role with which the party is associated (for example: author, composer, book publisher, library, university, financial institution, film studio) and is described by associatedPartyRole (for example: Composer, Author, BookPublisher, JournalPublisher). This is an open list; new referentTypes may be registered.
linkedCreation	0-n	For creations only. Another creation with which a referentCreation is associated. This element contains a creationRoleToCreation element, which is an open list for which new allowed values may be registered.
linkedParty	0-n	For parties only. Another party with which a referentParty is associated. This element contains a partyRoleToParty element, which is an open list for which new allowed values may be registered.
principalAgent	0-n	For creations only, the entity or entities principally responsible for the creation or publication of the referent. This element contains an agentRole element which specifies the particular role played (for example: Creator, Author, BookPublisher). This is an open list for which new allowed values may be registered.
dateOfBirthOrFormation	0-1	For parties only, the date of birth (for an individual or animal) or formation (for an organization) of the referentParty.
dateOfDeathOrDissolution	0-1	For parties only, the date of death (for an individual or animal) or dissolution (for an organization) of the referentParty.
associatedTerritory	0-n	For parties only, a territory with which the referentParty is associated (for example, a territory of birth, nationality or residence). The allowed value list is the ISO 3166-1 alpha-2 territory code list.

Table 4.2: Administrative elements of the DOI Kernel Metadata Declaration

Kernel element	occurs	Description
registrationAuthorityCode	1	Code assigned to denote the name of the agency (authorized by the ISO 26324 Registration Authority) that issued this DOI name.
issueDate	1	Date when this DOI name was issued.
issueNumber	0-1	Number or other designation associated with the specific version of the DOI Kernel Metadata Declaration.
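
The cardinalities in Tables 4.1 and 4.2 can be read as a simple validation rule. The Python sketch below checks a declaration, represented hypothetically as a mapping from element name to a list of values, against those counts and against the "creations only" and "parties only" restrictions. It is not a substitute for validation against the IDF's Kernel XSD, which also defines sub-elements and value constraints.

```python
from typing import Dict, List, Optional, Tuple

# (minimum, maximum) occurrences from Tables 4.1 and 4.2; None stands for "n"
KERNEL_CARDINALITY: Dict[str, Tuple[int, Optional[int]]] = {
    "DOI name": (1, 1),
    "referentIdentifier": (0, None),
    "referentName": (0, None),
    "primaryReferentType": (1, 1),
    "structuralType": (1, 1),
    "mode": (0, None),
    "character": (0, None),
    "referentType": (0, None),
    "linkedCreation": (0, None),
    "linkedParty": (0, None),
    "principalAgent": (0, None),
    "dateOfBirthOrFormation": (0, 1),
    "dateOfDeathOrDissolution": (0, 1),
    "associatedTerritory": (0, None),
    "registrationAuthorityCode": (1, 1),
    "issueDate": (1, 1),
    "issueNumber": (0, 1),
}

# Elements that Table 4.1 restricts to creations only or to parties only
CREATION_ONLY = {"mode", "character", "linkedCreation", "principalAgent"}
PARTY_ONLY = {"linkedParty", "dateOfBirthOrFormation",
              "dateOfDeathOrDissolution", "associatedTerritory"}


def check_kernel(decl: Dict[str, List[str]]) -> List[str]:
    """Return cardinality problems in a declaration (element -> list of values).

    Elements outside the Kernel (e.g. from an RA's wider schema) are ignored.
    """
    problems: List[str] = []
    for element, (low, high) in KERNEL_CARDINALITY.items():
        count = len(decl.get(element, []))
        if count < low:
            problems.append(f"{element}: needs at least {low}, found {count}")
        if high is not None and count > high:
            problems.append(f"{element}: allows at most {high}, found {count}")

    # Only elements that actually carry values count as present
    present = {element for element, values in decl.items() if values}
    kinds = decl.get("primaryReferentType", [])
    kind = kinds[0] if kinds else None
    if kind == "creation":
        for element in sorted(present & PARTY_ONLY):
            problems.append(f"{element}: not allowed for a creation")
    elif kind == "party":
        for element in sorted(present & CREATION_ONLY):
            problems.append(f"{element}: not allowed for a party")
    return problems
```

An empty result means only that the counts are plausible; whether each value is drawn from the right allowed value set is a separate check.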

4.3.2 Use of the DOI Kernel

Registration Agencies are expected to ensure that, at a minimum, a DOI Kernel Metadata Declaration is made for each DOI name issued. This may be done in two ways: either a Declaration can be made using the DOI Kernel XSD, or (more usually) the elements of the DOI Kernel can be incorporated into a wider metadata schema issued by the Registration Agency.

A Registration Agency has the option of not producing DOI Kernel metadata unless asked; that is, it may convert its metadata on demand from an internal representation.
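
On-demand conversion might look something like the following Python sketch, which maps an RA's internal field names onto Kernel element names and serialises only those fields. Everything specific here is an assumption: the internal field names, the mapping, the root element `kernelMetadata` and the element name `doiName` are hypothetical, and the flat structure is far simpler than the real DOI Kernel XSD, which should be consulted for actual element names, namespaces and nesting.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Hypothetical mapping from an RA's internal field names to Kernel element names.
# The flat structure is illustrative only; the authoritative structure is the
# DOI Kernel XSD maintained by the IDF.
FIELD_MAP: Dict[str, str] = {
    "doi": "doiName",
    "kind": "primaryReferentType",
    "form": "structuralType",
    "title": "referentName",
    "isbn": "referentIdentifier",
}

InternalRecord = Dict[str, Union[str, List[str]]]


def to_kernel_xml(record: InternalRecord) -> str:
    """Build a Kernel-like XML fragment from an internal record, on demand."""
    root = ET.Element("kernelMetadata")  # hypothetical root element name
    for field, element in FIELD_MAP.items():
        value = record.get(field)
        if value is None:
            continue  # optional data the RA does not hold is simply omitted
        for item in value if isinstance(value, list) else [value]:
            ET.SubElement(root, element).text = item
    # Internal-only fields (not in FIELD_MAP) are never serialised
    return ET.tostring(root, encoding="unicode")


internal = {"doi": "10.1000/example.1", "kind": "creation", "form": "physical",
            "title": "Casablanca", "warehouse_bin": "B-12"}
print(to_kernel_xml(internal))
```

Keeping the mapping in one table means the internal representation can change freely, provided the mapping is updated.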

The minimum set of metadata a registrant should be concerned with is the minimum that will meet its business requirements, not the technical minimum of the Kernel which will always be much smaller. The Kernel schema makes very few data elements mandatory. A minimum set is a necessary but not sufficient requirement in considering the question of what data a registrant may need to communicate to supply chain partners.

4.3.3 Procedure for management of DOI metadata schemas

A "schema" includes any item of software or documented set of metadata elements designed for a specific purpose to support the use of DOI names. Typically this may be an XML schema, an RDF schema or a set of defined vocabulary terms for use in some process. There is increasing consideration among RAs of the benefits of implementing some common machine-readable DOI metadata schemas, especially as "Linked Data". There are currently two DOI metadata schemas: the DOI Data Dictionary, including Allowed Value Sets for use in messages; and the Metadata Kernel. The Data Dictionary is published in full in the Member Section of the IDF web site. The Kernel is published as an XML schema on the IDF web site.

4.3.3.1 Adding values to the Kernel controlled vocabularies

Many Kernel elements have values determined by controlled vocabularies (or "allowed value sets"). These are highlighted in bold in Table 4.1. Some of these lists are closed, which means that RAs cannot add values to them:

creationStructuralType
partyStructuralType
mode
character
language (ISO 639-2)
territory (ISO 3166-1 alpha-2)

The following lists are open:

primaryReferentType
agentRole
creationType
creationIdentifierType
creationNameType
creationRoleToCreation
associatedPartyRole
partyIdentifierType
partyNameType
partyRoleToParty

RAs may add new values to these lists by registering them in the DOI Data Dictionary (see below).
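
The distinction between closed and open lists can be sketched in Python as follows. The closed-list values are those given in Table 4.1; the open lists are seeded with a handful of example values only, and `register_value` is a hypothetical local stand-in for the real registration process with the IDF, not an API.

```python
from typing import Dict, Set

# Closed lists: values as given in Table 4.1; RAs cannot add to these.
# (language and territory are also closed, but take their values from the
# external ISO 639-2 and ISO 3166 alpha-2 code lists, so are not enumerated here.)
CLOSED_LISTS: Dict[str, Set[str]] = {
    "creationStructuralType": {"physical", "digital", "performance", "abstraction"},
    "partyStructuralType": {"person", "animal", "organization"},
    "mode": {"audio", "visual", "tangible", "olfactory", "tasteable", "none"},
    "character": {"music", "language", "image", "other"},
}

# Open lists: seeded with a few example values only; the full allowed value
# sets are held in the DOI Data Dictionary.
OPEN_LISTS: Dict[str, Set[str]] = {
    "primaryReferentType": {"creation", "party", "event"},
    "creationType": {"audio file", "dataset", "eBook", "serial article"},
}


def is_allowed(vocabulary: str, value: str) -> bool:
    """Check a value against its controlled vocabulary."""
    if vocabulary in CLOSED_LISTS:
        return value in CLOSED_LISTS[vocabulary]
    if vocabulary in OPEN_LISTS:
        return value in OPEN_LISTS[vocabulary]
    raise KeyError(f"unknown vocabulary: {vocabulary}")


def register_value(vocabulary: str, value: str) -> None:
    """Hypothetical stand-in for registering a new value in the Data Dictionary."""
    if vocabulary in CLOSED_LISTS:
        raise PermissionError(f"{vocabulary} is a closed list")
    if vocabulary not in OPEN_LISTS:
        raise KeyError(f"unknown vocabulary: {vocabulary}")
    OPEN_LISTS[vocabulary].add(value)
```

Usage: `register_value("creationType", "podcast episode")` (a hypothetical new value) makes `is_allowed("creationType", "podcast episode")` true, whereas any attempt to extend `mode` raises `PermissionError`.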

4.3.3.2 Update procedure

Authority to make changes to the existing DOI schemas lies with the DOI-RATech group. Any member of that group may propose updates to a schema, or the introduction of a new schema, at any time. Implementing the changes will be the responsibility of the IDF's selected technical provider, as manager of the schemas, which will acknowledge each proposal and give an estimated delivery time according to the scope and complexity of the proposed changes. For routine changes this will normally be no more than two weeks. The procedure is:

4.3.4 DOI Data Dictionary

All elements and allowed values used in the Kernel are included in the DOI Data Dictionary, a hierarchical ontology created to support the orderly development of DOI metadata. The introduction to the Dictionary contains further information on its scope, structure and maintenance.

Terms will be added to the dictionary at the request of any Registration Agency by the modification of the Metadata Kernel and/or its Allowed Values, or the publication of other DOI message schemas in addition to the Metadata Kernel. Any Registration Agency may add new values to the open Allowed Value Sets by registering them with the IDF.

ISO 26324 states that the data dictionary, used as the repository for all data elements and allowed values (the items which may be used as values of each element) used in DOI metadata specifications, shall make the definitions of all metadata elements, within an ontology, available to all registration agencies, and shall provide the mappings needed to support the metadata integration and transformations required for data interchange. If desired, metadata may be consolidated for a specific service; in this case, the data dictionary shall provide the data mappings so that the consolidated metadata can be presented as if from a single set. All allowed values used by a registrant in Kernel Metadata shall be registered in the data dictionary.
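
The consolidation described above can be pictured with a small Python sketch: each RA's local terms are mapped to a shared term, so that records from several RAs can be presented as a single set. The RA names, local terms and the shared terms `JournalArticle` and `Book` are hypothetical; in practice the mappings would come from the Data Dictionary rather than a hard-coded table.

```python
from typing import Dict, List

# Hypothetical mappings from each RA's local terms to shared dictionary terms.
# In practice such mappings would be registered in the DOI Data Dictionary.
TERM_MAP: Dict[str, Dict[str, str]] = {
    "RA-One": {"article": "JournalArticle", "book": "Book"},
    "RA-Two": {"paper": "JournalArticle", "monograph": "Book"},
}


def consolidate(feeds: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Merge records from several RAs so they read as if from a single set."""
    merged: List[Dict[str, str]] = []
    for ra, records in feeds.items():
        mapping = TERM_MAP[ra]
        for record in records:
            local_term = record["type"]
            if local_term not in mapping:
                # An unmapped term cannot be integrated automatically
                raise KeyError(f"{ra}: no mapping registered for {local_term!r}")
            merged.append({"doi": record["doi"], "type": mapping[local_term], "source": ra})
    return merged


feeds = {
    "RA-One": [{"doi": "10.1000/one.1", "type": "article"}],
    "RA-Two": [{"doi": "10.1000/two.1", "type": "paper"},
               {"doi": "10.1000/two.2", "type": "monograph"}],
}
articles = [r["doi"] for r in consolidate(feeds) if r["type"] == "JournalArticle"]
print(articles)  # ['10.1000/one.1', '10.1000/two.1']
```

Raising an error on an unmapped term, rather than passing it through, reflects the requirement that all allowed values be registered before they are used.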

The DOI Data Dictionary is implemented and maintained as a managed namespace within the Vocabulary Mapping Framework (VMF).

Users need not understand the underlying concepts and construction of the Data Dictionary in order to make use of it. Key features of the dictionary are:

A fundamental role of the IDF is to provide assurance to users that the work has been peer-reviewed, tested in practical implementations and based on sound principles. The methodology of the data dictionary has been validated against the W3C ontology language OWL-DL. The data dictionary uses an underlying ontology widely accepted and used in a variety of major metadata schemes. Its origins lie in the indecs (interoperability of data in e-commerce systems) framework, an influential multimedia metadata project (1998-2000) involving groups from the content, author, creator, library, publisher and rights communities, which pioneered a model of event-based metadata as a solution for integrating digital transactions. See the factsheet "The indecs Framework".

4.3.5 Vocabulary Mapping Framework

The IDF supports and recommends the Vocabulary Mapping Framework (VMF) to promote interoperability between Registration Agency schemas and other schemas and ontologies from outside the DOI domain, such as the ONIX or DDEX message standards. The IDF hosts the VMF web site and is also part of the governance structure of VMF. VMF is a downloadable tool that supports semantic interoperability across communities by providing extensive and authoritative mappings of vocabularies from major content metadata standards.

The VMF is an expansion of the existing RDA/ONIX Framework into a comprehensive vocabulary of resource relators and categories, a superset of those used in major standards from the publisher/producer, education and bibliographic/heritage communities (CIDOC CRM; DCMI; DDEX; DOI; FRBR; MARC21; LOM; ONIX; RDA).

IDF has supported the development of VMF, encourages its use by RAs in the mapping of kernel and non-kernel metadata elements to other schemes, and is happy to facilitate discussion with the VMF technical team.

4.3.6 Metadata integration: goals of the IDF

The IDF recognises that the automated integration of metadata is the key to realising the full potential of the DOI system as a tool for digital commerce and culture. This is also the underlying objective of the Semantic Web and "linked data": that the Web should be seen as a medium for structured, interlinked and machine-processable information as much as, in its current form, a network of documents presenting information for human consumption.

The tools described here (the Kernel declaration, the DOI Data Dictionary and the VMF) exist to provide a basis of good practice and a starting point for the integration of metadata for different DOI referents. Initiatives such as Linked Open Data provide further essential infrastructure, but only in technology and syntax: they do not provide solutions at the level of shared meaning ("semantic alignment") for the automated integration of different datasets, which would allow services from different RAs and other parties to interact fully without human intervention or a plethora of one-to-one "silo" solutions.

The key to this is the development of well-structured metadata schemas and of services which make use of the semantic mapping capabilities of tools such as the DOI Data Dictionary and the VMF. IDF will provide support to its RAs where they choose to co-operate in the development of such services.

4.4 Metadata requirements, ISO 26324

ISO 26324, the ISO specification of the DOI system, describes the following features and requirements of DOI metadata: The object shall be described unambiguously and precisely by DOI metadata, based on a structured data model that enables the referent of a DOI name to be associated with metadata of any desired degree of precision and granularity to support identification, description and services associated with a referent. This is designed to do the following.

EXAMPLE Instead of treating sound carriers, books, videos, and photographs as fundamentally different things with different (if similar) characteristics, they are all recognized as creations with different values of the same higher level attributes, whose metadata can be supported in a common environment.
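
The EXAMPLE can be made concrete with a short Python sketch in which CDs, books, videos and photographs are all instances of one `Creation` type, distinguished only by the values of shared attributes (structuralType, mode and character from Table 4.1). The records and their classifications are illustrative assumptions, not registered DOI metadata.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Creation:
    """One record type for every kind of creation; only the attribute values differ."""
    name: str
    structural_type: str        # physical | digital | performance | abstraction
    modes: FrozenSet[str]       # principal sensory mode(s)
    characters: FrozenSet[str]  # music | language | image | other


# Illustrative records; the classifications are examples, not registered metadata
catalogue: List[Creation] = [
    Creation("Audio CD", "physical", frozenset({"audio"}), frozenset({"music"})),
    Creation("Printed book", "physical", frozenset({"visual"}), frozenset({"language"})),
    Creation("Braille book", "physical", frozenset({"tangible"}), frozenset({"language"})),
    Creation("Video file", "digital", frozenset({"audio", "visual"}),
             frozenset({"image", "language", "music"})),
    Creation("Photographic print", "physical", frozenset({"visual"}), frozenset({"image"})),
]


def names_with_mode(items: List[Creation], mode: str) -> List[str]:
    """The same query works for every kind of creation."""
    return [c.name for c in items if mode in c.modes]


print(names_with_mode(catalogue, "visual"))
# ['Printed book', 'Video file', 'Photographic print']
```

Because no kind of creation needs its own record type, a new kind (say, a tactile map) needs only new attribute values, not a new schema.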

Metadata describing and identifying the object to which the DOI name is being assigned shall be recorded promptly and accurately. Data elements and allowed values used in DOI metadata specifications shall be placed in a repository to facilitate interoperability across selected existing schemes. The data dictionary shall be used as the repository for all data elements and allowed values. The metadata shall meet the minimum requirements of the DOI Kernel Metadata Declaration.

4.5 Underlying ontology

The DOI data model is built on a contextual ontology approach shared with many other applications (see Chapter 1, Introduction, Section 1.6.5). The IDF ensures that the data model is maintained and made available for further extension and application. RAs and application developers do not need to access the contextual ontology in order to use the DOI system, though they may do so if they wish. An illustrative graphic of the high-level concept model is provided here, and further documentation is available as part of the Vocabulary Mapping Framework and related materials. Please consult the IDF for further information.
