Custom Subset Metadata

A specification of metadata for describing customised subsets of CellML elements, intended for convenient viewing and editing.
This version:
Last version:
Latest version:

Authors:
Andrew Miller <ak.miller@auckland.ac.nz> (Bioengineering Institute, University of Auckland)

Abstract

CellML provides a mechanism for describing mathematical models. It allows models of arbitrary complexity to be created, and models can be combined using the import functionality in CellML 1.1 to create even more complex models. One consequence is that CellML models can have very large numbers of variables, and even of components. While backend tools can process such large numbers of variables on modern computers, this creates a significant problem when designing user interfaces. One principle of good user interface design is to present the user with the minimum number of choices possible. This makes it faster for users to locate the functionality they want, and so improves their productivity. However, in order to cut down on the choices of variables, components, or other parts of the model available, it is necessary to determine in advance which parts of the model the user is likely to edit. As CellML has been designed to be domain independent, there is no generic way to do this automatically. Moreover, even if this were possible, different users may put the model to different uses, and so have different preferences as to what they will edit.

It is clear that there needs to be some way to specify which parts of the model users will change most often, so that restricted views of the model can be shown to the user (with an option to show the full model if desired). CellML provides a grouping facility, through the cellml:group element. However, this element is inappropriate for this particular application, because:
  1. Grouping only works for components. However, it would be useful to define sets of variables, mathematical equations, and other elements in the model, in addition to components. This could be overcome by making a revised version of the specification, but this would likely introduce many incompatibilities with existing models.
  2. CellML has been designed to separate data, which is essential to the evaluation of the model, from metadata. Because the information being conveyed here relates to how the model should be displayed, rather than the mathematical interpretation of the model, it is clearly metadata. Therefore, it fits better into the existing CellML metadata (cmeta) framework, as RDF, than it does as a part of the core CellML specification.

Prefix

The URI http://www.cellml.org/metadata/custom_subset/1.0# is used as the prefix for all RDF predicates defined in this specification. When describing documents in RDF/XML, it is recommended that the prefix csub be bound to this URI. However, processing software which parses custom subset metadata in RDF/XML form must not assume that this prefix will be used.

Within this document, predicate resources are referred to using the notation of qualified names in RDF/XML, and assuming that the csub prefix is bound to "http://www.cellml.org/metadata/custom_subset/1.0#". For example, csub:subset refers to the resource "http://www.cellml.org/metadata/custom_subset/1.0#subset".
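The requirement that software must not rely on the csub prefix can be illustrated with a minimal Python sketch. The helper name expand_qname and the alternative prefix "x" are assumptions made for illustration only; they are not defined by this specification.

```python
CSUB_NS = "http://www.cellml.org/metadata/custom_subset/1.0#"

def expand_qname(qname, bindings):
    """Expand 'prefix:local' to a full URI using the document's own bindings."""
    prefix, local = qname.split(":", 1)
    return bindings[prefix] + local

# The recommended binding, and a hypothetical document that chose another prefix.
recommended = {"csub": CSUB_NS}
unusual = {"x": CSUB_NS}

# Both qualified names denote the same predicate resource.
same_resource = (
    expand_qname("csub:subset", recommended) == expand_qname("x:subset", unusual)
)
```

Comparing expanded URIs, rather than prefixed names, is what allows a processor to recognise the predicates however the document binds the namespace.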

Specification of custom subsets

As described in the CellML metadata specification, the CellML model is associated with the RDF resource whose URI has a base equal to the CellML document URI and a fragment identifier matching the cmeta:id attribute on the model element.
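As a rough sketch of how the model resource URI is formed, the following Python function joins a document URI and a cmeta:id. The function name model_resource and the example.org URI are hypothetical, and the sketch assumes the document URI is already absolute and that no xml:base override applies.

```python
from urllib.parse import urldefrag

def model_resource(document_uri, cmeta_id):
    """Build the RDF resource URI for a model from its document URI and cmeta:id."""
    # Drop any fragment already present on the document URI before appending.
    base = urldefrag(document_uri).url
    return base + "#" + cmeta_id

# Hypothetical document URI and cmeta:id value.
resource = model_resource("http://example.org/models/heart.cellml", "heart_model")
```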

Every CellML model may have zero or more custom subsets associated with it. Each such subset is specified by an arc from the model (subject) to a node (object), with predicate csub:subset. Where more than one subset is present, they should be treated as being independent. In particular, CellML processing software should not automatically take the union or intersection of all subsets available, but instead should prompt the user to choose a particular subset.

Throughout this document, the object node of the arc just described is referred to as the "subset node".

The subset node shall be a closed-world node with respect to predicates defined in this specification; that is, it shall be completely specified within the XML file in which it is defined. If a user wishes to add additional details defined in this specification in a different XML document, they must not attempt to define new arcs with the subset node as the subject. Instead, they should define a new custom subset with the additional information. However, it is acceptable for predicates which annotate the custom subset (for example, to describe its utility), rather than define it, to be specified in a different XML file.

Every custom subset shall have a name. This name shall be specified by an arc from the subset node to a plain literal (containing the name) with predicate csub:subsetName. The contents of this literal shall be a description of the subset, in the user's language. There are no constraints placed on the form that the literal can take, and so it may be entered by and displayed to the user as text.
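To make the shape of this metadata concrete, the following Python sketch embeds a small RDF/XML fragment and lists the names of the subsets attached to a model, as a tool might do before prompting the user to choose one. The model identifier "my_model", the subset names, and the function subset_names are all hypothetical. The sketch uses a plain XML parser, so it handles only this one nested serialisation and compares the raw rdf:about value without resolving it against the document URI; a real tool would more likely use a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CSUB = "http://www.cellml.org/metadata/custom_subset/1.0#"

# Hypothetical metadata for a model whose cmeta:id is "my_model".
# The document binds the prefix "s" rather than the recommended "csub".
DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://www.cellml.org/metadata/custom_subset/1.0#">
  <rdf:Description rdf:about="#my_model">
    <s:subset>
      <rdf:Description>
        <s:subsetName>Stimulus protocol</s:subsetName>
      </rdf:Description>
    </s:subset>
    <s:subset>
      <rdf:Description>
        <s:subsetName>Channel conductances</s:subsetName>
      </rdf:Description>
    </s:subset>
  </rdf:Description>
</rdf:RDF>
"""

def subset_names(rdf_xml, model_ref):
    """Return the name of each subset attached to the resource model_ref."""
    root = ET.fromstring(rdf_xml)
    names = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        if desc.get(f"{{{RDF}}}about") != model_ref:
            continue
        # Each subset arc leads to an independent subset node.
        for arc in desc.findall(f"{{{CSUB}}}subset"):
            for node in arc.findall(f"{{{RDF}}}Description"):
                names.append(node.findtext(f"{{{CSUB}}}subsetName"))
    return names

# The names are offered as separate choices; the subsets are never merged.
names = subset_names(DOC, "#my_model")
```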

Specification of subset members

Every custom subset must define which parts of the CellML model belong to the subset. To achieve this, every subset shall have exactly one arc from the subset node (subject) with predicate csub:members. The object of this arc shall be a resource, with the semantics of an RDF collection (as defined in section 3.3.3 of the RDF Semantics specification). The collection shall have exactly one head (which may be rdf:nil, in the case that the subset is the empty set), shall have no forks or loops, and shall be explicitly terminated with rdf:nil. Each node in the collection other than rdf:nil shall have an rdf:first arc whose object is a resource corresponding to a cmeta:id attribute defined on an element in the CellML model, and an rdf:rest arc leading to the remainder of the collection.

The objects of the rdf:first arcs in the collection shall form the members of the custom subset. In addition, CellML processing software may choose to infer an ordering of the set members. CellML processing software which infers an ordering should treat members earlier in the collection as being more significant to the user than members later in the collection.
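The structural constraints on the members collection can be sketched in Python as a walk over a set of triples. The sketch follows the standard RDF collection vocabulary, in which each list node carries its member on an rdf:first arc and links to the remainder of the list with an rdf:rest arc. The function name collection_members, the tuple representation of triples, the blank node labels, and the member URIs are illustrative assumptions.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

def collection_members(triples, head):
    """Walk an RDF collection from head, returning its members in order.

    Raises ValueError on a fork, a loop, or a missing rdf:nil terminator.
    """
    members, seen, node = [], set(), head
    while node != NIL:
        if node in seen:
            raise ValueError("loop in collection")
        seen.add(node)
        firsts = [o for s, p, o in triples if s == node and p == FIRST]
        rests = [o for s, p, o in triples if s == node and p == REST]
        # Exactly one rdf:first and one rdf:rest per list node.
        if len(firsts) != 1 or len(rests) != 1:
            raise ValueError("fork or unterminated collection")
        members.append(firsts[0])
        node = rests[0]
    return members

# Hypothetical triples: blank nodes _:a and _:b form a two-member list.
triples = [
    ("_:a", FIRST, "model.cellml#membrane_voltage"),
    ("_:a", REST, "_:b"),
    ("_:b", FIRST, "model.cellml#sodium_channel"),
    ("_:b", REST, NIL),
]
members = collection_members(triples, "_:a")
```

Returning a list rather than a set preserves the collection order, so software that chooses to infer significance from position can do so, while software that does not can simply ignore the ordering.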