CellML Scope 20000726
20000726
http://www.cellml.org/scope/cellml_scope_20000726
Latest Version:
http://www.cellml.org/scope/scope
Previous Version:
http://www.cellml.org/scope/cellml_scope_20000621
http://www.cellml.org/scope/cellml_scope_20000620
Melanie Nelson (Physiome Sciences Inc.)
Warren Hedley (Bioengineering Institute, University of Auckland)
Collaborators:
David Bullivant (Bioengineering Institute, University of Auckland)
Kam-Chuen Jim (Physiome Sciences Inc.)
Scott Lett (Physiome Sciences Inc.)
Dave Nickerson (Bioengineering Institute, University of Auckland)
Poul Nielsen (Bioengineering Institute, University of Auckland)
This document details the proposed scope of CellML. The scope of CellML is the information that should be contained within CellML elements (i.e. elements defined in the CellML Document Type Definitions, or DTDs). Note that information can also be included in a CellML document by including an independent DTD. See section 3 ("Use of Other Languages") for more information about this possibility.
Technical Introduction: CellML Definition Documents
CellML will be defined in two documents: the DTD (Document Type Definition) and the specification. The DTD details the allowed syntax of CellML (i.e., element A can have elements B and C, but not elements D or E, as children). A W3C working group is currently defining XML Schema, which will eventually replace DTDs as the method for defining the syntax of XML languages. At some point, the CellML technical working group will decide that it is the correct time to begin using a schema rather than a DTD for CellML. However, this has no effect on the scope discussed in this document.
The CellML specification is the document that explains what a program that claims to be able to read and write valid CellML must be able to do.
Rules defining how CellML uses information determined to be within the scope of CellML may be implemented in the DTD (or schema), the specification, or both.
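The kind of syntactic rule a DTD expresses (element A may have B and C as children, but not D or E) can be illustrated with a small Python sketch. The element names A to E simply mirror the example in the text and are not CellML elements; the `ALLOWED_CHILDREN` table and the `find_violations` helper are hypothetical names introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each element may contain.
# The names mirror the "element A" example in the text, not real CellML.
ALLOWED_CHILDREN = {
    "A": {"B", "C"},
    "B": set(),
    "C": set(),
}


def find_violations(xml_text):
    """Return (parent, child) tag pairs that the content model does not allow."""
    root = ET.fromstring(xml_text)
    violations = []
    for parent in root.iter():
        # Elements missing from the table are treated as allowing no children.
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                violations.append((parent.tag, child.tag))
    return violations


valid_doc = "<A><B/><C/></A>"
invalid_doc = "<A><B/><D/></A>"

print(find_violations(valid_doc))
print(find_violations(invalid_doc))
```

This sketch checks only which children may appear. A real DTD can also constrain the order and number of children and the attributes of each element, and rules that cannot be expressed in this way would fall to the specification.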
The CellML language itself should be limited to the following:
Description of the organizational scheme/structure of the model. This includes the organization of a model into components (parts), and how these parts are connected together to create a complete model. Note that this description is intended primarily for use by computer programs that want to be able to read and run a cell model.
Metadata describing the model and its parts, to be used to help users categorize, organize, and retrieve models and model parts. This metadata is intended both for computer programs accessing the model (so that they can search for models and model components based on user queries) and for human users of the model (who might want to view portions of the metadata to evaluate whether a particular model meets their requirements). [Technical note: The most basic set of metadata can be handled with the Dublin Core RDF specification. (RDF is the W3C's method for handling metadata; the Dublin Core is a set of specific metadata elements.) Other metadata can be handled in a CellML-specific RDF schema or in elements within CellML itself. A CellML-specific RDF schema would be so tightly coupled with CellML that its development can be considered part of the same project.]
Textual and graphical documentation of the model intended solely for display to human users of the model. [For discussion: should this be done with HTML instead?]
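As a rough illustration of the first two items in the list above, the following Python sketch reads the component structure and the Dublin Core metadata from a toy model document. The element and attribute names `model`, `component`, `connection`, `name`, `from` and `to` are assumptions made for this example and are not taken from any CellML DTD; only the RDF and Dublin Core namespace URIs are real. The helpers `read_structure` and `read_metadata` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Toy document: the model/component/connection vocabulary is hypothetical.
DOC = """
<model name="toy_cell"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:RDF>
    <rdf:Description rdf:about="#toy_cell">
      <dc:title>Toy cell model</dc:title>
      <dc:creator>A. Modeler</dc:creator>
    </rdf:Description>
  </rdf:RDF>
  <component name="membrane"/>
  <component name="sodium_channel"/>
  <connection from="membrane" to="sodium_channel"/>
</model>
""".strip()


def read_structure(root):
    """Return the component names and the (from, to) connection pairs."""
    components = [c.get("name") for c in root.findall("component")]
    connections = [(c.get("from"), c.get("to"))
                   for c in root.findall("connection")]
    return components, connections


def read_metadata(root):
    """Collect Dublin Core elements found inside rdf:Description blocks."""
    meta = {}
    for desc in root.iter("{%s}Description" % RDF_NS):
        for child in desc:
            if child.tag.startswith("{%s}" % DC_NS):
                # Strip the namespace part, keeping e.g. "title".
                meta[child.tag.split("}", 1)[1]] = child.text
    return meta


root = ET.fromstring(DOC)
components, connections = read_structure(root)
metadata = read_metadata(root)
print(components, connections, metadata)
```

A program that runs the model would mainly use the structure, while a search or cataloguing tool would mainly use the metadata. Keeping the two in separate namespaces lets each kind of program ignore what it does not need.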
When a cell model document needs to contain information that is outside of the CellML scope (such as documentation about the real biological entity that the model represents or data sets produced by running the model), this should be done using separate languages/storage formats. The CellML working group should decide which languages can be legally included in CellML as the need to include different types of information arises. In some cases, the CellML specification can limit the subset of valid information from the separate language or format that can be included in CellML.
The inclusion of the MathML DTD in the CellML DTD is an excellent example of this approach. There is no need to redefine an XML representation of mathematics for use with CellML, so the existing MathML specification is used. However, we are currently only using a subset of the mathematics that can be validly represented in MathML. The CellML specification could spell out the exact subset of MathML that an application purporting to support CellML must be able to handle.
Another example of a set of information that could be included/linked to in CellML, but not made part of the CellML specification, is sequence-based information about biopolymers (i.e. genes and proteins). There are several competing formats for this information. We could choose to allow the inclusion of one or more of these formats in CellML, as annotations on specific parts of the model.
Note that the other language that handles information could be a language defined elsewhere (such as MathML) or a language that we must define ourselves.
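The idea of restricting an included language to a supported subset, as described for MathML above, might be checked along the following lines. The `SUPPORTED` set is purely an assumption for illustration, since the actual subset would be fixed by the CellML specification, and `unsupported_elements` is a hypothetical helper. The namespace URI is the standard MathML one.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical subset of MathML content elements; the real list would be
# spelled out by the CellML specification.
SUPPORTED = {"math", "apply", "eq", "plus", "minus", "times", "divide",
             "ci", "cn"}


def unsupported_elements(mathml_text):
    """Return the sorted local names of elements outside the supported subset."""
    root = ET.fromstring(mathml_text)
    found = set()
    for el in root.iter():
        # ElementTree tags look like "{namespace}local".
        ns, _, local = el.tag.rpartition("}")
        if ns != "{" + MATHML_NS or local not in SUPPORTED:
            found.add(local)
    return sorted(found)


in_subset = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<apply><eq/><ci>I</ci>"
    "<apply><times/><ci>g</ci><ci>V</ci></apply>"
    "</apply></math>"
)
out_of_subset = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<apply><sin/><ci>x</ci></apply></math>"
)

print(unsupported_elements(in_subset))
print(unsupported_elements(out_of_subset))
```

Both fragments are valid MathML, but only the first stays inside the assumed subset. This is the distinction the specification would need to draw: validity against the included DTD versus what a CellML application must be able to handle.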
The following are specific points that should be covered when we discuss the above proposal for the scope of CellML. We know that we need all of these types of information to be either included in cell model documents or referenced by cell model documents. None of these types of information is covered by the scope proposal in section 1: it is proposed that all of them be handled in separate languages. We must formally decide whether the information in each of these points should be included in CellML or whether it should instead be handled by another language.
Information about the actual computation of a cell model
Data sets
Ontologies [Note: An ontology is the set of rules that govern what is allowable in a biologically correct model. We anticipate that different types of modeling will need different ontologies. It should also be possible to define a model without using an ontology at all (in effect, the ontology would be inside the modeler's head). However, using ontologies would allow some "short cuts" in the model definition process. Also note: an ontology can be separate from the model whether we choose to express ontology information in CellML or in some other language.]
Information about rendering of models
