
CellML Scope 20000621

20000621

This Version:
        http://www.cellml.org/scope/cellml_scope_20000621
Latest Version:
        http://www.cellml.org/scope/scope
Previous Version:
        http://www.cellml.org/scope/cellml_scope_20000620
Author:
       Warren Hedley (Bioengineering Research Group, University of Auckland)
Collaborators:
       David Bullivant (Bioengineering Research Group, University of Auckland)
       Mark Grehlinger (Physiome Sciences Inc.)
       Melanie Nelson (Physiome Sciences Inc.)
       Poul Nielsen (Biomedical Engineering Research Group, University of Auckland)
       Jeremy Rice (Physiome Sciences Inc.)

Disclaimer

The views expressed in this document come from the brain and fingers of Warren Hedley, although they may include opinions voiced by others, who are non-specifically credited as contributors. This document is not intended to be any sort of binding specification, but is a source for discussion. The ideas in this document probably have a very short shelf life, so be sure you are looking at the most recent version of this document.

Introduction

The scope of the CellML project can be broken up into two parts. The first, which we need to work out before we proceed any further with CellML language development, is the scope of the language. What kind of objects do we need to be able to describe using CellML? This question must be answered without regard to any specific piece of software.

The second is the scope of the specification in defining how CellML processors can and should behave. In other words, what should the specification say about what we can expect from an application that calls itself CellML-capable?

Although the scope of CellML may not yet be clear, the focus of CellML is pretty well defined: modelling. CellML is a language for describing cellular models, not cellular biology, although it is expected that as models become more realistic, more and more biology will be introduced.

This document lists the requirements for CellML, from the most basic needs to the more complex requirements of particular problem domains that we are interested in. Hopefully, putting these requirements down in writing will help answer the questions of scope. Note that these requirements do not assume an XML serialization of the CellML data model.

Key Requirements and Design Goals

Here are some of the more obvious requirements and design goals that apply to the CellML project as a whole. They combine common sense, modifications of the design goals for XML listed in Section 1.1 of the XML 1.0 Recommendation, and concepts from the XSLT Recommendation.

  1. The data model must be unambiguous — the structure of the model, the resolution of variables, and the governing equations must be defined unambiguously.

  2. The data model must be complete: names should not be used to identify some aspect of a model unless that aspect can be defined in, and is accessible through, CellML syntax. For instance, the Hodgkin-Huxley equations are well known and need not be redefined in every model where they are used, but they should be defined somewhere and models should use a reference to that definition. Software need not resolve such references, but may recognise them.

  3. The data model must be extensible: a well-defined means should exist for extending the data model to incorporate new concepts or implementation-specific data. The action that processing software should take when it encounters an unknown extension should also be specified.

  4. The data model must be validatable — to facilitate interoperability between CellML processing applications (and models created by hand), the consistency of a model definition must be validatable. In general, this means that the data model is quite explicit and incorporates some redundancy.

  5. The data model and its serialization should be human-legible and reasonably clear. It's anticipated that many models and model components will be initially created by hand.

  6. The design of the CellML data model shall be formal and concise.

  7. The number of optional features should be kept to a minimum, ideally zero.

Model Structure Requirements

The original design of CellML was developed to cope with electro-physiological models (Luo-Rudy style), but it was always intended that CellML should cope with pathway (aka signalling), mechanical and metabolism models, and in particular that the various types of model could be combined.

In the previous version of this document (which is available in the archive), inheritance was given as a structural requirement. This caused some confusion, as inheritance is a grossly overloaded word that means different things in different contexts. In this version of the document, we have tried to resolve this by defining more precisely the functionality that we actually require.

The following list details the requirements for the definition of model structure.

  1. Structure: The data model must include capabilities for the definition of cell structure, where this may be an abstract structure developed for modelling purposes, or an accurate geometric description of the cell and its key compartments. Note that rendering information should NOT be included in CellML — the choice of rendering is implementation specific.

  2. Structure Validation: The data model must include a way to specify constraints on the way that components can be combined. For example, a software package may define a library of standard components and wish to specify the ways in which they may interact, preventing modellers from combining components in ways that don't make sense topologically.

  3. Geometry (1): We must be able to specify the geometry of a component, and define geometric variations of state and processes.

  4. Geometry (2): We must be able to specify certain geometric relationships between components. These relationships need not carry any associated implications (such as property inheritance), and should at the very minimum include:

    • A is inside B

    • A is connected to B

  5. Property Sets (1): Users and software will want to be able to group a set of properties together, and then associate them with one or more components. These properties may include state, processes, and structural constraints. A component may wish to incorporate more than one property set, and in this event we need to provide some mechanism for resolving conflicting identifiers.

  6. Property Sets (2): A modeller may wish to indicate that a component should acquire properties from another component (a refinement of the previous point). Conversely, a component may wish to limit the properties that it makes available to other components.

  7. Component Inputs: Most components will be incomplete, in that they require externally defined data in order to function properly. The inputs that a component requires must be clearly specified. If a component is added to a model that cannot explicitly satisfy all of that component's requirements, a facility for specifying default inputs will be necessary.

  8. Object Re-use: It must be possible to define any component or part of a component without context, and re-use that definition at any point in the data model where it is appropriate. Optionally, some facility for modifying the definition might be incorporated. (This requirement is a combination of the old CRIM (Component Re-use and Information Modification) and point 1.2.1 from Jeremy's EP Requirements document).
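
The structural requirements above are stated without reference to any serialization. As an illustration only, the following Python sketch shows one way processing software might check a library's constraints on “A is inside B” relationships (points 2 and 4) and fill unsatisfied component inputs from defaults (point 7). The names `Component`, `ALLOWED_INSIDE`, `validate_structure` and `resolve_inputs`, the component kinds, and the numeric values are all hypothetical; this is not CellML syntax.

```python
from dataclasses import dataclass, field

# Hypothetical constraint table published by a component library:
# (inner kind, outer kind) pairs for which "A is inside B" is allowed.
ALLOWED_INSIDE = {
    ("channel", "membrane"),
    ("membrane", "cell"),
}


@dataclass
class Component:
    name: str
    kind: str
    required_inputs: set = field(default_factory=set)  # data defined externally
    defaults: dict = field(default_factory=dict)        # fallback input values


def validate_structure(components, inside):
    """Check each (inner, outer) 'is inside' relation against ALLOWED_INSIDE."""
    kind_of = {c.name: c.kind for c in components}
    errors = []
    for inner, outer in inside:
        if (kind_of[inner], kind_of[outer]) not in ALLOWED_INSIDE:
            errors.append(f"{inner} ({kind_of[inner]}) may not be inside "
                          f"{outer} ({kind_of[outer]})")
    return errors


def resolve_inputs(component, available):
    """Satisfy each required input from the model, falling back to defaults."""
    resolved = {}
    for name in sorted(component.required_inputs):
        if name in available:
            resolved[name] = available[name]
        elif name in component.defaults:
            resolved[name] = component.defaults[name]
        else:
            raise KeyError(f"{component.name}: no value for input {name!r}")
    return resolved


cell = Component("cell", "cell")
membrane = Component("sarcolemma", "membrane")
i_na = Component("i_Na", "channel", required_inputs={"V", "Na_o"},
                 defaults={"Na_o": 140.0})
components = [cell, membrane, i_na]

print(validate_structure(components, [("sarcolemma", "cell"), ("i_Na", "sarcolemma")]))
# []
print(validate_structure(components, [("cell", "i_Na")]))
# ['cell (cell) may not be inside i_Na (channel)']
print(resolve_inputs(i_na, {"V": -84.0}))
# {'Na_o': 140.0, 'V': -84.0}
```

In this sketch a value supplied by the model always takes precedence over a component's default; “A is connected to B” relationships could be checked against a second constraint table in the same way.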

Mathematical Requirements

  1. Systems of ODEs: CellML must be able to define systems of ODEs.

  2. Stochastic Systems: CellML must be able to define mathematical systems with stochastic components.

  3. Algorithms: CellML must be able to define algorithms, for instance to specify a nonlinear solution process.

  4. Units: All quantities in a model should have units associated with them so that the consistency of a model can be checked. Users must be able to use their own familiar units, but must also be able to map these onto the base and extended set of SI units, which is the minimum set of units that a conformant processor should recognise. Any serialization should provide some way to abbreviate unit definitions and to set default units over some scope (e.g., all numbers and variables in this equation are specified in millivolts unless otherwise specified).

  5. Solution Methods (1): The mathematics of some model components may require a specific solution method or set of solution methods in order to achieve a realistic result. It may be appropriate for components to recommend a solution method, or include information about solution methods that have previously been used. This may be particularly important in cases where different solution methods are needed in different parts of the same model.

  6. Solution Methods (2): A model may incorporate several sets of equations that attempt to represent the same problem in ways compatible with different solution methods. For instance, both ODE and stochastic math may be specified in a component, and the software or user is free to use whichever formulation can be solved most easily.
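
The units requirement implies that every user-defined unit can be reduced to a scale factor and a set of powers of the SI base units, which is what makes consistency checking possible. A minimal Python sketch under that assumption follows; the names `Unit`, `base`, `scaled`, `compatible` and `convert` are hypothetical, and only purely multiplicative units are handled.

```python
from dataclasses import dataclass

# Order of exponents: the seven SI base units.
BASE = ("metre", "kilogram", "second", "ampere", "kelvin", "mole", "candela")


@dataclass(frozen=True)
class Unit:
    factor: float     # multiply a value in this unit by factor to get SI
    exponents: tuple  # powers of the SI base units, in BASE order

    def __mul__(self, other):
        return Unit(self.factor * other.factor,
                    tuple(a + b for a, b in zip(self.exponents, other.exponents)))

    def __pow__(self, n):
        return Unit(self.factor ** n, tuple(a * n for a in self.exponents))


def base(name):
    """An SI base unit: factor 1, exponent 1 on its own dimension."""
    return Unit(1.0, tuple(1 if b == name else 0 for b in BASE))


def scaled(unit, factor):
    """A user unit defined as a multiple of an existing unit (e.g. milli-)."""
    return Unit(unit.factor * factor, unit.exponents)


def compatible(u, v):
    """Quantities can be added, compared or equated only if dimensions match."""
    return u.exponents == v.exponents


def convert(value, src, dst):
    """Convert a value between two dimensionally compatible units."""
    if not compatible(src, dst):
        raise ValueError("incompatible units")
    return value * src.factor / dst.factor


metre, kilogram, second, ampere = (base(n) for n in BASE[:4])
volt = kilogram * metre**2 * second**-3 * ampere**-1
millivolt = scaled(volt, 1e-3)
millisecond = scaled(second, 1e-3)

print(compatible(millivolt, volt))         # True
print(compatible(millivolt, millisecond))  # False
print(convert(-84.0, millivolt, volt))     # about -0.084
```

Offset units such as degrees Celsius would need an additional offset term; the sketch deliberately leaves them out.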

Software Hints Requirements

There are many types of information that are not strictly necessary but would be useful to include within the CellML data model. Generally this information falls into the category of what I like to call “software hints”: information that processing software can use to make sense of models and present meaningful options to the user.

  1. Context: Modellers may wish to specify the context in which a model component is meaningful. For instance, this might mean saying that a particular channel is only useful within the membrane between the extra-cellular and intra-cellular regions. For interoperability between applications, this can only be a hint, unless the specification defines broad naming conventions.

  2. Variable Scope: Modellers may wish to indicate to the user that certain variables within their model (or partial model) should not be considered important — an example is a temporary variable that is used to store the result of an intermediate calculation.

  3. Variable Role: Modellers may wish to indicate to the user the role that variables play within their model (or partial model). For instance, some variables may be important inputs which a user might want to play with when running simulations, while other variables are important outputs which the user might want to keep track of during a simulation.
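
Because these are only hints, software can use them to decide what to show the user without changing the model's mathematics. The following Python sketch assumes two hypothetical hint fields, `role` and `internal`, on a hypothetical `Variable` record; unrecognised or missing hints simply fall through to a default group, and the variable names and values are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variable:
    name: str
    value: float
    role: Optional[str] = None  # hypothetical hint: "input" or "output"
    internal: bool = False      # hypothetical hint: intermediate result


def user_view(variables):
    """Group variables for display using hints only; the model is unchanged."""
    view = {"inputs": [], "outputs": [], "other": []}
    for v in variables:
        if v.internal:
            continue  # hidden from the user, but still part of the model
        key = {"input": "inputs", "output": "outputs"}.get(v.role, "other")
        view[key].append(v.name)
    return view


variables = [
    Variable("g_Na", 23.0, role="input"),
    Variable("V", -84.0, role="output"),
    Variable("tmp_1", 0.0, internal=True),
    Variable("Cm", 1.0),
]
print(user_view(variables))
# {'inputs': ['g_Na'], 'outputs': ['V'], 'other': ['Cm']}
```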

Basic Extensions

I can't resist recommending a few extensions to the core CellML focus (model specification), which would greatly improve the interoperability of CellML processing applications.

  1. Model Configuration: In order to use and test a model effectively, the user needs to be able to store the configuration of the model that they used to produce a certain set of results. In general this will be a set of variable values.

  2. Simulation Information: In order to use and test a model effectively, the user needs to be able to store information about a particular simulation (solution method, timestep, etc.). This might well become a scripting language with which the user could specify which results to store and graph, and which could include elements relating to analysis and comparison with experimental data.

  3. Results: Since we're already halfway down this path, why not finish up by defining a method for storing the results from a particular simulation?

  4. Rendering: Why not try to standardise the information that is needed to render a particular model? OK, now I'm just asking for it.
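
The first three extensions amount to storing, alongside a reference to the model, the variable values, solution settings and outputs of a run. The Python sketch below bundles them into one hypothetical `SimulationRecord` and round-trips it through JSON purely for illustration; the field names, the model identifier, the values and the choice of JSON are all assumptions, not a proposed CellML format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SimulationRecord:
    model: str                                          # hypothetical model identifier
    configuration: dict = field(default_factory=dict)   # variable name -> value used
    solution_method: str = "unspecified"
    timestep: float = 0.0
    results: dict = field(default_factory=dict)         # variable name -> list of values

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


record = SimulationRecord(
    model="luo_rudy_1991",
    configuration={"g_Na": 23.0},
    solution_method="forward Euler",
    timestep=0.01,
    results={"V": [-84.0, -83.9]},
)
print(SimulationRecord.from_json(record.to_json()) == record)  # True
```

Keeping the configuration separate from the model definition means the same model can be re-run with different variable values without editing the model itself.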