
CellML Scope 20000620

20000620

This Version:
        http://www.cellml.org/scope/cellml_scope_20000620
Latest Version:
        http://www.cellml.org/scope/scope
Author:
       Warren Hedley (Bioengineering Research Group, University of Auckland)
Collaborators:
       David Bullivant (Bioengineering Research Group, University of Auckland)
       Mark Grehlinger (Physiome Sciences Inc.)
       Melanie Nelson (Physiome Sciences Inc.)
       Poul Nielsen (Biomedical Engineering Research Group, University of Auckland)
       Jeremy Rice (Physiome Sciences Inc.)

Disclaimer

The views expressed in this document are the sole property of Warren Hedley, although it may also include idiotic suggestions made by other people, who don't want to own up. Don't take the information here too seriously - this is not meant to be a binding specification, but is more in the vein of incoherent rambling. Enjoy!

Introduction

The scope of the CellML project can be broken into two parts. The first, which we need to work out before we proceed any further with CellML language development, is the scope of the language: what kinds of objects do we need to be able to describe using CellML? This question must be answered without regard to any specific piece of software.

The second is the scope of the specification in defining how CellML processors can and should behave. In other words, what should the specification say about what we can expect from an application that calls itself CellML-capable?

Although the scope of CellML may not yet be clear, the focus of CellML is pretty well defined: modelling. CellML is a language for describing cellular models, not cellular biology, although it is expected that as models become more realistic, more and more biology will be introduced.

This document lists the requirements for CellML, from the most basic needs to the complex requirements needed to address particular problem domains that we are interested in. Hopefully, putting these requirements down in writing will help answer the questions of scope. Note that these requirements do not assume an XML serialization of the CellML data model.

Key Requirements and Design Goals

Here are some of the more obvious requirements and design goals that apply to the whole of the CellML project. They are a combination of common sense, modifications to the design goals for XML (as listed in Section 1.1 of the XML 1.0 Recommendation), and concepts from the XSLT Recommendation.

  1. The data model must be unambiguous — the structure of the model, the resolution of variables, and the governing equations must be defined unambiguously.

  2. The data model must be complete — the use of names to identify some aspect of a model should be avoided unless those components can be defined and are accessible in CellML syntax. For instance, even though the Hodgkin-Huxley equations are well-known and need not be redefined in every model where they are used, they should be defined somewhere and a reference to that definition used, although software need not resolve those references, but may recognise them.

  3. The data model must be extensible — a means for extending the data model to incorporate new concepts or implementation specific data should be well-defined. The appropriate course of action when extensions are encountered should also be specified.

  4. The data model must be validatable — to facilitate interoperability between CellML processing applications (and models created by hand), the consistency of a model definition must be validatable. In general, this means that the data model is quite explicit and incorporates some redundancy.

  5. The data model and its serialization should be human-legible and reasonably clear. It's anticipated that many models and model components will be initially created by hand.

  6. The design of the CellML data model shall be formal and concise.

  7. The number of optional features should be kept to a minimum, ideally zero.

Model Structure Requirements

The original design of CellML was developed to cope with electro-physiological models (Luo-Rudy style), but it was always intended that CellML should cope with pathway (aka signalling), mechanical and metabolism models, and in particular that the various types of model could be combined. The following list details the requirements for the definition of model structure.

  1. Structure: The data model must include capabilities for the definition of cell structure, where this may be an abstract structure developed for modelling purposes, or an accurate geometric description of the cell and its key compartments. Note that rendering information should NOT be included in CellML — the choice of rendering is implementation specific.

  2. Structure Validation: Assuming that structure is defined using some sort of collection of components and connections, the data model should prevent modellers from defining components and connections in a manner that doesn't make sense topologically. (Note that if CellML is to be based on a very general modelling framework, this may prove tricky to incorporate.)

  3. Geometry: We must be able to specify the geometry of a component, and define geometric variations of state and processes.

  4. Inheritance: We must be able to specify a parent-child relationship between different model components, where this implies inheritance of all properties (be they state or processes) by the child from the parent. The child must then be able to modify these properties. Some limits on the variables that are available to the child from the parent may also be necessary; i.e., it may be appropriate for a parent to “hide” some of its state and processes.

  5. Multiple Inheritance: Any component can have numerous parents from which properties are inherited. This allows, for instance, the separation of hierarchies for different states. The above requirement for a parent being able to limit the availability of its state must be extended to cope with different limitations for different hierarchies.

  6. Component Inputs: The majority of components will be incomplete in that they require data defined externally in order to function properly. The inputs that a component requires must be clearly specified. In the event that a component is added to a model that cannot explicitly satisfy all of that component's requirements, a facility for specifying default inputs will be necessary.

  7. Component Outputs: All variables that are used inside a component must be accessible. (Note that hints might be used to recommend which state variables are important.)

  8. Object Re-use: It must be possible to define any component or part of a component without context, and re-use that definition at any point in the data model where it is appropriate. Optionally, some facility for modifying the definition might be incorporated. (This requirement is a combination of the old CRIM (Component Re-use and Information Modification) and point 1.2.1 from Jeremy's EP Requirements document).
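
To make the Component Inputs and Component Outputs requirements a bit more concrete, here is a rough Python sketch of a component that declares its inputs (with optional defaults) and its outputs. Everything in it is an assumption made up for illustration: the Component class, the resolve_inputs method, the sodium_channel example and the default value for E_Na are hypothetical and are not part of any CellML data model.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical model component; not part of any CellML specification."""
    name: str
    # Input name -> default value, or None when no default is supplied.
    inputs: dict = field(default_factory=dict)
    # Names of variables the component makes available to the rest of the model.
    outputs: tuple = ()

    def resolve_inputs(self, supplied):
        """Return a value for every declared input, or raise if one is missing."""
        resolved = {}
        missing = []
        for key, default in self.inputs.items():
            if key in supplied:
                resolved[key] = supplied[key]   # the enclosing model provides it
            elif default is not None:
                resolved[key] = default         # fall back to the declared default
            else:
                missing.append(key)
        if missing:
            raise ValueError(f"{self.name}: unsatisfied inputs {sorted(missing)}")
        return resolved


# Illustrative component: V has no default, E_Na has a placeholder default.
sodium_channel = Component(
    name="sodium_channel",
    inputs={"V": None, "E_Na": 50.0},
    outputs=("i_Na",),
)

# The surrounding model supplies V only; E_Na comes from the default.
resolved = sodium_channel.resolve_inputs({"V": -80.0})
print(resolved)
```

The point of the sketch is only that a missing input with no default is reported as an error when the component is placed in a model, rather than being discovered later during a simulation.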

Mathematical Requirements

  1. Systems of ODEs: CellML must be able to define systems of ODEs.

  2. Stochastic Systems: CellML must be able to define mathematical systems with stochastic components.

  3. Algorithms: CellML must be able to define algorithms to, for instance, specify a nonlinear solution process.

  4. Units: All quantities in a model should have units associated with them so that the consistency of a model can be checked. The user must be able to use their own set of familiar units, but must be able to map these into the base and extended set of SI units, which is the minimum unit set that a conformant processor should recognise. Any serialization should provide some way to abbreviate unit definitions, and set units within some range (e.g., all numbers and variables in this equation are specified in millivolts, unless otherwise specified.)

  5. Solution Methods (1): The mathematics of some model components may require a specific solution method or set of solution methods in order to achieve a realistic result. It may be appropriate for components to recommend a solution method, or include information about solution methods that have previously been used. This may be particularly important in cases where different solution methods are needed in different parts of the same model.

  6. Solution Methods (2): A model may incorporate several sets of equations that attempt to represent the same problem in ways compatible with different solution methods. For instance, both ODE and stochastic math may be specified in a component, and the software or user is free to implement the math which it can most easily solve.
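
As an illustration of the Units requirement, the following Python sketch shows one way a processor might map familiar units onto SI and check that an equation is consistent. It is only a sketch under stated assumptions: the Unit class, the choice to track just four SI base dimensions (kg, m, s, A), and the constant names are all hypothetical, and a real unit table would need to cover the full base and extended SI set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical unit: a scale factor into SI plus base-unit exponents."""
    scale: float   # multiplier that converts a value in this unit into SI
    dims: tuple    # exponents of (kg, m, s, A); other base units omitted here

    def __mul__(self, other):
        return Unit(self.scale * other.scale,
                    tuple(a + b for a, b in zip(self.dims, other.dims)))

    def __truediv__(self, other):
        return Unit(self.scale / other.scale,
                    tuple(a - b for a, b in zip(self.dims, other.dims)))

    def compatible(self, other):
        """Two units are compatible when their SI base exponents match."""
        return self.dims == other.dims


# SI units expressed in base units: V = kg m^2 s^-3 A^-1, S = A / V.
VOLT = Unit(1.0, (1, 2, -3, -1))
AMPERE = Unit(1.0, (0, 0, 0, 1))
SIEMENS = AMPERE / VOLT

# "Familiar" units defined by scaling the SI ones.
MILLIVOLT = Unit(1e-3, VOLT.dims)
MICROAMPERE = Unit(1e-6, AMPERE.dims)
MILLISIEMENS = Unit(1e-3, SIEMENS.dims)

# Check the right-hand side of i = g * (V - E) with g in mS and V, E in mV.
current_units = MILLISIEMENS * MILLIVOLT
print(current_units.compatible(MICROAMPERE))  # dimensions of a current
print(current_units.compatible(MILLIVOLT))    # not a voltage
```

Keeping the scale factor alongside the dimensions means that the same object can be used both for consistency checking and for converting values between the user's units and SI.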

Software Hints Requirements

There are many types of information that are not strictly necessary but would be useful to include within the CellML data model. Generally this information comes under the category of what I like to call “software hints”: information that processing software can use to make sense of models, and present meaningful options to the user.

  1. Context: Modellers may wish to specify the context in which a model component is meaningful. For instance, this might mean saying that a particular channel is only useful within the membrane between the extra-cellular and intra-cellular regions. For interoperability between applications, this can only be a hint, unless the specification defines broad naming conventions.

  2. Variable Scope: Modellers may wish to indicate to the user that certain variables within their model (or partial model) should not be considered important — an example is a temporary variable that is used to store the result of an intermediate calculation.

  3. Variable Role: Modellers may wish to indicate to the user the role that variables play within their model (or partial model). For instance, some variables may be important inputs which a user might want to play with when running simulations, while other variables are important outputs which the user might want to keep track of during a simulation.
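
A small Python sketch may help show how the Variable Scope and Variable Role hints could be used by software. The Role enumeration, the Variable class and the variable names below are hypothetical; the only idea taken from the requirements above is that variables can be tagged as important inputs, important outputs, or unimportant intermediates, and that software is free to act on the tags or ignore them.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """Hypothetical role hints; names are assumptions for illustration."""
    INPUT = "input"        # worth exposing to the user as an adjustable value
    OUTPUT = "output"      # worth tracking during a simulation
    INTERNAL = "internal"  # temporary or intermediate value


@dataclass
class Variable:
    name: str
    role: Role = Role.INTERNAL  # unhinted variables are treated as internal


def variables_to_display(variables):
    """Return the names a user interface might show by default."""
    return [v.name for v in variables if v.role is not Role.INTERNAL]


model_variables = [
    Variable("stimulus_current", Role.INPUT),
    Variable("membrane_potential", Role.OUTPUT),
    Variable("alpha_m_tmp"),  # intermediate calculation, no hint given
]

print(variables_to_display(model_variables))
```

Because these are hints, nothing here hides a variable: the intermediate value is still present in the model and could be shown on request.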

Basic Extensions

I can't resist recommending a few extensions to the core CellML focus (model specification), which would greatly improve the interoperability of CellML processing applications.

  1. Model Configuration: In order to use and test a model effectively, the user needs to be able to store the configuration of the model that they used to produce a certain set of results. In general this will be a set of variable values.

  2. Simulation Information: In order to use and test a model effectively, the user needs to be able to store information about a particular simulation (solution method, timestep, etc.). This might well become a scripting language with which the user could specify which results to store and graph, and include elements relating to analysis and comparison with experimental data.

  3. Results: Since we're already halfway down this path, why not finish up by defining a method for storing the results from a particular simulation?

  4. Rendering: Why not try to standardise the information that is needed to render a particular model? OK, now I'm just asking for it.
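
To give a feel for how the first three extensions fit together, here is a rough Python sketch that stores a model configuration, the simulation information and the results in one record. It is purely illustrative: the record layout, the key names, the use of JSON and the toy model (forward Euler on dx/dt = -k x) are all assumptions, and nothing here is a proposed CellML format.

```python
import json


def simulate(configuration, simulation):
    """Forward Euler on dx/dt = -k * x, a stand-in for a real model."""
    x = configuration["x0"]
    k = configuration["k"]
    dt = simulation["timestep"]
    times, values = [0.0], [x]
    for step in range(1, simulation["steps"] + 1):
        x = x + dt * (-k * x)       # one explicit Euler step
        times.append(step * dt)
        values.append(x)
    return {"time": times, "x": values}


record = {
    # Model Configuration: the variable values used for this run.
    "configuration": {"x0": 1.0, "k": 2.0},
    # Simulation Information: solution method, timestep and so on.
    "simulation": {"method": "forward_euler", "timestep": 0.1, "steps": 3},
}
# Results: stored alongside the settings that produced them.
record["results"] = simulate(record["configuration"], record["simulation"])

# Serialise and read back, as a second application might.
text = json.dumps(record, indent=2)
restored = json.loads(text)
print(restored["simulation"]["method"], len(restored["results"]["x"]))
```

Storing the three pieces together means that a set of results can always be traced back to the configuration and solution settings that generated it.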