Physiome ontologies and mark-up language standards

To link the rapidly growing knowledgebase of biological data into a physiome modelling framework, formal vocabularies need to be developed to reduce the increasing heterogeneity of terms. This is especially important as models of physiological processes are developed that span multiple spatial scales (genes and proteins to cells, tissues and organs) and incorporate the parameter changes associated with disease. A formal, standardized representation of the custom data structures held in the many application-specific databases is also needed to provide a common interface between them. Standards must be developed for representing both experimental data and mathematical models of physiological processes. Ontologies that provide semantic descriptions of modelling concepts make the modelling environment richer and its terms unambiguous. They also promote the integration of data from ontologies and databases in other areas of biological research, and the building of software tools that interpret and use this information.

Roles and representation

Modelling biological systems involves at least three key roles:

  1. Biologist – represents biological systems using terms from the biological domain.
  2. Mathematician – represents the mathematical formulations of the biological system.
  3. Computer scientist/engineer – helps develop languages that encode mathematical formulations as structures that software can interpret, for example abstractions of biological and mathematical entities, mathematical relationships, and rules for their interpretation.
To support an effective interplay between these roles, we need tools for representing and interpreting the data generated. Producing these representations is one of the roles of the computer scientist/engineer: providing tools and structures that serve not only biologists and mathematicians but also the machine processing of data in simulations, inference engines and databases.
One set of tools that the Physiome group is developing is visual editors. Visual building blocks and interfaces are a very natural way for biologists and mathematicians to build concepts, navigate libraries and interpret models (Fig. 1). To develop a framework for building toolsets, the computer scientist needs machine-interpretable representations of what modellers are describing. To do this, we have been using two representation languages:
  1. CellML – a modelling description language.
  2. OWL (Web Ontology Language) – an ontology representation language specified by the W3C Web Ontology Working Group.

We integrate ontologies into the Physiome project to provide an unambiguous, machine-interpretable representation of concepts across the biological and mathematical domains of modelling, helping us to communicate biological models through tools for building, sharing, interpreting, simulating and visualizing them. CellML has been developed by the Auckland Physiome group over the last four years and aims to represent models at the cellular level. While a full description of the language is beyond the scope of this paper, it is useful to touch on what we have gained from it. The CellML language itself is a set of constructs that have natural interpretations within both the computational simulation domain and the object-oriented programming domain. As a modelling description language, it is generic enough to represent mathematical descriptions of biological models at any level, not just cells, so it serves as a base for developing a more general modelling representation environment. So far, our development of such an environment includes (1) the integration of ontology data, which provides a machine-interpretable pathway through the roles of modelling, and (2) the integration of FieldML, which enables us to represent structural and continuum-based information about biological and physical entities.
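As a rough illustration of what CellML constructs look like, the sketch below uses Python's standard `xml.etree.ElementTree` module to assemble a minimal CellML 1.0 style document: two components, each declaring variables, and a connection that maps a shared `time` variable between them. The model, component and variable names are hypothetical, the element and attribute names reflect our reading of the CellML 1.0 specification and should be checked against it, and the MathML equations that a real model would carry inside each component are omitted.

```python
import xml.etree.ElementTree as ET

# CellML 1.0 namespace (check against the specification before relying on it).
CELLML_NS = "http://www.cellml.org/cellml/1.0#"


def _q(tag: str) -> str:
    """Qualify a tag name with the CellML namespace."""
    return f"{{{CELLML_NS}}}{tag}"


def build_model() -> ET.Element:
    """Build a minimal CellML-style model; all names here are hypothetical."""
    ET.register_namespace("", CELLML_NS)
    model = ET.Element(_q("model"), {"name": "toy_membrane"})

    # Each component declares its variables; public_interface controls visibility.
    env = ET.SubElement(model, _q("component"), {"name": "environment"})
    ET.SubElement(env, _q("variable"),
                  {"name": "time", "units": "second", "public_interface": "out"})

    membrane = ET.SubElement(model, _q("component"), {"name": "membrane"})
    ET.SubElement(membrane, _q("variable"),
                  {"name": "time", "units": "second", "public_interface": "in"})
    ET.SubElement(membrane, _q("variable"),
                  {"name": "V", "units": "volt", "initial_value": "-0.085"})
    # A real model would add MathML equations here inside a <math> element.

    # A connection maps a variable in one component onto one in another.
    conn = ET.SubElement(model, _q("connection"))
    ET.SubElement(conn, _q("map_components"),
                  {"component_1": "environment", "component_2": "membrane"})
    ET.SubElement(conn, _q("map_variables"),
                  {"variable_1": "time", "variable_2": "time"})
    return model


if __name__ == "__main__":
    print(ET.tostring(build_model(), encoding="unicode"))
```

One way to read this structure is that it supports both interpretations mentioned above: a simulator sees variables and equations to solve, while an object-oriented program can treat each component as an object whose public variables are exposed through its interface.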

To illustrate the use of CellML, it is useful to briefly describe a modelling project currently under way in the Auckland group (Nickerson, 2003). This project models cardiac electromechanics by integrating models from the cellular, tissue and organ levels. Specifically, it couples electrical events at the cellular level to the propagation of electrical excitation in myocardial tissue, and to the contribution that active contraction of the myocytes makes to finite elasticity. The cellular-level models are represented in CellML and integrated into continuum models built using the CMISS framework, which is able to process CellML data. Different cell models can be swapped in and out in a plug-and-play fashion, allowing different cellular theories to be tested within the electromechanics model. The cell models used so far in the project are Noble’s (1998) model of the guinea-pig ventricular cell (Nickerson, 2001) and a modified version of the FitzHugh-Nagumo simplified cardiac myocyte model.
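The plug-and-play idea can be sketched in Python as a small interface that any cell model implements, so that a driver (standing in, very loosely, for the tissue-level solver) never needs to know which model it is running. The `CellModel` protocol, the `PassiveMembrane` class and the forward-Euler driver are hypothetical illustrations, not the CMISS or CellML API. The FitzHugh-Nagumo class uses the textbook dimensionless form with commonly quoted parameters (a = 0.7, b = 0.8, ε = 0.08), not the modified version used in the project.

```python
from typing import Callable, List, Protocol, Sequence


class CellModel(Protocol):
    """Hypothetical interface that a tissue-level solver could call."""

    def initial_state(self) -> List[float]: ...

    def derivatives(self, state: Sequence[float], stimulus: float) -> List[float]: ...


class FitzHughNagumo:
    """Textbook dimensionless FitzHugh-Nagumo model (not the project's variant)."""

    def __init__(self, a: float = 0.7, b: float = 0.8, eps: float = 0.08) -> None:
        self.a, self.b, self.eps = a, b, eps

    def initial_state(self) -> List[float]:
        return [-1.2, -0.625]  # close to the resting point for these parameters

    def derivatives(self, state: Sequence[float], stimulus: float) -> List[float]:
        v, w = state
        dv = v - v ** 3 / 3.0 - w + stimulus  # fast excitation variable
        dw = self.eps * (v + self.a - self.b * w)  # slow recovery variable
        return [dv, dw]


class PassiveMembrane:
    """Hypothetical linear leaky membrane: relaxes to v_rest with time constant tau."""

    def __init__(self, v_rest: float = 0.0, tau: float = 5.0) -> None:
        self.v_rest, self.tau = v_rest, tau

    def initial_state(self) -> List[float]:
        return [self.v_rest]

    def derivatives(self, state: Sequence[float], stimulus: float) -> List[float]:
        return [-(state[0] - self.v_rest) / self.tau + stimulus]


def simulate(model: CellModel, stimulus: Callable[[float], float],
             dt: float = 0.01, t_end: float = 50.0) -> List[List[float]]:
    """Forward-Euler integration; works unchanged for any CellModel."""
    state = model.initial_state()
    trace = [list(state)]
    for step in range(int(round(t_end / dt))):
        rates = model.derivatives(state, stimulus(step * dt))
        state = [s + dt * r for s, r in zip(state, rates)]
        trace.append(state)
    return trace


def pulse(t: float) -> float:
    """Unit stimulus applied between t = 1 and t = 2."""
    return 1.0 if 1.0 <= t < 2.0 else 0.0


# Swapping cell models requires no change to the driver.
for model in (FitzHughNagumo(), PassiveMembrane()):
    peak = max(s[0] for s in simulate(model, pulse))
    print(type(model).__name__, round(peak, 2))
```

The design choice being illustrated is that the driver depends only on `initial_state` and `derivatives`, so a new cellular theory can be tested by supplying another class with those two methods.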

Nickerson, D. 2003. In Proceedings of the World Congress on Medical Physics and Biomedical Engineering, August 2003, Sydney, Australia.

Nickerson, D., Hunter, P. & Smith, N. 2001. Phil. Trans. R. Soc. Lond. A 359, 1159–1172.

Building and integrating ontologies

Ontologies are a vehicle for providing unambiguous descriptions of terms and their relationships to one another. To a computer scientist, they provide a formal framework for describing the properties and relationships of concepts, one that has both a formal logical foundation and a structure amenable to machine processing, interpretation and sharing. To a biologist or modeller, ontologies provide a thesaurus and a structure for understanding and binding terms, ideas, data sets and visualizations. Many different groups are constructing ontologies for various biological domains. One approach to integrating ontologies from different biological domains would be to generate a large composite ontology. However, this is not the intention of the IUPS project’s ontological framework, because the biological ontologies that currently exist do not form pieces of the same puzzle: they may have biology in common, but the overlap usually stops there. There is currently no agreed core framework or methodology to guide the development of compatible domain-specific ontologies, but there are efforts under way to promote such a platform. The Unified Medical Language System (UMLS), for example, is attempting to bring together ontologies from different domains into a composite ontology that fully integrates these knowledge bases (Gangemi, 1999).

The current view of the IUPS project’s ontological framework is shown in Fig. 2. Some new ontologies are being built from scratch, while some existing ones will need to be extended and integrated as both a common framework and a data source (i.e. a composite approach). The present focus is on describing constructs for interpreting our computer-based model representations within the biological and mathematical levels of modelling. The modelling theory and ML library domains (Fig. 2) capture representations of mathematical relationships, model architectures and component structure (both physical and abstract).
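To make the idea of a formal framework of concepts and relationships more concrete, the following sketch stores a few statements as subject-predicate-object triples and derives new facts from them: what an L-type calcium channel is a kind of, and which properties it inherits from its more general classes. The terms and the `is_a`/`participates_in` predicates are illustrative only and are not taken from any published ontology; OWL supports far richer semantics than the simple subsumption reasoning shown here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative statements only; not drawn from any published ontology.
TRIPLES: List[Triple] = [
    ("L-type calcium channel", "is_a", "voltage-gated calcium channel"),
    ("voltage-gated calcium channel", "is_a", "voltage-gated ion channel"),
    ("voltage-gated ion channel", "is_a", "ion channel"),
    ("ion channel", "participates_in", "ion transport"),
]


def index(triples: List[Triple], predicate: str) -> Dict[str, Set[str]]:
    """Map each subject to the objects it is linked to by one predicate."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            graph[subject].add(obj)
    return graph


def ancestors(term: str, triples: List[Triple]) -> Set[str]:
    """Every class reachable from term through is_a links (transitive closure)."""
    is_a = index(triples, "is_a")
    found: Set[str] = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def inherited(term: str, predicate: str, triples: List[Triple]) -> Set[str]:
    """Values of predicate asserted on term or on any of its ancestors."""
    links = index(triples, predicate)
    result: Set[str] = set()
    for cls in {term} | ancestors(term, triples):
        result |= links.get(cls, set())
    return result
```

For example, `inherited("L-type calcium channel", "participates_in", TRIPLES)` returns `{"ion transport"}`, even though that statement was only made about ion channels in general. This kind of machine inference is what allows a tool to connect a specific model annotation to broader biological concepts.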

Ontologies within the data, simulation and visualization domains provide a top-level interface to the resources they describe. The hierarchy of modelling shown earlier describes the levels at which a modeller thinks about biological terms, for example a particular organ or cell, or a particular process such as ion transport. These processes and entities are concepts within domains of biology that already have databases and associated ontologies. Instead of defining our own interpretation of these concepts, we can use these existing ontologies directly to describe any biological aspect referred to by a particular model or by an ontological concept in our domain. One of the necessary aims of groups such as TAMBIS (Transparent Access to Multiple Bioinformatics Information Sources), BioPAX (Biological Pathways Exchange), PSI (Proteomics Standards Initiative) and SBML (Systems Biology Markup Language) is that they work together to ensure that their biological concepts are compatible (Fig. 3). In the area of biochemical pathways, the CellML developers are working closely with BioPAX and SBML to establish the foundations for binding cellular domains. An example use-case of such a binding is a pathway of inference that starts at concepts in the BioPAX database and ends in a selection of models from the CellML database.
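The inference pathway from a pathway concept to a selection of models can be sketched as a lookup through shared annotations: the pathway concept lists its participating entities, each model is annotated with the biological entities it describes, and models are ranked by how many participants they cover. Every identifier, model name and data structure below is hypothetical; this is not the BioPAX or CellML repository interface, only an illustration of the binding idea.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical pathway concept -> participating entities (stand-in for a pathway ontology).
PATHWAYS: Dict[str, Set[str]] = {
    "cAMP/PKA signalling": {"beta1-adrenergic receptor", "adenylyl cyclase",
                            "protein kinase A", "L-type calcium channel"},
}

# Hypothetical model repository: model name -> biological entities it is annotated with.
MODEL_ANNOTATIONS: Dict[str, Set[str]] = {
    "toy_ltcc_gating": {"L-type calcium channel"},
    "toy_beta_adrenergic_cascade": {"beta1-adrenergic receptor", "adenylyl cyclase",
                                    "protein kinase A"},
    "toy_sodium_pump": {"Na+/K+-ATPase"},
}


def models_for_pathway(pathway: str) -> List[str]:
    """Rank models by how many of the pathway's participants they are annotated with."""
    participants = PATHWAYS.get(pathway, set())
    scored: List[Tuple[int, str]] = []
    for model, entities in MODEL_ANNOTATIONS.items():
        overlap = len(entities & participants)
        if overlap:
            scored.append((overlap, model))
    # Largest overlap first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [model for _, model in scored]


print(models_for_pathway("cAMP/PKA signalling"))
```

Because the binding goes through shared entity terms rather than a single merged ontology, each community can keep its own ontology while tools still traverse from one to the other.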

Relevant figures are:

  • Levels of modelling (LevelsOfModelling.pdf)
  • Overview of ontology domains (Overview.pdf)
  • Use-cases (Figures.pdf)

Use-case 1 -- A biologist, using an interface to the BioPAX ontology, locates the cAMP/PKA signalling cascade that participates in the regulation of L-type calcium channel activity. From this concept they locate CellML models that describe this system, and are able to run simulations of them, manipulate them and visualize their behaviour.

Use-case 2 -- A biologist locates L-type calcium channels through an anatomical navigation interface. From here they can investigate the 3D structure of the channel, its physiological function, or publications that relate to it. Each of these steps helps to gather or filter a set of models, which they can then take forward to simulation and visualization.

Use-case 3 -- A biologist with a protein domain motif, perhaps with identifiers from the Gene Ontology or protein interaction databases, obtains a set of models that refer to this motif. From here they can look at 3D structures, physiological function, or visualizations of its behaviour in various models.

Use-case 4 -- A biologist with a data set wants to find a model that could help them interpret data from experiments. Using navigation or query interfaces, they can find a set of models that contain the relevant entities, describe the appropriate physiological process, or use particular modelling theories. From here they enter an iterative process of model fitting and system identification, i.e. reducing the set of models to those that provide useful levels of accuracy. The parameter data sets and the raw data sets themselves can be submitted for peer review for inclusion in the repositories, for other people to use.
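The model-selection step in this use-case can be sketched as fitting each candidate model to the data and keeping only those whose error falls below a chosen tolerance. The candidate models, the grid of time constants, the tolerance and the synthetic data are all hypothetical, and a simple grid search over one nonlinear parameter (with the amplitude solved by linear least squares) stands in for whatever fitting method a real tool would use.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Basis = Callable[[float, float], float]  # g(t, theta) for a model y = A * g(t, theta)

# Hypothetical candidate models returned by a repository query.
CANDIDATES: Dict[str, Basis] = {
    "exponential_decay": lambda t, th: math.exp(-t / th),
    "hyperbolic_decay": lambda t, th: 1.0 / (1.0 + t / th),
    "constant": lambda t, th: 1.0,
}


def fit(basis: Basis, ts: Sequence[float], ys: Sequence[float],
        thetas: Sequence[float]) -> Tuple[float, float, float]:
    """Grid-search theta; solve amplitude A by least squares. Returns (rmse, A, theta)."""
    best = (math.inf, 0.0, 0.0)
    for th in thetas:
        g = [basis(t, th) for t in ts]
        a = sum(gi * yi for gi, yi in zip(g, ys)) / sum(gi * gi for gi in g)
        rmse = math.sqrt(sum((a * gi - yi) ** 2 for gi, yi in zip(g, ys)) / len(ys))
        if rmse < best[0]:
            best = (rmse, a, th)
    return best


def select_models(ts: Sequence[float], ys: Sequence[float],
                  tolerance: float) -> List[Tuple[str, float]]:
    """Keep candidates whose best fit has RMSE within tolerance, best first."""
    thetas = [0.5 * k for k in range(1, 41)]  # 0.5 to 20.0
    kept = []
    for name, basis in CANDIDATES.items():
        rmse, _, _ = fit(basis, ts, ys, thetas)
        if rmse <= tolerance:
            kept.append((name, rmse))
    return sorted(kept, key=lambda item: item[1])


# Synthetic "experimental" data: y = 2 exp(-t / 3), sampled on 0 <= t <= 10.
ts = [0.5 * i for i in range(21)]
ys = [2.0 * math.exp(-t / 3.0) for t in ts]
print(select_models(ts, ys, tolerance=0.05))
```

In practice this loop would be repeated as new data arrive or the tolerance is tightened, progressively narrowing the candidate set.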

Use-case 5 -- A modeller has located a particular model. They are able to run it in a simulation, visualize its behaviour, interpret the mathematical theories it was built from, and then edit it in a model editor. They are able to submit annotations to the original model, or submit new models for peer review for inclusion in the repositories and ontologies.

Use-case 6 -- A modeller starts with a publication and obtains a set of models that includes both the models described in the publication and unpublished models of the same processes. They can view comparisons of these models that highlight the similarities and differences in architecture and in the modelling theories used by the modellers who created them.
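A comparison of model architectures could start from something as simple as set differences over each model's components and their variables. In this sketch a model is reduced to a hypothetical mapping from component names to variable names; the two example models and their contents are invented for illustration and do not correspond to any published model.

```python
from typing import Dict, List, Set, Tuple

Model = Dict[str, Set[str]]  # component name -> variable names (simplified)


def compare_models(first: Model, second: Model) -> Dict[str, object]:
    """Report components unique to each model and variable differences in shared ones."""
    shared = sorted(first.keys() & second.keys())
    variable_differences: Dict[str, Tuple[List[str], List[str]]] = {
        comp: (sorted(first[comp] - second[comp]), sorted(second[comp] - first[comp]))
        for comp in shared
        if first[comp] != second[comp]
    }
    return {
        "only_in_first": sorted(first.keys() - second.keys()),
        "only_in_second": sorted(second.keys() - first.keys()),
        "shared": shared,
        "variable_differences": variable_differences,
    }


published: Model = {
    "membrane": {"V", "Cm"},
    "sodium_current": {"i_Na", "m", "h"},
    "potassium_current": {"i_K", "n"},
}
variant: Model = {
    "membrane": {"V", "Cm"},
    "sodium_current": {"i_Na", "m", "h", "j"},
    "calcium_current": {"i_CaL", "d", "f"},
}
print(compare_models(published, variant))
```

A fuller comparison would also need to look at the equations and at the modelling theories each component draws on, which is where the ontology annotations come in.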

Use-case 7 -- A modeller has a particular goal in mind: in this case, coupling their model to models that describe systems at a finer physiological scale than theirs. They can find the concept of coupling scales in a navigation interface to the modelling theory ontology. From here they see mathematical systems or examples for coupling between scales, and through these select actual models that implement them. They can then select subsystems from the library, or make up their own, with the option of selecting model templates that help them to couple the subsystems into their model. For example, they might select subsystems that describe the signalling pathway leading to the activation of L-type calcium channels, and integrate these into a continuum model that may couple spatial variation in the activation of β1- and β2-adrenergic receptors to the resulting spatio-temporal propagation of activation of the muscle.
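Coupling cell-level subsystems into a continuum model can be illustrated with a one-dimensional reaction-diffusion cable: every node carries its own copy of a cell model, and a finite-difference diffusion term couples neighbouring nodes so that excitation started at one end propagates along the cable. This is a generic sketch under stated assumptions (textbook FitzHugh-Nagumo kinetics, explicit Euler time stepping, no-flux boundaries and arbitrary dimensionless parameters); it is not the adrenergic signalling model described above or the method used in CMISS.

```python
from typing import Callable, List, Optional, Tuple

CellRates = Callable[[float, float, float], Tuple[float, float]]


def fhn_rates(v: float, w: float, stim: float,
              a: float = 0.7, b: float = 0.8, eps: float = 0.08) -> Tuple[float, float]:
    """Textbook FitzHugh-Nagumo kinetics used here as the pluggable cell subsystem."""
    return v - v ** 3 / 3.0 - w + stim, eps * (v + a - b * w)


def simulate_cable(cell: CellRates = fhn_rates, n: int = 50, dx: float = 0.5,
                   diffusion: float = 1.0, dt: float = 0.02,
                   t_end: float = 80.0) -> List[Optional[float]]:
    """Return the first time each node's v exceeds 0 (None if it never does)."""
    v = [-1.2] * n  # every node starts near the FHN resting state
    w = [-0.625] * n
    activation: List[Optional[float]] = [None] * n
    for step in range(int(round(t_end / dt))):
        t = step * dt
        new_v, new_w = v[:], w[:]
        for i in range(n):
            # Finite-difference Laplacian with mirrored (no-flux) boundaries.
            left = v[i - 1] if i > 0 else v[i + 1]
            right = v[i + 1] if i < n - 1 else v[i - 1]
            laplacian = (left - 2.0 * v[i] + right) / dx ** 2
            stim = 2.0 if (i < 10 and t < 1.5) else 0.0  # excite the left end only
            dv, dw = cell(v[i], w[i], stim)
            new_v[i] = v[i] + dt * (dv + diffusion * laplacian)
            new_w[i] = w[i] + dt * dw
            if activation[i] is None and new_v[i] > 0.0:
                activation[i] = t + dt
        v, w = new_v, new_w
    return activation


times = simulate_cable()
print([round(times[i], 1) for i in (0, 25, 49)])
```

Activation times that increase along the cable show the excitation travelling as a wave. Because the cell kinetics enter only through the `cell` argument, a different subsystem (for example one extended with a signalling pathway) could be plugged in without changing the continuum code.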