RequirementsDiscussion2

This document pre-dates this wiki development bowl, so it wasn't intended to make reference to other pages here.

What this is about

The aim is to develop tools for CellML and the Physiome project. While some tools are already in development, the process has so far highlighted two areas that need more attention:

  1. Use-cases. The requirements for various tools have been refined over time as we get a better idea of what is needed. The initial requirements "to visualize CellML models", "evolve physiome language/data systems", and "to design databases" have been broken out into many different use cases we wish to solve.
  1. As we begin to implement particular solutions, we come across areas in the core architectures that need to be 1) re-addressed, or 2) researched and defined. This has been the focus of CellML language evolution, and research into ontology languages and meta-modelling architectures.

Overview of general requirements

Here I give a small breakdown of the most important refinements of the general requirements.

Rendering CellML models

This has been broken down into

  1. interpreting components and variables as biological entities, i.e. associating these constructs with some biological type.
  1. a representation language for particular states. This would represent values for parameters such as flux, concentration, potential differences etc. This becomes part of the thread for representing simulation data and intermediate model representations. One result of this representation language is a serialized form of renderings that can be associated with a model. More compactly, we are talking about a mapping from "a rendering of the mathematical model (i.e. a CellML file) to a rendering of the various biological states that the mathematical model might happen to encompass."
  1. integration of structural constructs for use in 1D, 2D, and 3D interfaces. FieldML is an obvious starting point, since it is used to represent three-dimensional anatomical structures. However, we do not want to influence the development of FieldML at all - it is not intended to solve our rendering requirements. There is likely to be some other intermediate model that uses FieldML constructs in those places that map directly onto those structures.

Physiome language and data systems

  • public and editorial interfaces to physiome databases.
  • databases include
    • hierarchies/graphs of concepts
    • images, cmiss files, descriptions etc - as already exist.
  • we have settled on the use of ontologies to represent concept relationships (properties), and specifically, we have chosen the OWL-DL language.
  • the public and reviewers access the databases through web interfaces. Ontology editors can use a web interface or ontology editing tools.

Database design

At the moment a model comes from a serialised form in a 'cellml/xml' file, but once we start using database repositories there will likely be several interfaces to it, such as DB queries, RPC calls, and exchange syntaxes (such as 'cellml/xml'). We need use cases describing where and how a model is obtained, added, and modified. This requirement suggests we should start stepping outside the XML mindset CellML is currently in; the XML exchange syntax simply becomes one more requirement. I think we should be thinking within a layered modelling architecture with an abstract syntax (see later).
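To make the layered idea a little more concrete, here is a minimal Python sketch, assuming a much-simplified abstract syntax (the `Model`, `Component` and `Variable` classes are hypothetical, not part of any CellML API). The model lives in plain objects; the XML exchange syntax and a row-based database view are just two interfaces onto the same structure.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical abstract-syntax classes: nothing here depends on XML.
@dataclass
class Variable:
    name: str
    units: str

@dataclass
class Component:
    name: str
    variables: List[Variable] = field(default_factory=list)

@dataclass
class Model:
    name: str
    components: List[Component] = field(default_factory=list)

def to_xml(model: Model) -> str:
    """One exchange syntax: a simplified, CellML-like XML serialization (no namespaces)."""
    root = ET.Element("model", name=model.name)
    for comp in model.components:
        comp_el = ET.SubElement(root, "component", name=comp.name)
        for var in comp.variables:
            ET.SubElement(comp_el, "variable", name=var.name, units=var.units)
    return ET.tostring(root, encoding="unicode")

def to_rows(model: Model) -> List[Tuple[str, str, str, str]]:
    """Another interface: flat (model, component, variable, units) rows for a database."""
    return [(model.name, comp.name, var.name, var.units)
            for comp in model.components
            for var in comp.variables]

model = Model("example_model", [Component("membrane", [Variable("V", "millivolt")])])
xml_text = to_xml(model)
rows = to_rows(model)
```

Adding an RPC or query interface would then mean adding another function over the same objects, rather than changing the XML.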

Another consideration with this database requirement is where ontologies fit in, since they are themselves a structure and language for databases with inference capabilities. An example would be an RDF query language used with a database of individuals defined in an OWL ontology.
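As a rough illustration of querying individuals through an ontology, here is a stdlib-only stand-in for an RDF query language: triples are plain tuples, and the only inference is following `rdfs:subClassOf` transitively. The `ex:` classes and individuals are invented for the example; a real system would use SPARQL and an OWL-DL reasoner, which infer far more than this.

```python
# Triples are (subject, predicate, object); the ex: names are invented for the example.
TRIPLES = {
    ("ex:SodiumChannel", "rdfs:subClassOf", "ex:IonChannel"),
    ("ex:IonChannel", "rdfs:subClassOf", "ex:MembraneProtein"),
    ("ex:nav1_5", "rdf:type", "ex:SodiumChannel"),
    ("ex:kv4_3", "rdf:type", "ex:IonChannel"),
    ("ex:atp", "rdf:type", "ex:Metabolite"),
}

def match(triples, s=None, p=None, o=None):
    """Yield triples matching a pattern; None acts as a wildcard, like a query variable."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t

def with_subclasses(triples, cls):
    """cls plus every class below it via rdfs:subClassOf (transitive closure)."""
    found, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, _, _ in match(triples, p="rdfs:subClassOf", o=current):
            if sub not in found:
                found.add(sub)
                stack.append(sub)
    return found

def individuals_of(triples, cls):
    """Individuals asserted to be of type cls or of any subclass of cls."""
    classes = with_subclasses(triples, cls)
    return {s for s, _, o in match(triples, p="rdf:type") if o in classes}
```

Asking for `ex:MembraneProtein` returns both channels even though neither is asserted to be a membrane protein directly; that inferred answer is what distinguishes this from a plain table lookup.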

We need to sort out the requirements of a database; these probably include answers to the following:

  • can it represent the form of the language used to define the data structures, so that there is no loss of semantic value when the data is interpreted? Stated like this it is not very clear, but imagine that a particular language, say CellML, has semantics defined in an ontology or similar framework - the database does not need to include representations of these semantics as well, but should instead maintain the constructs of CellML and the rules that relate them.
  • do query languages such as SQL and OODB query languages (usually path query systems) have a relevant interpretation for our languages?
  • interfaces for adding, removing, finding, navigating, annotating, updating - etc - i.e., what are the things we want to be able to do with it
  • maintenance - backups, mirrors, operating systems etc
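One way to test whether SQL has a relevant interpretation is simply to try it. The sketch below uses Python's built-in `sqlite3` to store a hypothetical encapsulation hierarchy (component names invented) as a parent column, and answers a path-style question, "which components are encapsulated beneath `cell`?", with a recursive common table expression. It covers only the hierarchy, not the rest of the language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each component row records the component encapsulating it.
conn.execute("CREATE TABLE component (name TEXT PRIMARY KEY, parent TEXT)")
conn.executemany("INSERT INTO component VALUES (?, ?)", [
    ("cell", None),
    ("membrane", "cell"),
    ("sodium_channel", "membrane"),
    ("m_gate", "sodium_channel"),
    ("calcium_buffer", None),
])

def encapsulated_under(conn, root):
    """All components encapsulated directly or indirectly beneath `root`, sorted by name."""
    rows = conn.execute("""
        WITH RECURSIVE sub(name) AS (
            SELECT name FROM component WHERE parent = ?
            UNION
            SELECT c.name FROM component AS c JOIN sub ON c.parent = sub.name
        )
        SELECT name FROM sub ORDER BY name
    """, (root,)).fetchall()
    return [name for (name,) in rows]
```

Recursive queries like this are where relational and path-query systems start to overlap, which is part of what the question above is getting at.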

Where this has taken us

CellML typing

A general requirement is to bring a sense of typing to CellML. This was initially proposed with respect to rendering, but it is really part of the bigger requirement of building in "levels of modelling", "inference interfaces", and "object-oriented programming (OOP) interfaces". We have ended up with some examples of where this sense of typing is going to be important:

  • rendering - as explained already.
  • binding of constructs into the "levels of modelling".
  • inference through tools such as ontology query facilities.
  • OOP interfaces - such as used by model editor tools, APIs etc.
  • mechanisms for model developers to create library systems and domains of specialization.

The explicit form of this typing, i.e. what the language constructs look like, is not clear yet, but there are a few requirements. The typing:

  • fits into the CellML language
  • can be interpreted within the OWL-DL framework
  • can be interpreted within OOP constructs
  • allows modellers to have some process for merging type systems.
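To check that these requirements can coexist, here is a minimal Python sketch of one possible encoding, assuming a type system is nothing more than a mapping from each type to its parent type (the type names are hypothetical). The same data is read both as OWL-style `rdfs:subClassOf` axioms and as generated Python classes, and two type systems can be merged as long as they agree on shared types. Real OWL-DL typing is much richer (multiple inheritance, property restrictions), so this only shows the shape of the idea.

```python
# Hypothetical type systems: each type maps to its parent type (None for a root).
ELECTROPHYSIOLOGY = {"Variable": None, "Potential": "Variable", "MembranePotential": "Potential"}
BIOCHEMISTRY = {"Variable": None, "Concentration": "Variable"}

def merge(a, b):
    """Merge two type systems, refusing if a shared type has different parents."""
    merged = dict(a)
    for name, parent in b.items():
        if name in merged and merged[name] != parent:
            raise ValueError(f"conflicting parents for {name}: {merged[name]} vs {parent}")
        merged[name] = parent
    return merged

def as_subclass_axioms(types):
    """OWL-DL style reading: one rdfs:subClassOf statement per parent link."""
    return {(name, "rdfs:subClassOf", parent) for name, parent in types.items() if parent}

def as_python_classes(types):
    """OOP reading: one generated class per type (assumes every parent is defined
    and the hierarchy is acyclic)."""
    classes = {}

    def build(name):
        if name not in classes:
            parent = types[name]
            bases = (build(parent),) if parent else (object,)
            classes[name] = type(name, bases, {})
        return classes[name]

    for name in types:
        build(name)
    return classes

merged_types = merge(ELECTROPHYSIOLOGY, BIOCHEMISTRY)
classes = as_python_classes(merged_types)
```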

Ontologies

We are certain that we want to integrate the intention and math of ontologies into our modelling language environment, but we are not so clear exactly what this will look like on paper - e.g. as structures in a DB, or in serialized XML form.

OWL-DL is the best candidate so far for an ontology language and framework. The philosophy and direction of the W3C group responsible for evolving it are certainly in line with ours. The integration of the exchange syntax still needs more thought, as it has implications for the completeness of the current 'CellML/XML' specification (see the section on CellML exchange syntax and the requirements that follow).

General requirements so far for ontology

  • We use OWL-DL as our ontology language - i.e. adopting both its language constructs and its exchange syntax (RDF/XML).
  • Experts in domains other than computer science should not need to understand how to build ontologies, but should be intimately involved in the knowledge acquisition process - i.e. the process of capturing the information required to build concepts, and the interfaces for reviewing and annotating areas that need work.
  • A resolution of what the RDF/XML exchange syntax means for the CellML/XML specification. CellML/XML can be interpreted as RDF/XML instances of an OWL ontology. The CellML/XML specifications are sound in this respect, but not complete - i.e. the same information can be written out using different XML constructs (the differences range from minor to major) that still describe the same RDF graph. Note: the RDF graph is the normative form of an RDF document, whether it is based on a schema extension or not.
  • CellML constructs are to be represented in the ontology, but we need to make sure that their interpretation in the domain of OOP or serialization for exchange with simulation engines is unambiguous - i.e. that we are not interpreting ontological constructs in an incorrect fashion, all interpretation schemas need to be built into the ontology.
  • Ontologies are an option for a database technology, which is a requirement of ours (see database design)
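The "same graph, different XML" point can be shown with Python's `xml.etree.ElementTree`. The two documents below describe the same two triples: one uses `rdf:Description` with property elements, the other a typed node element with a property attribute, both standard RDF/XML abbreviations. The `bio:` namespace is a placeholder, and the extractor handles only this small subset of RDF/XML (no nesting, no base-URI resolution, literals and resources not distinguished); in practice a full parser, such as the one in rdflib, would be needed.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Form A: rdf:Description with property elements.
FORM_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:bio="http://example.org/bio#">
  <rdf:Description rdf:about="#V">
    <rdf:type rdf:resource="http://example.org/bio#MembranePotential"/>
    <bio:units>millivolt</bio:units>
  </rdf:Description>
</rdf:RDF>"""

# Form B: typed node element with a property attribute.
FORM_B = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:bio="http://example.org/bio#">
  <bio:MembranePotential rdf:about="#V" bio:units="millivolt"/>
</rdf:RDF>"""

def uri(qname):
    """ElementTree writes '{namespace}local'; RDF forms the URI by concatenation."""
    namespace, local = qname[1:].split("}")
    return namespace + local

def triples(rdf_xml):
    """Extract triples from a tiny subset of RDF/XML (assumes namespaced properties)."""
    out = set()
    for node in ET.fromstring(rdf_xml):
        subject = node.get(f"{{{RDF}}}about")
        if node.tag != f"{{{RDF}}}Description":
            out.add((subject, RDF + "type", uri(node.tag)))   # typed node element
        for attr, value in node.attrib.items():
            if not attr.startswith(f"{{{RDF}}}"):
                out.add((subject, uri(attr), value))          # property attribute
        for prop in node:
            resource = prop.get(f"{{{RDF}}}resource")
            out.add((subject, uri(prop.tag), resource if resource is not None else prop.text))
    return out
```

An XML schema sees two unrelated documents here, while an RDF processor sees one graph; that gap is exactly the completeness problem described above.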

The following is a blurb written for the Web Ontology group, who requested a short description of how we are using OWL:

What we are doing:

We have developed (and are still evolving) a representation language for the mathematical modelling of biological systems. Two resources for information about this are:

  1. CellML main site – http://www.cellml.org/
  2. IUPS Physiome project - http://www.bioeng.auckland.ac.nz/physiome/physiome.php

Both these aspects of our work will benefit from a formal semantic representation. For the first domain, semantic representation languages help us to provide a formal metamodel description of our language, and therefore a more structured environment and toolset in which to evolve it. If you get into the CellML spec at all, it’s pretty obvious that constructs such as ‘reaction’ need to be pushed to a different level, and given a different semantic relationship, than they currently have in the base language. Within the second domain, i.e. the IUPS Physiome project, it is obvious how a formalization of the concepts there would benefit the project, especially as we integrate Physiome data into toolsets and the modelling language itself.

Why we chose OWL:

  • Based on description logics, which we feel are a natural way for us to describe our concepts and relations.
  • Extends RDF/RDF-S, which we are already using in CellML. Side note: I am particularly pleased to see the RDFS(FA) branch that the Manchester group have formed.
  • Is focused on machine operability – i.e. it is computationally attractive for developing or integrating reasoning tools, and for interpretation into exchange syntaxes that we already know how to process at various levels.
  • People do seem to be building tools around it
  • Promotes ontology integration and sharing – in this context we will be working closely with BioPAX to integrate their domain of biochemical pathways with ours on mathematical modelling of biological systems. We are starting similar initiatives with other groups – so I guess it provides a common interpretation and communication language for us at many levels, not just databases.

Where are we at with OWL:

  • Still comparing the language architecture of OWL with that of OMG’s levelled architecture that starts with MOF. The viewpoint here is from the software design perspective; I’m not suggesting we are looking to OMG for semantic representation, but we do value their perspective on modelling language architectures for software development.
  • Trying to think how one can move “elegantly” through the levels IDE (modelling tools) <-> OWL AS+S (abstract syntax and semantics) <-> RDF/XML – i.e. the exchange form of OWL is useful for data exchange between applications, but seems very messy because frames need to be broken down into triples to interpret them and then built back up again.
  • Exporting OWL from bioengineering modelling tools that we are developing for users to more easily design and define mathematical models of biological systems. These tools aren’t currently ready to be released – but they will be open source when they are.
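The frame-to-triple round trip described in the second point above can be sketched in a few lines of Python. Here a "frame" is assumed to be a subject with named slots, each holding a list of values; the prefixes are placeholders. Slot values come back sorted rather than in their original order, because a set of triples carries no ordering: a small example of the information that has to be re-imposed when frames are rebuilt.

```python
from collections import defaultdict

# A frame: one subject with named slots, each holding a list of values (prefixes are placeholders).
frame = {"id": "#V", "slots": {"rdf:type": ["bio:MembranePotential"], "bio:units": ["millivolt"]}}

def frame_to_triples(frame):
    """Break a frame down into one (subject, property, value) triple per slot value."""
    return {(frame["id"], prop, value)
            for prop, values in frame["slots"].items()
            for value in values}

def triples_to_frames(triples):
    """Build frames back up by grouping triples on subject, then on property."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, prop, value in sorted(triples):
        grouped[subject][prop].append(value)
    return [{"id": subject, "slots": dict(props)} for subject, props in grouped.items()]
```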

CellML exchange syntax

Is there a requirement for CellML/XML to have only one exchange form? This question comes from trying to resolve the RDF/XML interpretation of CellML. One useful argument for a single form is that building schemas for a language that can have multiple forms becomes non-trivial, as is certainly the case for the different forms of RDF/XML. Personally, I am finding it hard to find anything positive about this multiple-form business, and at the moment it is the problem that most pushes me away from RDF/XML and hence OWL/XML representations.

Aside from this OWL problem, there are some clearer issues with CellML/XML.

  1. We see a more general modelling description language forming that includes ontologies and levels of modelling. CellML/XML would become a subset of this.
  1. CellML/XML has some untidiness for application developers. While this is gradually improving, operability for application developers is one of the most important requirements.
  1. As a result of these two issues, we are required to step back and address this modelling language environment as more than the development of a simple exchange syntax. The introduction of typing and semantics, and a template-oriented 'import' construct in the CellML 1.1 specification, leads us towards thinking in similar ways to other language development efforts, notably OWL and OMG (MOF framework). In this realm we are interested in building some formal abstract syntax and semantics within a layered architecture, such as the L4 system that MOF is part of (see section on abstract syntax and formal architecture).

Leveraging existing standards

I think we need to change our view on "leveraging existing standards" a bit. The phrase seems to imply that 1) there is a solution to our problems that is now a standard, and 2) that the syntax, usually XML, is what we should use. I'd rather place emphasis on clarifying the requirements of our modelling representation language(s) and looking at which groups around the world are carving out niches in the various areas. We then look at their workflows and toolsets and think about whether 1) adopting them, or 2) becoming part of them would benefit us.

Another easy mistake is to adopt a standard and misinterpret it - it is more useful to try a standard out through some use (e.g. implementing code, solving some problems) and get a feel for whether it serves the purpose we thought it did. Using other tools developed to interpret the standard can give you insight into proper usage and interpretation. Examples of misuse: implementing a custom function for interpreting ontological frames without putting in place formal constructs that demonstrate its soundness and utility; using XML structures to hold data they were not intended to hold, e.g. attribute names that have another level of data formatted into them ('variable_1' comes to mind).

CellML general evolution

The previous section was mainly concerned with the CellML syntax side of things, but dissolved into some thoughts on language development architectures, so it is partly relevant to this section too. Aside from the language architecture, the other requirements here are:

  1. Integration of FieldML from a modelling point of view - i.e. NOT a rendering point of view. There are many examples of this; here are some obvious ones that have come up recently:

    1. Continuum modelling of cell models at tissue level - e.g. activation of sheets of myocardial cells
    1. Cancer therapy modelling - i.e. tumours as aggregations of instances of cell models that behave according to various models of drug action on the biochemical pathways of the cells.
    1. Wound healing, again spatial and temporal domains over which some kind of continuum representation of an entity is useful - in this case I am thinking of Dawn Walker and Steve Wood's work, where the spatial distribution and structure of the cells are important, as is the Ca++ concentration field.
    1. Everything that Andre (David) is doing :-)
  1. Evaluation of the current set of MathML structures we use - are they enough still?
  1. Units ontology - while this is part of the greater intention of ontologies - i.e. specialized domains of types - it is still a requirement to address units again in CellML. I'm not too clear on how this reflects on the current units spec for CellML, I need Poul to elaborate on what he was thinking.
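As a starting point for discussion (not a reading of the current CellML units specification, nor of what Poul has in mind), here is a Python sketch in which each unit is defined by a scale factor relative to SI and a vector of base-unit exponents. Compatibility and conversion then follow from the definitions rather than from unit names. The table and its base-unit ordering are assumptions made for the example.

```python
from fractions import Fraction

# Hypothetical units table: name -> (scale factor to SI, exponents of base units).
# Base-unit order assumed here: (metre, kilogram, second, ampere, mole).
UNITS = {
    "volt":        (Fraction(1),       (2, 1, -3, -1, 0)),
    "millivolt":   (Fraction(1, 1000), (2, 1, -3, -1, 0)),
    "millimolar":  (Fraction(1),       (-3, 0, 0, 0, 1)),   # 1 mM = 1 mol/m^3
    "second":      (Fraction(1),       (0, 0, 1, 0, 0)),
    "millisecond": (Fraction(1, 1000), (0, 0, 1, 0, 0)),
}

def conversion_factor(source, target):
    """Factor f such that (value in source) * f = (value in target)."""
    s_scale, s_dims = UNITS[source]
    t_scale, t_dims = UNITS[target]
    if s_dims != t_dims:
        raise ValueError(f"{source} and {target} have different dimensions")
    return s_scale / t_scale
```

Exact fractions avoid floating-point drift in scale factors; whether a units ontology should carry conversions at all, or only dimensional types, is one of the open questions.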

Abstract syntax and formal architecture

I personally feel there is a requirement for a formal language architecture to be established. As mentioned in the exchange syntax section we might benefit from following the L4 system adopted by OMG and now OWL, along with an abstract syntax and semantics specification such as that of OWL. If we are to use OWL for our core description environment, and we resolve the exchange syntax issues, then we may be able to use the OWL abstract syntax and semantics to describe our modelling language environment. There is nothing stopping us adding more layers to the L4 system.

The benefits of this approach are that we

  • have a formal way to introduce and define new language elements
  • show a formal evolution of the language and maintain an environment in which language evolution is understood and expected.
  • can more easily integrate other languages/schemas such as OWL, MOF, OOP, DB-schemas etc.
  • become less constrained by "thinking in XML".
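A toy version of the layering idea, assuming MOF-style layers numbered 3 (meta-metamodel) down to 0 (data): every element must be an instance of an element exactly one layer above it, ending at a top layer that is treated as self-describing. The element names are illustrative only, and `top` is a parameter because, as noted above, nothing stops us adding more layers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Element:
    name: str
    layer: int                               # e.g. 3 = meta-metamodel ... 0 = data
    instance_of: Optional["Element"] = None  # the element this one instantiates

def well_layered(element: Element, top: int = 3) -> bool:
    """True if the instance-of chain climbs exactly one layer at a time up to `top`."""
    while element.instance_of is not None:
        if element.instance_of.layer != element.layer + 1:
            return False
        element = element.instance_of
    return element.layer == top

meta_class = Element("Class", 3)                          # meta-metamodel (MOF-like)
component_type = Element("Component", 2, meta_class)      # language definition (CellML-like)
membrane = Element("membrane", 1, component_type)         # a particular model
membrane_run = Element("membrane, run 1", 0, membrane)    # simulation data
```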

Levels of modelling

We have a vision of "levels of modelling" (see LevelsOfModelling.pdf). These levels are:

Conceptual -- writing down in English what you wish to represent

Mathematical -- classes, types etc - the process of representing concepts within a useful syntax. We can conceptualize processes, regions etc at this level.

Formulation -- instances - now we have connections in the CellML sense, i.e. explicit relationships between instances.

Computation -- implementation of actual computation - e.g. CMISS, SIMULINK - requires import libraries/CellML API.

Perhaps the line between Mathematical and Formulation is a little fuzzy. One of the most important requirements is to enable people to represent what they want in the most natural way to them - i.e. do not restrict them in the way they can conceptualize and realize their models. In the recent Physiome paper I replaced all this with:

Roles and representation

Modelling biological systems involves at least three key roles:

  1. Biologist – represents biological systems using terms from the biological domain.
  2. Mathematician – represents the mathematical formulations of the biological system.
  3. Computer scientist/engineer – helps develop representation languages that represent mathematical formulations as structures interpretable by software, for example abstractions of biological and mathematical entities, mathematical relationships and rules for their interpretation.

To provide an effective interplay between these roles, we need particular tools for representing and interpreting the data generated. Producing these representations is one of the roles of the computer scientist/engineer, i.e. to provide tools and structures that not only provide for the biologists and mathematicians, but also provide for the machine processing of data in simulations, inference engines, and databases.

One set of tools the Physiome group is developing is a suite of visual editors. Visual building blocks and interfaces are a very natural way for biologists and mathematicians to build concepts, navigate libraries, and interpret models (Fig 1.). To develop a framework for building toolsets, the computer scientist needs to develop machine-interpretable representations of what modelers are describing. To do this, we have been using the representation languages:

  1. CellML – a modelling description language (http://www.cellml.org).
  2. OWL – an ontology representation language specified by the W3C Web Ontology Working Group.

Object oriented programming interfaces

In the CellML language there is a strong emphasis on using computer-science-based language constructs such as variables, abstraction, encapsulation, interfaces, etc. The result is structures that have elegant and useful representations in the programming environment. We need to maintain this mindset, especially now that we are moving into areas such as import and typing. One characteristic of using formal language architectures to define our language is that there is a community of people whose requirements are similar to ours, i.e. to:

  • unambiguously define object oriented software architectures
  • remain programming language independent
  • think across all programming applications - e.g. databases, API design, remote object and procedure systems, language evolution, etc.

Given this kind of philosophy, we need to make sure whatever language system we end up with allows us to interpret structures in this context where it is useful or intended to do so. This is particularly relevant now because some methods of building ontology data make the data difficult to interpret in a way suitable for defining object-oriented constructs. This is discussed a little more in the ontologies section.

Model validation

There are many faces to validating a CellML model. If it is represented in XML we can validate its well-formedness and the conformance of its structure. Independent of its representation syntax, we can check whether it is valid according to the rules of the language - such as the dimensions of connected variables, the connectivity between encapsulated components, etc. Following this we move up through the levels of modelling, i.e. we have validated the model within its immediate definition of form, but now we'd like to validate it at the mathematical and biological levels. A couple of examples might help make this clearer:

  1. quasi-steady state approximation, along with its mathematical and biological assumptions.
  1. valid participants in a reaction, and valid representation - i.e. given that the steady-state approximation is OK, is the reaction reversible?
  1. connections between particular biological components considered incorrect
  1. need to add many more examples here
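The language-level checks mentioned above (dimensions of connected variables, connectivity between encapsulated components) can be sketched as follows. We assume, loosely following CellML 1.0, that a connection is allowed only between sibling components or between a parent and a directly encapsulated child; dimensions are reduced to simple labels here, so `millisecond` and `second` would both be "time". The component names and tables are hypothetical.

```python
# Hypothetical encapsulation hierarchy: component -> parent (None at top level).
PARENT = {"environment": None, "membrane": None,
          "sodium_channel": "membrane", "m_gate": "sodium_channel"}

# Dimension label of each (component, variable); real checks would derive these from units.
DIMENSION = {("environment", "time"): "time", ("membrane", "time"): "time",
             ("membrane", "V"): "voltage", ("sodium_channel", "V"): "voltage",
             ("m_gate", "V"): "voltage"}

def may_connect(a: str, b: str) -> bool:
    """Siblings (same parent), or a parent and one of its directly encapsulated children."""
    return PARENT[a] == PARENT[b] or PARENT[a] == b or PARENT[b] == a

def check_connection(comp_a, var_a, comp_b, var_b):
    """Return a list of human-readable problems with one variable connection."""
    problems = []
    if not may_connect(comp_a, comp_b):
        problems.append(f"{comp_a} and {comp_b} are neither siblings nor parent/child")
    dim_a, dim_b = DIMENSION[(comp_a, var_a)], DIMENSION[(comp_b, var_b)]
    if dim_a != dim_b:
        problems.append(f"dimension mismatch: {comp_a}.{var_a} is {dim_a}, {comp_b}.{var_b} is {dim_b}")
    return problems
```

Checks of this kind need nothing beyond the language's own rules; the mathematical and biological examples in the list need knowledge held elsewhere.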

The rules for such validation are expected to be represented in the ontology data. Some of these rules will be fundamental principles of math and biology, others may have more to do with maintaining interpretable modelling abstractions, such as the interpretation of a particular membrane compartment that can be re-used in other models.
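Keeping such rules as data, rather than hard-coding them into each tool, might look something like the Python sketch below. The biological type of each component would come from ontology annotations; here both the annotations and the single rule (extracellular space and mitochondrial matrix are not adjacent compartments, so should not exchange directly) are assumptions chosen for illustration.

```python
# Hypothetical ontology annotations: component -> biological type.
BIO_TYPE = {"bath": "ExtracellularSpace", "cyto": "Cytosol", "matrix": "MitochondrialMatrix"}

# Validation rules stored as data: unordered type pairs that should not be connected directly.
FORBIDDEN = {
    frozenset({"ExtracellularSpace", "MitochondrialMatrix"}): "compartments are not adjacent",
}

def rule_violations(connections):
    """Return (component_a, component_b, reason) for each connection breaking a rule."""
    violations = []
    for a, b in connections:
        reason = FORBIDDEN.get(frozenset({BIO_TYPE[a], BIO_TYPE[b]}))
        if reason is not None:
            violations.append((a, b, reason))
    return violations
```

Adding a rule means adding an entry to the data, which an ontology editor could review, without touching the checking code.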

There are also other axes of validation. Since many of these models come from publications, there are other potential errors:

  1. the mathematics in the paper was presented incorrectly - i.e. the math did not match the graphs(in so many words).
  1. the mathematics simply does not work

As a result of using models from publications, new models are likely to be produced that contain more accurate mathematical constructs, or that correct misused constructs. Such changes necessarily produce a new model, and potentially a new publication, but there is at least a requirement to reference, or be able to infer, that this new model was based on other particular models.
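If each model records only the models it was directly based on, its full lineage can be inferred rather than stored. A minimal sketch, with hypothetical model identifiers:

```python
# Hypothetical provenance records: model -> models it was directly based on.
BASED_ON = {
    "model_c": ["model_b"],
    "model_b": ["model_a"],
    "model_d": ["model_a", "model_c"],
}

def ancestry(model):
    """All models that `model` was directly or indirectly based on."""
    seen, stack = set(), list(BASED_ON.get(model, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BASED_ON.get(current, []))
    return seen
```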

Tools

Most of the requirements above point towards various kinds of tools being necessary or produced. So far we have :

  1. Modelling languages - i.e. the languages we use for representing the models, ontology data, exchange formats etc.
  1. Model repository - in particular
    • database technologies
    • workflow tools
  1. Interfaces for simulation
    • APIs
      • CMISS
      • Matlab
    • web interfaces for running simulations and viewing results
  1. Visualization of models
    • simple - for navigation, model checking - leads towards the editor toolset
    • graphical - see rendering of CellML requirement
  1. General APIs
    • for programmers
      • C++
      • python
      • java
      • etc

General APIs can be thought of as the most direct representation of the modelling language constructs as programming objects, along with some interfaces for I/O, one of which will be reading and writing serialized exchange forms. So far XML has influenced us quite heavily, and so we do think of DOM-like interfaces to the language constructs. I think this is a useful interface to mix in at some points, but it should not be a defining aspect of the language. If we find constructs such as parent and sibling pointers to be useful, and they already seem so with things such as encapsulation, then we should make them part of our primary language structures and not rely on something like XML providing these for us.
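A minimal Python sketch of that last point, with a hypothetical `Component` class: encapsulation is maintained by the objects themselves, so parent and sibling navigation works without any DOM, and XML reading and writing could be layered on separately. In this sketch the model itself is used as the root container, so top-level components are siblings of one another.

```python
from typing import List, Optional

class Component:
    """Hypothetical API class in which encapsulation is a first-class relationship."""

    def __init__(self, name: str):
        self.name = name
        self.parent: Optional["Component"] = None
        self.children: List["Component"] = []

    def encapsulate(self, child: "Component") -> "Component":
        """Make `child` a child of this component, detaching it from any previous parent."""
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)
        return child

    def siblings(self) -> List["Component"]:
        """Other children of the same parent (empty for the root)."""
        if self.parent is None:
            return []
        return [c for c in self.parent.children if c is not self]

# The model acts as the root container, so top-level components are siblings.
model = Component("model")
membrane = model.encapsulate(Component("membrane"))
sodium = membrane.encapsulate(Component("sodium_channel"))
potassium = membrane.encapsulate(Component("potassium_channel"))
```

Because `encapsulate` keeps both directions of the link consistent, moving a component (for example `model.encapsulate(potassium)`) updates the old parent's children as well.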