EGF pathway model from Bhalla and Iyengar
In 1999 Upinder S. Bhalla and Ravi Iyengar published a paper discussing the emergent properties of signalling pathway networks. To develop these networks, they modelled individual pathways and compared those results to previously published results before combining the individual pathways to create networks. They only used mechanisms that had already been experimentally described when making connections between pathways. Their paper, the full reference for which is below, describes the epidermal growth factor (EGF) pathway.
Emergent properties of networks of biological signaling pathways. Bhalla US, Iyengar R. Science 1999 Jan 15; 283(5400); 381-7
The CellML description of the EGF pathway, one of the networks modelled by Bhalla and Iyengar, can be downloaded in various formats as described in the section “Download This Model”.
The CellML description of the EGF pathway model and its documentation conform to the 11 August 2001 version of the CellML 1.0 specification and the 18 May 2001 CellML Metadata 1.0 specification.
The model description itself is intended to demonstrate the best practices for the definition of reaction models using CellML, as intended by the authors of the CellML specification.
The metadata included in the CellML description represents the recommended best practice for metadata definition according to the 18 May version of the CellML Metadata specification. This version has a status of Research Summary, and is neither complete nor final -- no guarantees are made that metadata in this model will conform to the final CellML Metadata specification.
Figure 1 shows a possible rendering (pathway diagram) of the EGF pathway. In this rendering the reactants and products are shown as rounded rectangles on opposite ends of bidirectional arrows (representing reversible reactions) or unidirectional arrows (representing irreversible reactions). The action of a catalyst on a reaction is depicted by a dashed line joining the catalyst to the reaction, drawn in the direction in which it acts as a catalyst.

In CellML, models are thought of as connected networks of discrete components. These components may correspond to physiologically separated regions or chemically distinct objects, or may be useful modelling abstractions. This pathway model has twenty-three components representing chemically distinct objects (fifteen chemical species and eight reactions) and two components defined purely for modelling convenience: the EGF_subsystem component, which encapsulates the portion of the pathway that results in the EGF_EGFR complex, and another “abstract” component, which stores variables such as time. The CellML rendering of the EGF pathway model is shown in Figure 2 (the different shapes in the diagram are explained in the notation guide). The action of a catalyst on a reaction is again represented by dashed lines.
The root element of the XML document containing the CellML definition of the EGF pathway model is the <model> element shown in Figure 3. (At this point, those not familiar with XML may want to consult the quick introduction to XML for help with understanding XML terms.) The <model> element defines a name attribute that allows this model to be unambiguously referenced by other models. For instance, this would be necessary if this model were to be combined with other models or partial models to create a larger model. It also defines an id attribute in the CellML Metadata namespace (which is mapped to the cmeta prefix). This id attribute is of XML type ID and may be used to attach metadata to this element, as discussed in the section “Model metadata”.
<model> element shown above.
The <model> element defines three XML namespaces. The first is the CellML namespace, which is set to the default namespace and also mapped explicitly to the cellml prefix. The default namespace declaration (which looks like an xmlns attribute) means that the <model> element is itself in the CellML namespace. As such, it can be identified by any namespace-aware XML parser as a CellML element. In addition, any child elements that are not placed in another namespace, or that do not declare a new default namespace, will also be in the CellML namespace. Mapping the CellML namespace to the cellml prefix simplifies the declaration of elements and attributes in the CellML namespace later in the file. For instance, the cellml prefix can be used to place the units attribute on the <cn> element (which is in the MathML namespace) in the CellML namespace. The declaration of the CellML namespace as both the default namespace and as a namespace mapped to the cellml prefix is recommended practice for any <model> element.
The third and final XML namespace defined in the model element is the CellML Metadata namespace, and it is mapped to the cmeta prefix. This allows the id attribute to be placed in the CellML Metadata namespace.
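The effect of these declarations can be checked with any namespace-aware parser. The following is a minimal sketch using Python's standard library. The root element shown is a hypothetical, heavily trimmed stand-in rather than the model's actual <model> element, and the namespace URIs are the ones commonly published for CellML 1.0 and CellML Metadata 1.0 (an assumption to verify against the model file).

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for CellML 1.0 and CellML Metadata 1.0;
# check them against the model file you are reading.
CELLML_NS = "http://www.cellml.org/cellml/1.0#"
CMETA_NS = "http://www.cellml.org/metadata/1.0#"

# Hypothetical, trimmed root element (not the listing from Figure 3).
DOC = f"""<model name="example_model" cmeta:id="example_model"
    xmlns="{CELLML_NS}"
    xmlns:cellml="{CELLML_NS}"
    xmlns:cmeta="{CMETA_NS}">
  <component name="environment"/>
</model>"""

root = ET.fromstring(DOC)

# ElementTree spells qualified names as "{namespace-uri}local-name".
print(root.tag)                       # the root is in the CellML namespace
print(root[0].tag)                    # the child inherits the default namespace
print(root.get(f"{{{CMETA_NS}}}id"))  # the id attribute is in the metadata namespace
print(root.get("name"))               # unprefixed attributes have no namespace
```

Note that the default namespace applies to elements only: the unprefixed name attribute is not placed in the CellML namespace, which is why an explicit prefix is needed for attributes such as cmeta:id.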
The CellML specification defines a standard dictionary of units that can be used in CellML models without further definition. This dictionary consists of the base SI units (as defined by the Bureau International des Poids et Mesures) and some additional units commonly found in the types of biological models likely to be defined using CellML.
A modeller who wishes to use units not declared in the standard dictionary can define additional units in the CellML model document. The units definitions from the EGF pathway model are shown in Figure 4. These units definitions occur immediately within the root <model> element, giving them document-wide scope.
The units definitions are declared with a set of <units> elements. Each <units> element must have a name attribute, which declares an identifier that can then be referenced in the units attribute on <variable> and <cn> elements (<cn> elements enclose bare numbers in MathML).
Each <units> element may contain one or more <unit> elements. The units defined by the <units> element are a combination of the contents of these <unit> elements. Section 5.2.2 of the CellML specification explains how the attributes for each <unit> element contribute to a new units definition. CellML processing software may choose to use these units definitions to verify the consistency of units (inserting scale-factors, if appropriate) and check the dimensions of equations, as described in Section 5.2.6 and Section 5.2.7 of the CellML specification, respectively.
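As a rough illustration of how <unit> children combine, the sketch below reads a hypothetical units definition and accumulates the exponent of each referenced unit together with an overall power-of-ten scale. The definition named micromolar is invented for this example, namespaces are left out for brevity, and the arithmetic is a simplified reading that handles only the prefix and exponent attributes; Section 5.2.2 of the specification is the authority on the full rules.

```python
import xml.etree.ElementTree as ET

# A small subset of SI prefix names mapped to powers of ten.
PREFIXES = {"kilo": 3, "milli": -3, "micro": -6, "nano": -9}

# Hypothetical units definition; the name and contents are illustrative only.
UNITS_XML = """<units name="micromolar">
  <unit units="mole" prefix="micro"/>
  <unit units="litre" exponent="-1"/>
</units>"""


def describe(units_element):
    """Return (name, {referenced unit: exponent}, power-of-ten scale).

    Simplification: only prefix and exponent are considered, and
    exponents are assumed to be integers.
    """
    dims = {}
    power_of_ten = 0
    for unit in units_element.findall("unit"):
        exponent = int(unit.get("exponent", "1"))
        prefix = PREFIXES.get(unit.get("prefix", ""), 0)
        base = unit.get("units")
        dims[base] = dims.get(base, 0) + exponent
        # A prefixed unit raised to a power scales by 10^(prefix * exponent).
        power_of_ten += prefix * exponent
    return units_element.get("name"), dims, power_of_ten


name, dims, power = describe(ET.fromstring(UNITS_XML))
print(name, dims, f"scale = 1e{power}")
```

A units checker along these lines could compare the exponent dictionaries of two variables to test dimensional consistency and use the difference in scale to insert a conversion factor.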
The CellML definition of the EGF pathway model consists of a total of twenty-five components. Twenty-three of these correspond to chemically distinct objects (fifteen chemical species and eight reactions) and were shown in the diagram in Figure 2. The other two components are defined purely for modelling convenience: one stores environment variables such as time and the other is an abstract encapsulating component that hides the portion of the pathway that results in the EGF-EGFR complex from all other components. Within each component variables are declared for use within the mathematics that determine the model's function. The interface on each variable declaration determines whether the value of the variable is calculated in the current component and exported, or calculated elsewhere and imported.
In the subsequent sections, the components corresponding to the environment, the reactant EGF, the EGF-EGFR binding reaction, and the encapsulating EGF subsystem are discussed in detail. The remainder of the components closely resemble one of the components discussed and are not examined here.
The first <component> element defined in the CellML description of the EGF pathway model has an identifier of environment. The complete definition is given in Figure 5. As the name suggests, this component does not correspond to a physical compartment in a cell, but rather is an abstract container used to define variables whose reach or scope could be thought of as global.
Figure 5: The environment component represents a convenient abstract container, which is used to define the independent variable time separately from the rest of the model.
In the EGF pathway model, the only global variable is the independent variable time. In CellML models no variables are given any kind of precedence or implicitly assumed to exist. This makes the model definition more robust and model components more re-usable. It is therefore necessary to declare variables to represent time and space if they are needed in a model, but it is not necessary to indicate that a variable like time is an independent variable, as this can be determined from analysis of the differential equations. One could theoretically call the variable that represents time A1 and use the names t and time to represent other concepts in the model. However, it would be unwise to use the names time and t to represent anything other than time, because doing so would make it more difficult for other model authors to re-use the work.
Note that the name environment has no special significance, but is simply a human-readable identifier enabling the modeller to determine the purpose of this component. We expect that software would make use of ontology or metadata information associated with the component to work out that this component would not be rendered in the same way as the membrane component, for example.
The EGF component is a container for the chemical species EGF, or epidermal growth factor. The full definition of the EGF component is given in Figure 6.
Figure 6: The EGF component from the CellML description of the EGF pathway model. The metadata has been omitted (metadata is discussed in the section “Model metadata”).
The cmeta:id attribute uniquely identifies the EGF component and can be used to associate metadata with this component. Metadata is discussed in more detail in the section “Model metadata”.
Each component in the EGF pathway model that represents a species taking part in a reaction (excluding the catalyst MAPK component) declares at least three variables for use internally. The first is the concentration of the chemical species that the component represents. This variable has been given the same name as the parent component (each component in the model and each variable in a component must be named uniquely, but there is no restriction on a variable having the same name as a component). This variable is declared with a public_interface attribute value of "out", so its value can be modified in the current component and is available to other components that are connected to this one.
The second variable in the EGF component is delta_EGF, which represents the rate of change in the concentration of species EGF as a result of the reaction. It has a public_interface attribute value of "in". Its value is imported from the EGF_EGFR_bind_rxn component, in which it is declared with a public_interface value of "out".
Finally, the independent variable time is declared with a public_interface value of "in", allowing it to be imported from the environment component.
The component also defines the equation that governs the conservation of species EGF using MathML elements. The equation is contained in a block with a root element <math>. The <math> element includes a default namespace declaration (the xmlns attribute), which overrides the default CellML namespace declaration on the document's <model> element to place the <math> element and all of its children in the MathML namespace. The equation states that the rate of change of EGF with respect to time is equal to delta_EGF.
The conservation equation is somewhat more complicated in the case of the EGF_EGFR component, which contains the complex of epidermal growth factor and epidermal growth factor receptor. In the EGF_EGFR component, the mathematics from which are given in Figure 7, the rate of change of species EGF_EGFR is equal to the sum of two delta variables, where each delta variable represents the change in concentration of a species due to a single reaction.
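Written out, a conservation equation of this kind has the shape sketched below. The symbols δ_bind and δ_int are placeholder names introduced here for the contributions of the binding reaction and of a second reaction involving the complex (presumably the internalization reaction); the variable names used in Figure 7 may differ.

```latex
\frac{d[\mathrm{EGF\_EGFR}]}{dt} = \delta_{\mathrm{bind}} + \delta_{\mathrm{int}}
```

For a species that takes part in only one reaction, such as EGF, the right-hand side reduces to a single delta variable.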
The EGF_EGFR_bind_rxn component serves as a container for the reaction in which the epidermal growth factor binds to its receptor. The full definition of the reaction is represented in Figure 8. The purpose of the reaction component is to define the mathematics of the reaction itself. The EGF_EGFR_bind_rxn component calculates delta variables, which represent the rates of change in the concentrations of the participating species due to the reaction process. These delta variables are exposed in such a way that they can be used in conservation equations stored in the species components.
Figure 8: The EGF_EGFR_bind_rxn component from the CellML description of the EGF pathway model. The metadata and the mathematics defining the reaction rate have been omitted (metadata is discussed in the section “Model metadata”).
The EGF_EGFR_bind_rxn component defines a number of variables. The values of the first set of variables are imported from other components for use internally: the concentrations of the two reactants (EGF and EGFR) and the product (EGF_EGFR). All of these variables are declared with a public_interface attribute value of "in". The second set of variables declared in the reaction component are the delta variables, delta_EGF, delta_EGFR, and delta_EGF_EGFR. The values of these variables are calculated within the reaction component and exposed to the other components in the model, so these variables are declared with a public_interface attribute value of "out". The third set of variables are purely for use internally, and so have no public_interface attribute. The first two of these, k_forward and k_reverse, are the rate constants used in the calculation of the reaction rate, which is the third variable r. The separate calculation of the reaction rate allows the modeller to make use of the implied mathematics properties of the <role> elements to concisely specify the values of the delta variables.
One of the more specialized elements in the CellML vocabulary is the <reaction> element, which follows the variable declarations inside the EGF_EGFR_bind_rxn component. Reaction elements are used in CellML to provide an easy way of specifying which species (as represented by variables) take part in a reaction and indicating their role in the reaction. Although this information could sometimes be obtained from analysis of a purely mathematical definition of the reaction behaviour, use of the <reaction> element makes it possible for processing software to obtain enough information to produce reaction equations and pathway diagrams for a model.
The <reaction> element contains a reversible attribute to indicate that the reaction defined may proceed in both directions. The reversible attribute in this example is not necessary because if the reversible attribute is not present, the default value is "yes". In this binding reaction, the <reaction> element contains four <variable_ref> elements. The first three each reference a species variable and the fourth references the variable that is used to calculate the reaction rate. A variable is referenced using the variable attribute, the value of which must match the name of a variable declared in the current component. Each <variable_ref> element may contain one or more <role> elements. The role attribute of each <role> element declares the way in which the referenced variable participates in the reaction. In this example, species EGF and EGFR have a role of "reactant", and species EGF_EGFR has a role of "product". Species may also assume roles of "catalyst", "activator", "inhibitor", "modifier", and "rate". A single species (represented by a single variable) may participate within the same reaction in numerous roles, e.g., a reactant may also be an inhibitor. This is specified by creating more than one <role> element within the same <variable_ref> element.
The value of the role attribute specifies the participation of the species when the reaction is proceeding in the direction specified by the value of the direction attribute. The direction attribute may take values of "forward", "backward", and "both". Its value defaults to "forward" if omitted, as it has been in this binding reaction. The fact that the reaction is reversible implies that a species that is a reactant in the forward direction is a product in the backward direction and that a species that is a product in the forward direction is a reactant in the backward direction. No such assumptions are made about species that act in other roles.
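The role swap implied by reversibility can be sketched in a few lines of Python. The dictionary below mirrors the roles described for the binding reaction; the function name and data layout are illustrative choices, not part of CellML.

```python
# Roles that swap when a reversible reaction runs in the opposite direction.
SWAPPED = {"reactant": "product", "product": "reactant"}


def roles_in_reverse(forward_roles, reversible=True):
    """Map each variable to its implied role in the opposite direction.

    Only reactant and product roles are inferred. Variables in other
    roles are omitted, because no assumption is made about them.
    """
    if not reversible:
        return {}
    return {
        variable: SWAPPED[role]
        for variable, role in forward_roles.items()
        if role in SWAPPED
    }


forward = {
    "EGF": "reactant",
    "EGFR": "reactant",
    "EGF_EGFR": "product",
    "r": "rate",
}
print(roles_in_reverse(forward))
```

The rate variable r does not appear in the result, reflecting the rule that only reactant and product roles are reversed.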
The delta_variable attribute on the <role> elements is used to associate a delta variable with the principal variable referenced in the parent <variable_ref> element. The delta variable represents the rate of change in the concentration of the principal variable due to its involvement in the current reaction. A delta_variable attribute may only appear on <role> elements with a role attribute value of "reactant" or "product", as these are the only forms of reaction participation where the concentration of the participating species is changed. In this binding reaction, the variable delta_EGF is declared to be the delta variable associated with variable EGF.
The stoichiometry attribute on the <role> element defines the stoichiometry of the current variable relative to the other reaction participants. In this example the stoichiometry attributes on each of the <role> elements allow us to form the chemical expression for the reaction. Different values of stoichiometry may be defined on different <role> elements within the same variable reference, allowing for instance, a species to participate as a reactant with one stoichiometry and as an inhibitor with another.
When both delta_variable and stoichiometry attributes are defined on the same <role> element, they implicitly define an equation relating the delta variable to the reaction rate. The reaction rate variable is indicated by a <variable_ref> element containing a <role> element with a role attribute value of "rate"; in this example that variable is r. The implied equation equates the delta variable with the product of the stoichiometry and rate, producing the following equation:
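Read literally, the description above gives an implied equation of the following general form for EGF. The symbol s_EGF is introduced here for illustration and stands for the value of the stoichiometry attribute on the <role> element for EGF. This passage does not say whether the negative sign for a consumed reactant is carried by the stoichiometry value or applied by the processing software, so the sign is left implicit.

```latex
\mathit{delta\_EGF} = s_{\mathrm{EGF}} \cdot r
```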
If math is to be implicitly defined using the scheme just described, then the <reaction> element must contain a variable reference which assigns a role of "rate" to one of the participants. This variable may participate in the reaction in no other way, and a <role> element with a role attribute value of "rate" must not contain direction, delta_variable, or stoichiometry attributes. It is recommended that the equation or system of equations used to calculate the value of the rate variable be placed inside the <role> element to make it easier for software or readers to find the relevant equation(s). These mathematics, which were omitted from Figure 8, are shown in Figure 9.
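To make the relationship between the rate variable and the delta variables concrete, here is a minimal numerical sketch. It assumes simple mass-action kinetics for the binding reaction, a stoichiometry of 1 for every participant, and the convention that reactants are consumed while the product is formed; none of these details are confirmed by the text above, and the numerical values are arbitrary.

```python
def binding_rate(egf, egfr, egf_egfr, k_forward, k_reverse):
    """Net forward rate r, assuming mass-action kinetics (an assumption)."""
    return k_forward * egf * egfr - k_reverse * egf_egfr


def delta_variables(r):
    """Delta variables for a stoichiometry of 1 on every participant.

    Assumed sign convention: reactants are consumed (negative),
    the product is formed (positive).
    """
    return {"delta_EGF": -r, "delta_EGFR": -r, "delta_EGF_EGFR": r}


# Arbitrary illustrative values, not parameters from the published model.
r = binding_rate(egf=2.0, egfr=1.0, egf_egfr=0.5, k_forward=3.0, k_reverse=4.0)
deltas = delta_variables(r)
print(r, deltas)
```

Keeping the rate calculation separate from the delta variables mirrors the structure described in the text: the rate is computed once, and each delta variable follows from it and the stoichiometry.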
Figure 9: The <role> element with a role attribute value of "rate", inside a <variable_ref> element that references the variable r.
The possible values of the role attribute are based primarily on the information needed for producing a pathway diagram representation of the model; further information must be obtained from the mathematics. If the mathematics are omitted, it is possible to define qualitative models using nothing but <reaction> elements. If a modeller erroneously defines a model in which the information in a <reaction> element contradicts the information in the mathematics, the CellML specification instructs processing software to give precedence to the information in the mathematics when running the model. However, it is left to the application to decide how to handle the rendering of the model. Therefore, modellers should be very careful when writing or editing CellML documents by hand. It is anticipated that CellML authoring software will contain constraints that prevent modellers from incorporating inconsistent information in <math> and <reaction> elements.
To learn more about the reaction element, see Section 7 of the CellML specification.
The EGF_subsystem component, the full definition of which is given in Figure 10, is a logical component only; i.e., it does not represent any actual chemical species or reaction. It is used to encapsulate the portion of the pathway that results in the EGF-EGFR complex. Note that this is a different use of encapsulation than is shown in the simple two reaction model with encapsulation: instead of encapsulating intermediate reactions, we are encapsulating a complete subsystem. This configuration hides the components representing the EGF and EGFR species from the rest of the model. This means that these species cannot be used by any other reaction in the model. If we wanted to make it possible to later add reactions controlling the production of EGF, for instance, we would need to break this encapsulation.
Figure 10: The EGF_subsystem component from the CellML description of the EGF pathway model. The metadata has been omitted (metadata is discussed in the section “Model metadata”).
Encapsulation is represented in CellML using the <group> element. The group element indicating the encapsulation relationship between the EGF_subsystem component and its subcomponents is shown in Figure 11. The participants in the group are named in <component_ref> elements. The <component_ref> elements nest inside other <component_ref> elements in order to indicate how the components relate to one another. A <relationship_ref> element with a relationship attribute value of "encapsulation" signifies that the components EGF, EGFR, EGF_EGFR, EGF_EGFR_Int, EGF_EGFR_bind_rxn, and EGF_EGFR_internalization are logically encapsulated by the EGF_subsystem component, and, therefore, are hidden from all other components.
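The nesting of <component_ref> elements can be read mechanically. The sketch below parses a hypothetical <group> element built from the component names listed above and records which component encapsulates each nested one. The flat, single-level nesting is an assumption about the layout of Figure 11, and namespace declarations are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical group element; the single-level nesting is an assumption.
GROUP_XML = """<group>
  <relationship_ref relationship="encapsulation"/>
  <component_ref component="EGF_subsystem">
    <component_ref component="EGF"/>
    <component_ref component="EGFR"/>
    <component_ref component="EGF_EGFR"/>
    <component_ref component="EGF_EGFR_Int"/>
    <component_ref component="EGF_EGFR_bind_rxn"/>
    <component_ref component="EGF_EGFR_internalization"/>
  </component_ref>
</group>"""


def encapsulated_by(group):
    """Map each nested component name to the component that encapsulates it."""
    parents = {}

    def walk(ref, parent):
        name = ref.get("component")
        if parent is not None:
            parents[name] = parent
        # Recurse so that deeper nesting levels are handled as well.
        for child in ref.findall("component_ref"):
            walk(child, name)

    relationship = group.find("relationship_ref").get("relationship")
    if relationship == "encapsulation":
        for top in group.findall("component_ref"):
            walk(top, None)
    return parents


parents = encapsulated_by(ET.fromstring(GROUP_XML))
for child, parent in parents.items():
    print(f"{child} is encapsulated by {parent}")
```

The top-level component EGF_subsystem does not appear as a key in the result, since nothing encapsulates it; it remains visible to the rest of the model.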
The CellML model description uses <connection> elements to pass the values of the model's variables from one component to another. Each <connection> element contains a single <map_components> element, which references the components to be connected, and then one or more <map_variables> elements, which reference the variables from each component to be mapped together. The <connection> element used to pass the values of the EGF and delta_EGF variables between the EGF and EGF_EGFR_bind_rxn components is shown in Figure 12.
Figure 12: The <connection> element used to pass the values of the EGF and delta_EGF variables between the EGF and EGF_EGFR_bind_rxn components.
The variable_1 attribute on each <map_variables> element references a variable in the component referenced by the component_1 attribute on the <map_components> element. Similarly, the variable_2 attribute on each <map_variables> element references a variable in the component referenced by the component_2 attribute on the <map_components> element. Each variable's value is passed from the component where it is declared with an interface value of "out" to the component where it is declared with an interface value of "in", independent of whether a variable's name appears in the variable_1 or variable_2 attributes. In the example shown in Figure 12, both of the variables referenced have the same name in both the EGF and EGF_EGFR_bind_rxn components, but this need not necessarily be the case.
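The rule that values flow from "out" to "in" regardless of attribute order can be sketched as follows. The interface table restates the declarations described earlier for the EGF and EGF_EGFR_bind_rxn components; the connection XML is a hypothetical reconstruction (the order of the components in Figure 12 is not known here), and namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# public_interface values per (component, variable), as described in the text.
INTERFACES = {
    ("EGF", "EGF"): "out",
    ("EGF_EGFR_bind_rxn", "EGF"): "in",
    ("EGF", "delta_EGF"): "in",
    ("EGF_EGFR_bind_rxn", "delta_EGF"): "out",
}

# Hypothetical connection element; attribute order is an assumption.
CONNECTION_XML = """<connection>
  <map_components component_1="EGF" component_2="EGF_EGFR_bind_rxn"/>
  <map_variables variable_1="EGF" variable_2="EGF"/>
  <map_variables variable_1="delta_EGF" variable_2="delta_EGF"/>
</connection>"""


def value_flows(connection, interfaces):
    """Return (source, target) pairs, each a (component, variable) tuple."""
    components = connection.find("map_components")
    component_1 = components.get("component_1")
    component_2 = components.get("component_2")
    flows = []
    for mapping in connection.findall("map_variables"):
        end_1 = (component_1, mapping.get("variable_1"))
        end_2 = (component_2, mapping.get("variable_2"))
        # Direction depends on the declared interfaces, not on attribute order.
        if interfaces[end_1] == "out":
            flows.append((end_1, end_2))
        else:
            flows.append((end_2, end_1))
    return flows


flows = value_flows(ET.fromstring(CONNECTION_XML), INTERFACES)
for source, target in flows:
    print(f"{source[0]}.{source[1]} -> {target[0]}.{target[1]}")
```

Within this single connection the two variables travel in opposite directions: the concentration EGF goes to the reaction component, and delta_EGF comes back from it.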
The CellML description of the EGF pathway model also contains metadata, defined within the system described in the metadata framework section of the CellML specification, and based on the syntax described in the 18 May 2001 draft of the CellML Metadata specification. This metadata has been included for demonstration purposes only. The metadata from the beginning of the CellML description of the EGF pathway model is shown in Figure 13, with the comments removed for brevity.
CellML metadata is classified and associated with CellML documents and CellML elements using the Resource Description Framework (RDF). The RDF specification defines an XML-based syntax for defining metadata. This syntax is suitable for embedding within CellML documents. Within the RDF framework, the CellML Metadata specification recommends using elements from the Dublin Core for classifying various simple types of metadata as described in the XML Encoding of Simple Dublin Core Metadata specification. Personal information is encoded using the syntax proposed in a W3C note entitled Representing vCard Objects in RDF/XML. Citation information is stored using an XML serialization of the Object Management Group's Bibliographic Query Service specification developed for the CellML Metadata specification. Finally, the CellML Metadata specification proposes new elements for several types of metadata for which no XML-based syntax has already been standardised.
In the example in Figure 13, an <rdf:RDF> element (an <RDF> element in the RDF namespace) encloses the metadata block. The <rdf:RDF> element declares all of the namespaces that will be used within the element, redefining the CellML Metadata namespace even though it is declared on the parent <model> element. This is the recommended best practice as it ensures that the <rdf:RDF> element and its children form a self-contained and hence portable block of XML.
Metadata is associated with a particular object by placing it inside an <rdf:Description> element that defines an about attribute. The first <rdf:Description> element has an about attribute value of "EGFR_example", which references the cmeta:id attribute on the <model> element. Within the <rdf:Description> element, the Dublin Core <dc:title> element is used to provide a human-readable name for the model.
The <cmeta:reference> element references the literature with a unique Medline identifier. Note that the method for specifying citation metadata is under review and will almost certainly change.
The <dc:creator> element describes the model builder. The CellML specification recommends that the creator element be used to indicate the person who coded the model into CellML, rather than the person who originally developed the model. The information about the person who originally developed the model is available via the reference metadata.
The creation date is given by the <dc:date> element. The creation date is the date on which the model was coded into CellML, not the date on which it was originally created (if it was originally created in a different language) or the date on which it was published. The W3CDTF format indicates that the date is given as year-month-day. The last modified date is also described by a <dc:date> element. To distinguish the two, a dcq:dateType attribute must be specified.
The publisher of the CellML document is indicated by a <dc:publisher> element. The <cmeta:math_problem_type> element gives the GAMS classification of the mathematics; the classification scheme can be found at gams.nist.gov. GAMS type I1a is "ordinary differential equation, initial value problem".
bi_egf_pathway_1999.xml — the raw XML.
bi_egf_pathway_1999.html — an HTML version for browsing online.
bi_egf_pathway_1999.pdf — a PDF version suitable for printing.
cellml_bi_egf_pathway_1999.tar.gz — a gzipped tarball with the XML and this documentation.
bi_egf_pathway_1999_maths.pdf — a PDF of the equations described in the model generated directly from the CellML description using the MathML Renderer.


