
Repository Design


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
	<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=utf-8">
	<TITLE>A Possible Model Repository Design</TITLE>
	<META NAME="GENERATOR" CONTENT="OpenOffice.org 2.0  (Linux)">
	<META NAME="AUTHOR" CONTENT="Tommy Yu">
	<META NAME="CREATED" CONTENT="20070621;19165800">
	<META NAME="CHANGEDBY" CONTENT="Tommy Yu">
	<META NAME="CHANGED" CONTENT="20070621;19233600">
	<STYLE TYPE="text/css">
	<!--
		@page { size: 21cm 29.7cm; margin: 2cm }
		P { margin-bottom: 0.21cm }
	-->
	</STYLE>
</HEAD>
<BODY LANG="en-NZ" DIR="LTR">
<P STYLE="margin-bottom: 0cm">A Possible Model Repository Design</P>
<UL>
	<LI><P STYLE="margin-bottom: 0cm">RDBMS (Relational Database
	Management System) for metadata management</P>
	<UL>
		<LI><P STYLE="margin-bottom: 0cm">Used to better find relationships
		between models.</P>
		<LI><P STYLE="margin-bottom: 0cm">Has unfettered access to the
		model store (so it can see all models, or at least the models that
		their creators want shown to the public).</P>
	</UL>
	<LI><P STYLE="margin-bottom: 0cm">SVN (Subversion) for model storage</P>
	<UL>
		<LI><P STYLE="margin-bottom: 0cm">So models are versioned.</P>
	</UL>
	<LI><P STYLE="margin-bottom: 0cm">Abstraction layer that binds the
	above two components together</P>
	<UL>
		<LI><P STYLE="margin-bottom: 0cm">Different front ends could be
		built by calling the API provided by this layer.</P>
	</UL>
	<LI><P STYLE="margin-bottom: 0cm">Zope/Plone for front end
	presentation</P>
	<UL>
		<LI><P STYLE="margin-bottom: 0cm">Enables website users to use the
		repository.</P>
	</UL>
</UL>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Most action will start from the
abstraction layer, where an interface to both the database and the
model storage (Subversion, or even a plain file system if one does
not care about revision history) is built. An RDBMS schema based on
the CellML metadata specification (hopefully finalized on
standardized technology with a proper RDF schema) will be created at
the same time. Metadata would be extracted from submitted models and
inserted into the RDBMS. Changes to a model can be made via upload or
Subversion check-ins. Metadata could be imported and exported both
ways since it can be updated from either place, although one of the
locations has to be authoritative (I vote for the RDF graph stored
with the model in Subversion).</P>
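<P STYLE="margin-bottom: 0cm">As a rough illustration of the
"extract, then insert" step, here is a minimal sketch in Python.
Everything in it is an assumption made for illustration: SQLite
stands in for the RDBMS, the two-table schema is hypothetical (the
real one would follow the CellML metadata specification), and the
metadata is assumed to have been pulled out of the model's RDF graph
already. Because the RDF stored with the model is treated as
authoritative, re-indexing a model simply replaces the snapshot held
in the database.</P>
<PRE>
```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE model (
    id INTEGER PRIMARY KEY,
    svn_path TEXT UNIQUE NOT NULL,
    citation TEXT NOT NULL,
    is_public INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE keyword (
    model_id INTEGER NOT NULL REFERENCES model(id),
    word TEXT NOT NULL,
    UNIQUE (model_id, word)
);
"""


def open_index():
    """Create an in-memory metadata index (SQLite stands in for the RDBMS)."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def index_model(db, svn_path, citation, keywords):
    """Store metadata that was already extracted from a model's RDF graph.

    Indexing the same path again overwrites the earlier snapshot.
    """
    row = db.execute(
        "SELECT id FROM model WHERE svn_path = ?", (svn_path,)
    ).fetchone()
    if row is None:
        cur = db.execute(
            "INSERT INTO model (svn_path, citation) VALUES (?, ?)",
            (svn_path, citation),
        )
        model_id = cur.lastrowid
    else:
        model_id = row[0]
        db.execute(
            "UPDATE model SET citation = ? WHERE id = ?", (citation, model_id)
        )
        db.execute("DELETE FROM keyword WHERE model_id = ?", (model_id,))
    db.executemany(
        "INSERT INTO keyword (model_id, word) VALUES (?, ?)",
        [(model_id, word) for word in sorted(set(keywords))],
    )
    db.commit()
    return model_id


def models_with_keyword(db, word):
    """Return the SVN paths of all indexed models carrying a keyword."""
    rows = db.execute(
        "SELECT m.svn_path FROM model m JOIN keyword k ON k.model_id = m.id "
        "WHERE k.word = ? ORDER BY m.svn_path",
        (word,),
    )
    return [path for (path,) in rows]
```
</PRE>
<P STYLE="margin-bottom: 0cm">A front end would only ever call
functions like index_model and models_with_keyword, and would never
need to know whether the metadata came from an upload or from a
Subversion check-in.</P>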
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">For the naming convention, there
shouldn't be too much impact. One possible method is to give each
user their own working directory and let them arrange the directory
structure however they like (according to some basic guidelines).
This should not affect how the models get presented by the
abstraction layer, which is where the rules for presenting the models
are placed. The usage examples below illustrate this.</P>
<P STYLE="margin-bottom: 0cm; border-top: none; border-bottom: 1.00pt solid #000000; border-left: none; border-right: none; padding-top: 0cm; padding-bottom: 0.07cm; padding-left: 0cm; padding-right: 0cm">
<BR>
</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Usage Example 1:</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">User John has a working directory in
Subversion at svn://john/. He creates a subdirectory named 'model_1'
(svn://john/model_1). He builds a model in CellML 1.1, and the main
file main.cellml imports stimulus1.cellml and stimulus2.cellml. This
is the file and directory structure so far:</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">svn://john</P>
<P STYLE="margin-bottom: 0cm">svn://john/model_1</P>
<P STYLE="margin-bottom: 0cm">svn://john/model_1/main.cellml</P>
<P STYLE="margin-bottom: 0cm">svn://john/model_1/stimulus1.cellml</P>
<P STYLE="margin-bottom: 0cm">svn://john/model_1/stimulus2.cellml</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">So far, this is considered a private
model because it only resides in Subversion. Now this model happens
to be based on a paper authored by Sally Jane and Jun Tanaka. Proper
citation metadata was added to the model (via the abstraction layer
or the front end, or even manually) and the generated RDF graphs were
added into the files. The keywords 'cardiac' and 'physiological' were
also given to the models and added into the RDF graphs residing in
those files. An extra keyword, 'stimulus', was added to the stimulus
files.</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">At this point, John could open up the
model to the rest of the world via the web front end (which talks to
the abstraction layer, which marks the appropriate fields in the
RDBMS to indicate so, or the front end manages that itself;
implementation of this is to be discussed later). The following URI
could then be generated according to the naming convention:</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">http://cellml.example.org/models/citation/jane_tanaka/</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">(The /citation/ can be named something
else, and I omitted the year/month for simplicity in examples)</P>
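<P STYLE="margin-bottom: 0cm">One way such a URI could be derived
from the citation metadata is sketched below in Python. The rule used
here (take the last word of each author's name as the surname,
lower-case it, and join the surnames with underscores) is only an
assumption that happens to reproduce the example above; the function
names are hypothetical, and the year/month part is left out just as
in the example.</P>
<PRE>
```python
BASE = "http://cellml.example.org/models"


def citation_slug(authors):
    """Join the lower-cased surnames of the authors with underscores.

    Assumes the surname is the last whitespace-separated word of a name.
    """
    return "_".join(name.split()[-1].lower() for name in authors)


def citation_uri(authors, filename=""):
    """Build the public URI of the citation index page or of a file in it."""
    return "{}/citation/{}/{}".format(BASE, citation_slug(authors), filename)
```
</PRE>
<P STYLE="margin-bottom: 0cm">Calling citation_uri(["Sally Jane",
"Jun Tanaka"]) gives the index page URI shown above, and passing
"main.cellml" as the second argument gives the URI of that file under
it.</P>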
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">As all the models are marked with the
proper metadata, each of them can be presented as a file under the
above HTTP URI.</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">The citation index page would be quite
simple at the moment, showing a brief listing of the metadata that is
added into the models themselves, with links to the files. This is
the basic page.</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Usage Example 2:</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">It is possible for John to write
documentation in an HTML file, complete with images, and reference it
by a URI stored as the value of a dc:identifier predicate in the RDF
graph of the model(s) (i.e. a pointer to the documentation which both
humans and machines can read). If the path to the HTML file is
relative, it is assumed to be relative to the directory where the
model resides in Subversion (i.e. svn://john/model_1/doc.html will be
retrieved and made accessible at
http://cellml.example.org/models/citation/jane_tanaka/doc.html
assuming permissions are given). Session files and diagrams could
also be added by the front end in a similar fashion. The name
assumption is made once and recorded in the database, to avoid file
name collisions.</P>
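<P STYLE="margin-bottom: 0cm">The relative-path rule can be sketched
as follows in Python. This is only an illustration under stated
assumptions: resolve_document is a hypothetical helper, the reference
is taken to be a simple relative file name (no '..' segments), and an
absolute reference is passed through untouched because nothing needs
to be fetched from Subversion for it.</P>
<PRE>
```python
from urllib.parse import urlsplit


def resolve_document(model_svn_uri, doc_ref, public_root):
    """Return (svn_uri, public_uri) for a document referenced by a model.

    A relative reference is resolved against the directory that holds
    the model in Subversion and exposed under the model's public root.
    """
    if urlsplit(doc_ref).scheme:
        # Absolute URI: link to it directly, there is no SVN source.
        return None, doc_ref
    model_dir = model_svn_uri.rsplit("/", 1)[0]
    svn_uri = model_dir + "/" + doc_ref
    public_uri = public_root.rstrip("/") + "/" + doc_ref
    return svn_uri, public_uri
```
</PRE>
<P STYLE="margin-bottom: 0cm">The returned pair is what would be
recorded in the database, so the resolution only has to happen once
per document.</P>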
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">At this point, the URIs presented so
far are:</P>
<P STYLE="margin-bottom: 0cm">http://cellml.example.org/models/citation/jane_tanaka/</P>
<P STYLE="margin-bottom: 0cm">http://cellml.example.org/models/citation/jane_tanaka/main.cellml</P>
<P STYLE="margin-bottom: 0cm">http://cellml.example.org/models/citation/jane_tanaka/stimulus1.cellml</P>
<P STYLE="margin-bottom: 0cm">http://cellml.example.org/models/citation/jane_tanaka/stimulus2.cellml</P>
<P STYLE="margin-bottom: 0cm">http://cellml.example.org/models/citation/jane_tanaka/doc.html</P>
<P STYLE="margin-bottom: 0cm">http://cellml.example.org/models/citation/jane_tanaka/diagram.png</P>
<P STYLE="margin-bottom: 0cm">http://cellml.example.org/models/citation/jane_tanaka/main.cellml.pcenv</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Usage Example 3:</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Mary is another user of the system, and
her workspace in SVN is 'svn://mary'. She created a directory
'a_model' and worked on a model based on the same Jane-Tanaka paper
that John was working with. She created a CellML 1.0 model, simply
named it 'model.cellml', and then published it as a public model. Now
the index page at
http://cellml.example.org/models/citation/jane_tanaka/ will also show
model.cellml as another file. She could also have a documentation
file, pointed to by the RDF metadata, which website users can open
via a link.</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Usage Example 4:</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Mary decided to create another model
based on the same paper, and she named it 'main.cellml'. She then
tries to publish the model but hits a snag: there is already a model
with that filename, and the abstraction layer detects that. In order
to publish that model, she either has to rename the file or treat her
model as a branch (or fork, or variant) of the model that is named
'main.cellml'. Renaming is probably the simpler approach in this case
since she only has one file and it's doubtful anyone is using it
yet.</P>
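<P STYLE="margin-bottom: 0cm">The check that the abstraction layer
performs at publish time could look roughly like the Python sketch
below. It is an assumption-laden illustration, not a design decision:
a plain dictionary stands in for the RDBMS, and publish and
NameCollision are hypothetical names. Publishing either succeeds and
returns the public path, or fails because the path is taken, in which
case the user can retry with a new filename or with a branch
name.</P>
<PRE>
```python
class NameCollision(Exception):
    """Raised when a public path is already taken by another model."""


def publish(published, citation, filename, owner, branch=None):
    """Register a file under a citation and return its public path.

    `published` maps public paths to owners and stands in for the RDBMS.
    An optional branch name places the file in its own sub-path.
    """
    if branch is None:
        parts = [citation, filename]
    else:
        parts = [citation, branch, filename]
    path = "/".join(parts)
    if path in published:
        raise NameCollision(
            "{} is already published by {}".format(path, published[path])
        )
    published[path] = owner
    return path
```
</PRE>
<P STYLE="margin-bottom: 0cm">With John's main.cellml already
published under jane_tanaka, Mary's attempt to publish a file of the
same name raises NameCollision, while publishing it under a branch
name or under a different filename goes through.</P>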
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Usage Example 5:</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">John works on his version of
main.cellml again, but he needs to make drastic changes to the model
and so he creates a branch in Subversion in his working directory (in
svn://john/model_1/branch). He also thought that reviewers should
review the model before his changes are merged back into the original
file. So he also exposes the new model as a branch through the
website/abstraction layer by naming it 'john_branch'. The branched
'main.cellml' would then be accessible via
http://cellml.example.org/models/citation/jane_tanaka/john_branch/main.cellml
and only by model reviewers. As for the stimulus files that
main.cellml imports, John could either copy them into the branch, or
update the references in main.cellml to use the stimulus files that
reside in the parent directory. This may or may not work as
intended.</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Usage Example 6:</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Website user Ming (who also writes
CellML models) decided to browse models by keyword. He viewed
http://cellml.example.org/models/keyword/stimulus/ and saw the file
stimulus1.cellml. He decided that the file suits his needs and that
his CellML 1.1 model can import
http://cellml.example.org/models/keyword/stimulus/stimulus1.cellml. A
problem, however, is that Mary decided to convert one of her models
from the same paper (jane_tanaka), named one of her stimulus files
stimulus1.cellml and gave it the stimulus keyword! Now the URI
http://cellml.example.org/models/keyword/stimulus/stimulus1.cellml
could point to two different files. One way to distinguish between
them is to assign a sequence number to each file, so that the two
files have unique URIs such as:</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">http://cellml.example.org/models/keyword/stimulus/2/stimulus1.cellml</P>
<P STYLE="margin-bottom: 0cm">http://cellml.example.org/models/keyword/stimulus/9/stimulus1.cellml</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">These are the URIs that the keyword
model index page should probably link to. However (this is up for
debate), if Ming did make the mistake of using the original URI, the
model with id #2 would be retrieved instead.</P>
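<P STYLE="margin-bottom: 0cm">A minimal Python sketch of this lookup
follows. It assumes that the keyword index is available as a list of
(id, filename) pairs and that an ambiguous URI without an id falls
back to the lowest matching id, which matches the id #2 behaviour
described above but is still up for debate. The function names are
hypothetical.</P>
<PRE>
```python
def keyword_uris(base, keyword, entries):
    """List the unambiguous, id-qualified URIs for a keyword index page."""
    return [
        "{}/keyword/{}/{}/{}".format(base, keyword, model_id, filename)
        for model_id, filename in sorted(entries)
    ]


def resolve_keyword_file(entries, filename, model_id=None):
    """Return the id of the file a keyword URI refers to, or None.

    Without an id in the URI, the lowest matching id wins (assumption).
    """
    matches = sorted(mid for mid, name in entries if name == filename)
    if not matches:
        return None
    if model_id is None:
        return matches[0]
    return model_id if model_id in matches else None
```
</PRE>
<P STYLE="margin-bottom: 0cm">With entries (2, stimulus1.cellml) and
(9, stimulus1.cellml) under the stimulus keyword, a request without
an id resolves to 2, while a request that carries id 9 resolves to
Mary's file.</P>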
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Also, using the URI based on the
internal identifier of the CellML file could have added benefits. The
URI http://cellml.example.org/models/keyword/stimulus/2/ could show
the info page about the model, and a link to the actual CellML file
can be shown there also.</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Ming could also browse the models by
id, such that the URI</P>
<P STYLE="margin-bottom: 0cm">http://cellml.example.org/models/id/2/
will also show the informational page on stimulus1.cellml that John
created.</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Searching model files will return the
id-based URI, and if a citation is desired the citation root URI can
be returned.</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Usage Example 7:</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">While this has not been defined yet,
Ming should be able to access previous versions of the models via a
URI. The URI
http://cellml.example.org/models/id/2/stimulus1.xml?rev=3 could be a
possible format candidate.</P>
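<P STYLE="margin-bottom: 0cm">If that candidate format were adopted,
taking such a URI apart would be straightforward, as this Python
sketch shows. The function name is hypothetical, and how the 'rev'
value maps onto a Subversion revision is left open here, since the
format itself is undecided.</P>
<PRE>
```python
from urllib.parse import parse_qs, urlsplit


def parse_model_uri(uri):
    """Split an id-based model URI into (model_id, filename, revision).

    Expects a path of the form .../models/id/MODEL_ID/FILENAME with an
    optional ?rev=N query; filename and revision may be None.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    i = segments.index("id")
    model_id = int(segments[i + 1])
    filename = segments[i + 2] if len(segments) > i + 2 else None
    rev = parse_qs(parts.query).get("rev")
    revision = int(rev[0]) if rev else None
    return model_id, filename, revision
```
</PRE>
<P STYLE="margin-bottom: 0cm">A request without the query string
yields a revision of None, which the abstraction layer could treat as
"latest".</P>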
<P STYLE="margin-bottom: 0cm; border-top: none; border-bottom: 1.00pt solid #000000; border-left: none; border-right: none; padding-top: 0cm; padding-bottom: 0.07cm; padding-left: 0cm; padding-right: 0cm">
<BR>
</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Arguments on RDBMS</P>
<P STYLE="margin-bottom: 0cm">Pro:</P>
<UL>
	<LI><P STYLE="margin-bottom: 0cm">Relational databases have been
	established for a long time.</P>
	<LI><P STYLE="margin-bottom: 0cm">It can be used to show
	relationships between models much more easily than an object
	database like Zope DB.</P>
	<LI><P STYLE="margin-bottom: 0cm">Can be quite straightforward;
	queries are a lot easier to write.</P>
</UL>
<P STYLE="margin-bottom: 0cm">Con:</P>
<UL>
	<LI><P STYLE="margin-bottom: 0cm">SQL looks ugly.</P>
	<LI><P STYLE="margin-bottom: 0cm">It is separate from the model
	storage, which could cause inconsistency with the metadata residing
	in the model.</P>
	<LI><P STYLE="margin-bottom: 0cm">Data not necessarily versioned</P>
	<UL>
		<LI><P STYLE="margin-bottom: 0cm">Counterpoint: citations should
		be immutable anyway. Spelling mistakes in an author's name should
		not be versioned and really should be corrected asap.</P>
		<LI><P STYLE="margin-bottom: 0cm">The metadata stored in the RDBMS
		could be thought of as a snapshot.</P>
		<LI><P STYLE="margin-bottom: 0cm">It has the advantage of
		correcting spelling mistakes, but this will mean all the models in
		the repository will have to be synchronized with the correct
		spelling and that can have an adverse effect on performance.</P>
	</UL>
</UL>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Arguments on SVN</P>
<P STYLE="margin-bottom: 0cm">Pro:</P>
<UL>
	<LI><P STYLE="margin-bottom: 0cm">Revision/version capabilities
	built on established foundation.</P>
	<LI><P STYLE="margin-bottom: 0cm">If the website dies, the data can
	still be accessed in theory.</P>
</UL>
<P STYLE="margin-bottom: 0cm">Con:</P>
<UL>
	<LI><P STYLE="margin-bottom: 0cm">Does not address the specific
	needs of CellML.</P>
	<UL>
		<LI><P STYLE="margin-bottom: 0cm">Any given CellML model with more
		than one component can have more than one serialization, rendering
		svn diff useless (no way to easily find difference between
		revisions).</P>
	</UL>
</UL>
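<P STYLE="margin-bottom: 0cm">To make the serialization problem
concrete, here is a small Python sketch of one possible workaround:
comparing two revisions by structure instead of by text. It is an
assumption-heavy illustration and not part of the proposal. It treats
only the order of the top-level children of the model element (such
as components) as insignificant, keeps the order of everything nested
deeper, and ignores attribute order and whitespace between
elements.</P>
<PRE>
```python
import xml.etree.ElementTree as ET


def _ordered(elem):
    """Describe an element as a nested tuple, keeping child order."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(_ordered(child) for child in elem),
    )


def model_shape(xml_text):
    """Return a comparable description of a model document.

    Top-level children are sorted, so two serializations that differ
    only in the order of their components compare equal.
    """
    root = ET.fromstring(xml_text)
    top = sorted(_ordered(child) for child in root)
    return (root.tag, tuple(sorted(root.attrib.items())), tuple(top))
```
</PRE>
<P STYLE="margin-bottom: 0cm">Two revisions whose model_shape values
are equal differ only in serialization, which is exactly the case
where a plain svn diff would report noise.</P>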
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Arguments on the abstraction layer</P>
<P STYLE="margin-bottom: 0cm">Pro:</P>
<UL>
	<LI><P STYLE="margin-bottom: 0cm">It makes writing front ends much
	easier and gives flexibility.</P>
</UL>
<P STYLE="margin-bottom: 0cm">Con:</P>
<UL>
	<LI><P STYLE="margin-bottom: 0cm">Could be complicated by having to
	tie SVN together with an RDBMS.</P>
</UL>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm">Arguments on Zope/Plone</P>
<UL>
	<LI><P STYLE="margin-bottom: 0cm">Not very relevant I believe, as
	it's just a front end to the abstraction layer. It could conceivably
	be written in CGI, but I doubt that is desirable.</P>
</UL>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
<P STYLE="margin-bottom: 0cm"><BR>
</P>
</BODY>
</HTML>