LMNL data model
From LMNLWiki
This page describes the abstract LMNL data model. It's subject to revision (but we need to get it finalised). If you want a concrete model with mutators and all, see LOM instead.
A LMNL data model is a collection of model objects of the following six types:
Documents
A Document has two properties:
- base uri - the URI of the Document (may be null if the URI is not known)
- value - the Limen containing the value of this Document
Two Documents are equal iff their properties are equal.
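As a rough illustration, a Document can be modelled as an immutable record whose equality compares its two properties. This is a minimal sketch, assuming Python dataclasses; the placeholder `Limen` class is hypothetical and keeps Python's default identity equality, which matches the rule for Limina given below.

```python
from dataclasses import dataclass
from typing import Optional


class Limen:
    """Hypothetical placeholder Limen; compared by identity (default object equality)."""


@dataclass(frozen=True)  # generated __eq__ compares (base_uri, value)
class Document:
    base_uri: Optional[str]  # None when the URI of the Document is not known
    value: Limen


shared = Limen()
a = Document("http://example.com/doc.lmnl", shared)
b = Document("http://example.com/doc.lmnl", shared)
c = Document("http://example.com/doc.lmnl", Limen())
print(a == b)  # True: same base URI and the very same Limen
print(a == c)  # False: the value Limina are distinct objects
```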
Names
A Name has two properties:
- namespace name - a string with the syntax of an IRI
- local part - a string with the syntax of an XML 1.1 NCName
Two Names are equal iff their properties are equal.
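Because Names compare by value, a frozen (hashable) record fits them well, and equal Names can serve as interchangeable dictionary keys. The sketch below assumes Python; the NCName check is a hypothetical, ASCII-only approximation of the XML 1.1 production, and the namespace name is not validated as an IRI.

```python
import re
from dataclasses import dataclass

# Hypothetical ASCII-only approximation of an NCName; the real XML 1.1
# production also admits many non-ASCII letters, digits and marks.
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


@dataclass(frozen=True)  # frozen: value semantics and hashability
class Name:
    namespace_name: str  # should have IRI syntax; not checked in this sketch
    local_part: str

    def __post_init__(self) -> None:
        if not _NCNAME.fullmatch(self.local_part):
            raise ValueError(f"local part is not an NCName: {self.local_part!r}")


ns = "http://example.org/ns"  # hypothetical namespace name
print(Name(ns, "page") == Name(ns, "page"))  # True: equal properties
print(Name(ns, "page") == Name(ns, "line"))  # False
```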
Limina
A Limen (pl. Limina) has three properties:
- owner - the Annotation, Document or Limen that owns this Limen
- content - a sequence of zero-or-more Atoms (or Ranges in non-flat LMNL)
- ranges - a set of zero-or-more Ranges over the content
The ranges property is unordered.
Two Limina are equal iff they are the same Limen.
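Unlike Names, Limina use identity equality: two Limina with identical properties are still different Limina. A minimal Python sketch, assuming Atoms are stood in for by plain strings; `eq=False` stops the dataclass from generating a property-comparing `__eq__`.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Set


@dataclass(eq=False)  # eq=False keeps object identity as the equality test
class Limen:
    owner: Optional[Any]  # the Annotation, Document or Limen owning this Limen
    content: List[Any] = field(default_factory=list)  # ordered sequence of Atoms
    ranges: Set[Any] = field(default_factory=set)  # unordered set of Ranges


x = Limen(owner=None, content=["a", "b"])
y = Limen(owner=None, content=["a", "b"])
print(x == x)  # True: a Limen is equal to itself
print(x == y)  # False: same properties, but two different Limina
```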
In a data model derived directly from a document in LMNL syntax, there is one Limen for the Document and one for each Annotation.
Additional Limina may be created. If Limen A is owned by Limen B, then the content property of Limen A is derived from the ranges property of Limen B by applying two functions: a selection function, which picks out a subset of those ranges, and an ordering function, which puts the selected ranges in order. These functions are specific to Limen A and need not be the same as those used for any other Limen.
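The derivation of a Limen's content from its owner's ranges can be sketched as a filter followed by a sort. This assumes Python, a simplified `Range` carrying only a name and offsets, and an ordering function expressed as a sort key; `derive_content` and the sample ranges are hypothetical.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional


class Range(NamedTuple):
    # Only the properties needed for this sketch; owner and annotations omitted.
    name: Optional[str]
    start: int
    end: int


def derive_content(
    owner_ranges: Iterable[Range],
    select: Callable[[Range], bool],
    order_key: Callable[[Range], object],
) -> List[Range]:
    """Compute Limen A's content from the ranges of its owner, Limen B."""
    # Selection function: keep a subset of B's (unordered) ranges.
    selected = [r for r in owner_ranges if select(r)]
    # Ordering function: impose a sequence on the selected ranges.
    return sorted(selected, key=order_key)


# Hypothetical ranges over Limen B's content
b_ranges = {
    Range("page", 40, 80),
    Range("line", 0, 12),
    Range("page", 0, 40),
    Range("line", 12, 25),
}

# Limen A: the pages of B, in document order
pages = derive_content(b_ranges, lambda r: r.name == "page", lambda r: (r.start, r.end))
print(pages)
# [Range(name='page', start=0, end=40), Range(name='page', start=40, end=80)]
```

Another derived Limen over the same owner could select the lines instead and order them differently, since each Limen has its own pair of functions.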
Ranges
A Range has five properties:
- owner - the Limen that contains this Range
- name - the Name of this Range (may be null)
- start - an integer specifying the start point of this Range in the content of the owner
- end - an integer specifying the end point of this Range in the content of the owner
- annotations - a sequence of zero-or-more Annotations of this Range
The start and end of a Range are constrained to lie between zero (indicating the point before the first Atom in its owner's content) and the length of the owner's content (indicating the point after the last Atom in that content), inclusive. The end of a Range must be greater than or equal to its start.
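Both constraints collapse into a single chained comparison. A minimal Python sketch; `validate_range` is a hypothetical helper that takes the owner's content length directly rather than the owner Limen.

```python
def validate_range(start: int, end: int, owner_content_length: int) -> None:
    """Raise ValueError unless 0 <= start <= end <= owner_content_length."""
    # 0 is the point before the first Atom and owner_content_length the point
    # after the last one, so both bounds are inclusive; an empty Range
    # (start == end) is allowed.
    if not 0 <= start <= end <= owner_content_length:
        raise ValueError(
            f"invalid range [{start}, {end}] for content of length {owner_content_length}"
        )


validate_range(0, 0, 0)  # empty Range over empty content: allowed
validate_range(2, 5, 5)  # ends at the point after the last Atom: allowed
```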
The annotations of a Range appear in the order in which they appear in the document, with no distinction between annotations in the start-tag and annotations in the end-tag.
Two Ranges are equal iff all their properties are equal.
The length of a Range is its end minus its start. The value of a Range is the sequence of Atoms in the content of the owner of the Range falling between the start and end points. These are derived properties.
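The two derived properties follow directly from the stored offsets. This Python sketch assumes Atoms are single characters and, for brevity, gives the Range its owner's content sequence instead of a reference to the owner Limen.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Range:
    owner_content: Sequence[str]  # stands in for the owner Limen's content
    start: int
    end: int

    @property
    def length(self) -> int:
        # Derived: end minus start
        return self.end - self.start

    @property
    def value(self) -> List[str]:
        # Derived: the Atoms of the owner's content between start and end
        return list(self.owner_content[self.start:self.end])


content = list("Hello, world")
r = Range(content, 7, 12)
print(r.length)  # 5
print("".join(r.value))  # world
```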
See also Range relationships.
Annotations
An Annotation has four properties:
- owner - the Range, Annotation, or Atom that this Annotation is annotating
- name - the Name of this Annotation (may be null)
- value - the Limen containing the value of this Annotation
- annotations - a sequence of zero-or-more Annotations of this Annotation
Two Annotations are equal iff they are the same Annotation.
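Because an Annotation can itself be annotated, the annotations on any object form an ordered tree. The Python sketch below assumes names are plain strings rather than Name objects and values are plain strings rather than Limina; `annotate`, `walk` and the sample annotations are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Optional, Tuple


@dataclass(eq=False)  # identity equality: distinct Annotations are never equal
class Annotation:
    owner: Any  # the Range, Annotation or Atom being annotated
    name: Optional[str]
    value: Any  # would be a Limen in the model
    annotations: List["Annotation"] = field(default_factory=list)

    def annotate(self, name: Optional[str], value: Any) -> "Annotation":
        # Append a child Annotation, preserving document order.
        child = Annotation(self, name, value)
        self.annotations.append(child)
        return child


def walk(annotation: Annotation, depth: int = 0) -> Iterator[Tuple[int, Optional[str]]]:
    """Depth-first traversal yielding (depth, name) for an Annotation and its annotations."""
    yield depth, annotation.name
    for child in annotation.annotations:
        yield from walk(child, depth + 1)


root = Annotation(owner=None, name="title", value="An Example")
root.annotate("lang", "en").annotate("script", "Latn")
root.annotate("type", "main")
print(list(walk(root)))
# [(0, 'title'), (1, 'lang'), (2, 'script'), (1, 'type')]
```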
Atoms
An Atom has three properties:
- owner - the Limen containing this Atom in its content
- name - the Name of this Atom (may be null)
- annotations - a sequence of zero-or-more Annotations of this Atom
If an Atom has the name lmnl:char and has a single Annotation named codepoint, then the value of that Annotation is interpreted as an integer expressed in hexadecimal digits, and the Atom represents the Unicode character with that codepoint. Note that these digits are themselves Atoms, so this model contains an infinite regress. The practical LOM API avoids this problem.
All other Atoms have application-defined meaning.
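The lmnl:char convention can be sketched as a small decoding function. This Python sketch assumes an Atom's name is the prefixed string `lmnl:char`, that annotations are (name, value) pairs whose values have already been flattened to strings (which sidesteps the regress, as LOM does), and that "a single Annotation named codepoint" means exactly one such Annotation; `char_of_atom` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

_HEX = re.compile(r"[0-9A-Fa-f]+")  # hexadecimal digits only, no "0x" prefix


def char_of_atom(name: str, annotations: List[Tuple[str, str]]) -> Optional[str]:
    """Return the character a lmnl:char Atom represents, or None otherwise."""
    if name != "lmnl:char":
        return None  # all other Atoms have application-defined meaning
    codepoints = [value for ann_name, value in annotations if ann_name == "codepoint"]
    if len(codepoints) != 1 or not _HEX.fullmatch(codepoints[0]):
        return None
    cp = int(codepoints[0], 16)
    if cp > 0x10FFFF:
        return None  # outside the Unicode codespace
    return chr(cp)


print(char_of_atom("lmnl:char", [("codepoint", "41")]))  # A
```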
Two Atoms are equal iff their properties are equal.