Introduction
Standards greatly foster the exchange of content. Without them, the
development of an industry can be more difficult, because companies
may be afraid of investing when there are no agreed ways of exchanging content.
Well-known examples of standards are the TV standards
PAL, SECAM and NTSC. It is important to be aware of the standards, and to use
them wisely.
Standards can come from systematic efforts approved by official standardisation bodies devoted to the issue; we call these official standards. In other cases, when a solution is widely accepted by the industry and/or the public without such approval, we label it a de facto standard.
Sometimes the dominance of a standard results from costly industrial wars. An example is VHS, the industrial standard for consumer video recorders, which won a battle against competing formats. Standards can also coexist, as in the well-known case of TV, where there are several incompatible ones.
The standards situation for multimedia is quite complex. On the one hand,
it is a fast-moving area, where some standards have been promoted but
not accepted. Some standards are in preliminary phases and, even after heavy
investment, they might not end up with full acceptance. On the other
hand, because multimedia involves many fields, the standards of each of
these fields might be relevant. The following
incomplete list of multimedia standards gives an idea of how many
standards might be of interest for multimedia:
Let us remark that there are semi-official standards, in between
official and de facto ones: this is the case of Internet and W3C
"standards". As official standards follow a very lengthy process, some
semi-official bodies promote standards that sit between the two. Because of
this lengthy process, an official standard usually has an associated
status which indicates its current stage in the process. For instance, phases of
this process are: call for contributions, committee draft (CD), draft
international standard (DIS), international standard (IS), etc.
Internet (IETF) and W3C "standards" also follow their own processes, with
other terminology.
The goal for this part is to understand some of the most relevant or interesting standards and the most relevant aspects of the whole issue.
Some image and video standards
Unprocessed image and video in digital form occupy a huge amount of
space, at least when compared to text. A character usually takes one byte,
and a page will be several hundred of them. A colour image of the size
of a standard VGA screen occupies, in principle, 640x480x3 bytes (about 900 KB),
i.e., the number of pixels multiplied by the number of bytes per pixel
(one for each of the R, G, and B channels). It is extremely important to
reduce this size for storage and transmission purposes. This is achieved
by coding the image in a compressed form at one end, and decoding it
to an uncompressed form at the other end. For this process to be
useful, it must be standardised. Otherwise, exchange
would only be possible between identical systems.
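The size arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `raw_image_bytes` is ours, and the page length of 500 characters is an assumed figure standing in for "several hundred" characters.

```python
def raw_image_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size in bytes of an uncompressed image (one byte per R, G, B channel by default)."""
    return width * height * bytes_per_pixel


# VGA colour image: number of pixels times bytes per pixel.
vga_bytes = raw_image_bytes(640, 480)

# Assumption for the comparison: a text page of 500 one-byte characters.
page_bytes = 500
pages_per_image = vga_bytes / page_bytes

print(f"VGA image: {vga_bytes} bytes, about {pages_per_image:.0f} pages of text")
```

Under these assumptions, a single uncompressed VGA image takes as much space as well over a thousand pages of plain text, which is why compression matters.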
Thus, image and video compression standards quickly appeared for transmission. For example, H.261 and H.263 are video coding standards for videotelephony. Both of them can work at 64 kbit/s (the speed of transmission is a very important parameter), while H.261 can also work at up to 2 Mbit/s, offering the possibility of better quality videotelephony. The H.*** standards have associated G.*** standards for the audio component of the video.
JPEG
JPEG stands for Joint
Photographic Experts Group, the original name of the committee that
wrote the standard. The body is now officially called ISO/IEC
JTC1/SC2/WG10.
The first generation of this standard was intended to provide compression for continuous-tone still images of natural scenes, which could be full-colour (24 bit) or grey-scale. This is the first JPEG, whose development dates from 1986. JPEG is a lossy compression scheme, meaning that the decompressed image is usually worse than the original: some information is lost in the process. Nevertheless, JPEG exploits known characteristics of the human eye to achieve a very high degree of compression without much apparent visual degradation. The compression parameters can be adjusted by the user, trading off file size against output image quality. The basic ideas behind the compression scheme are explained later.
A new version called JPEG2000 has recently become an official standard. It is based on wavelet transforms, and the applications are more ambitious than for the original JPEG.
MPEG 1, 2, 3
MPEG stands for Moving Picture Experts Group, which is the popular name of the ISO/IEC committee working on digital colour video and audio compression, ISO/IEC JTC1/SC2/WG11, established in 1988. According to the MPEG home page, the group is "in charge of the development of standards for coded representation of digital audio and video". MPEG has produced:
Apart from other developments, work on the new standard MPEG-21, the multimedia framework, started in June 2000.
MPEG-1
The specific target of MPEG-1 is CD-ROM and DAT platforms for multimedia applications, with a bandwidth of 1.5 Mbit/s. The standard has three parts: video encoding, whose binary stream takes about 1.15 Mbit/s of this bandwidth; audio encoding; and systems (which includes information for synchronisation and information about the coding). The target quality of MPEG-1 is that of VHS video.
Although MPEG-1 can cater for other possibilities, the basic parameters for video use a non-interlaced format: 352 x 240 pixels at 30 frames/s in the US, and 352 x 288 pixels at 25 frames/s in Europe (CIF). The RGB pixel information is converted into Y (luminance) and UV (chrominance); the chrominance is subsampled. The compression scheme is based on the Discrete Cosine Transform (DCT), a variant of the Fourier Transform, performed on 8x8 blocks of the image. Quantization and Huffman coding follow this step.
The compression also exploits redundancy in time by means of motion vectors. The coding scheme distinguishes three types of frames: I (intra) frames, which are coded as still images; P (predicted) frames, which are deltas from the most recent past I or P frame; and B (bidirectional) frames, which are interpolations between I and P frames. I frames are sent once every 10 or 12 frames. The scheme is asymmetric: a lot of computing power is required to encode, while decoding is less demanding. The original JPEG is also DCT based, and very similar to the coding we have described for MPEG-1.
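The DCT and quantization steps described above can be illustrated with a minimal sketch in plain Python. It is not the MPEG-1 or JPEG reference algorithm: real coders use per-coefficient quantization tables and fast transforms, whereas here a single uniform step (`step=16`, an assumed value) is applied to every coefficient.

```python
import math

N = 8  # block size used by the DCT-based schemes described above


def dct_2d(block):
    """Two-dimensional DCT-II of an N x N block, with orthonormal scaling."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step=16):
    """Uniform quantization (a simplification): small coefficients become zero."""
    return [[round(value / step) for value in row] for row in coeffs]


# A perfectly flat grey block: all its energy ends up in the first (DC) coefficient.
flat_block = [[128] * N for _ in range(N)]
coeffs = dct_2d(flat_block)
quantized = quantize(coeffs)
nonzero = sum(1 for row in quantized for q in row if q != 0)
print(nonzero, "non-zero coefficient(s) out of", N * N)
```

For smooth image regions most quantized coefficients are zero, and it is this run of zeros that the subsequent entropy (Huffman) coding stores very compactly.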
MPEG-2 (and 3)
MPEG-2 is designed to offer
higher quality than MPEG-1, at a higher bandwidth (between 4 and 10
Mbit/s). The scheme is very similar to MPEG-1, and is scalable.
MPEG-3 was targeted at HDTV,
but was never developed, as it was realised that one of the versions of
MPEG-2 (the level High 1440) could be used for High Definition
Television. As this was realised only later, the next
standard at the time was named MPEG-4.
By the way, it is worth remarking that the very popular MP3 is not MPEG-3: it is
Layer III of the audio part of the MPEG-1 standard (later extended in MPEG-2).
Some (early) multimedia standards
While the main problem of audio and video is compression, multimedia
standards have to cater for other types of problems. In this section we
review two of the standards, in order to have a better grasp of these
problems.
MHEG: an OO path
MHEG stands for Multimedia and Hypermedia Information Coding Experts Group, officially called ISO/IEC JTC1/SC2/WG12. The goal of the standard was to develop a Coded Representation of Multimedia and Hypermedia Information.
The standard specifies a coded representation of final-form multimedia/hypermedia information objects, for their interchange as units within or across systems, by any means of interchange, from storage devices to telecommunication and broadcast networks. These objects define the structure of a multimedia/hypermedia presentation in a system-independent way. These objects, called MHEG objects, provide functionality for: final-form representation, support for systems with minimal resources, interactivity and multimedia synchronisation, real-time presentation and interchange.
MHEG applications, which are sets of MHEG objects, consist mainly of declarative code, but provisions exist for calling external/procedural code. MHEG applications need only be authored once, and then run on any MHEG-compliant platform. The so-called MHEG Interpreter or Engine is in charge of interpreting and presenting MHEG applications, as well as handling user interaction, on heterogeneous platforms (PC, digital television terminals...).
The objectives of the MHEG standard are situated at different levels:
- Interchange: MHEG provides interchange facilities for various media types. The media data may be encoded by using proprietary techniques. In order to support multimedia/hypermedia interchange, MHEG provides means of grouping multiple media types into a single interchange unit.
- Presentation: MHEG supports final-form presentation of multiple media types (e.g. audio, video, text...). The media type is identified to enable the use of appropriate resources and services for its presentation. In order to support multimedia/hypermedia presentation, MHEG provides means of grouping multiple media types into a single presentation. Such groups enable media synchronisation in time and space, interaction with media, modification of the presentation, e.g. volume, size, position...
Figure 1. MHEG used as interchange unit and as identification of multimedia information.
- Minimal Resources: MHEG is designed to be supported by systems with minimal resources.
- Real-Time: MHEG is intended to ease the real-time interchange and presentation of multimedia information.
Figure 2. Interchange of MHEG objects.
MHEG is composed of the following parts:
- Part 1: Object Representation, Base Notation (ASN.1), 1994
- Part 2: Object Representation, Alternate Notation (SGML)
- Part 3: Script Interchange Representation, 1995
- Part 4: Registration Procedures, 1995
- Part 5: Support for Base-Level Interactive Applications
- Part 6: Support for Enhanced Interactive Applications, 1998
- Part 7: Interoperability and Conformance Testing
MHEG is strongly object-oriented. The MHEG inheritance tree includes the following classes:
LINK
SCRIPT
COMPOSITE
CONTAINER
MHEG addressed only the structuring, coding and interchange of objects. For the presentation aspect, it was coordinated with another emerging standard, PREMO (Presentation Environment for Multimedia Objects). Both of them have disappeared from the stage, and it is even quite difficult to trace them now.
MPEG-4: a multimedia standard coming from the audiovisual side
Currently, MPEG-4 is officially described as a standard for multimedia applications. But the initial goal was different: MPEG-4 was intended for very low bit rate coding of video. In the process of the discussion of the standard it became oriented towards a new form of representation of audiovisual content, better suited for multimedia applications. MPEG-4 should be able to support audio-visual interactive services, such as videogames, teleshopping, ...
The MPEG-4 representation of video represents a departure from the
traditional digital signal processing approach. This approach can be
labelled as coding oriented towards compression based on Fourier
transforms or similar mathematical operations. The new approach is a
step towards using more "intelligent" image analysis and understanding
in the whole process.
MPEG-4 introduced the concept of a scene composed of different audiovisual objects (AVOs). It is implicit that a segmentation process of the images exists, which produces these objects. In the MPEG-4 standard, these objects can be compressed in a traditional way, using, for instance, MPEG-1 or 2. The novelty is the preceding segmentation process, which is usually associated with the field of Computer Vision, and not with Digital Signal Processing. AVOs can be VOCs (video object components) or AOCs (audio object components) or both.
SNHC (Synthetic-Natural Hybrid Coding) is another important, derived concept in MPEG-4. As we have AVOs which are a representation of "natural" video, these "natural" images could eventually be integrated with "synthetic" images, such as those coming from computer animation, CAD packages, etc. This is the goal of SNHC. This representation is also better suited to interactive multimedia applications. One can imagine a teleshopping application in the form of an "interactive" video advertising a car. We could produce this in MPEG-4 using SNHC: the moving car could be a moving clickable AVO, linked to a web page giving more information and allowing the user to order the car (and even pay by credit card).
The architecture of the coding part starts by creating an object definition from the input information. Then the coding proceeds for the different objects, in parallel, for instance. The information is then multiplexed to produce the output bitstream. The architecture of the decoding part starts with an input bitstream, which is demultiplexed into objects, which are decoded (decompressed) and finally composed into the scene, which can be displayed, and allow for user interaction.
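The coding and decoding architecture described above (per-object coding, multiplexing, demultiplexing, per-object decoding) can be sketched as follows. This is a toy model under loud assumptions: `zlib` stands in for a real audio or video object coder, and the length-prefixed framing is a hypothetical format of ours, not the MPEG-4 bitstream syntax.

```python
import struct
import zlib

# Hypothetical frame header: length of the object name, length of the coded payload.
HEADER = struct.Struct(">HI")


def encode_scene(objects: dict) -> bytes:
    """Code each object independently, then multiplex everything into one bitstream."""
    stream = bytearray()
    for name, payload in objects.items():
        name_bytes = name.encode("utf-8")
        coded = zlib.compress(payload)  # stand-in for a real object coder
        stream += HEADER.pack(len(name_bytes), len(coded))
        stream += name_bytes + coded
    return bytes(stream)


def decode_scene(stream: bytes) -> dict:
    """Demultiplex the bitstream into objects and decode (decompress) each of them."""
    objects = {}
    offset = 0
    while offset < len(stream):
        name_len, coded_len = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        name = stream[offset:offset + name_len].decode("utf-8")
        offset += name_len
        objects[name] = zlib.decompress(stream[offset:offset + coded_len])
        offset += coded_len
    return objects


scene = {"background": b"\x10" * 1000, "car": b"\x80" * 200, "speech": b"\x00\x01" * 300}
bitstream = encode_scene(scene)
decoded = decode_scene(bitstream)
```

Composition of the decoded objects into a displayed scene, and user interaction, would follow the demultiplexing step; they are left out of this sketch.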
Another important concept in the video part is that of video object planes (VOPs), which is easier to understand with some knowledge of synthetic 3D images. In synthetic images, a scene can be described by giving the geometry of the objects, the position (movement) of these objects, and the lighting. If we position a virtual camera or point of view, a synthetic video will show a "rendered" scene in which, amongst other things, objects occlude other ones according to their relative depth (distance) with respect to the camera (sometimes objects disappear behind others and then reappear). Computer animation software makes heavy use of depth information to save computation time (if an object disappears, it does not need to enter into the lengthy and expensive rendering computations). Depth information is stored as another byte, and it is usually called the alpha channel.
VOPs are loosely based on grouping objects having common alpha planes; the VOCs can be either "natural" or "synthetic"; in the final step of decoding, scene composition is based on the VOCs plus the information of the alpha channel. Thus it is not a full 3D representation but a 2D+depth (alpha) one. A "natural" video could be coded with just one VOP, which is, for instance, MPEG-2 compressed, without any multimedia. Following the previous example of the interactive ad of a car, we could have 3 VOPs: one with the "natural" video, another one with an invisible mask which follows the movement of the car and is clickable, and a third one which appears on clicking, with the information about the car, ... MPEG-4 caters for these possibilities, and also for applications which demand huge computational resources and are not real time.
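As a rough illustration of composing a scene from planes, the sketch below paints planes back to front using a binary mask per plane, and finds which plane a click falls on. Everything here is a simplification of ours: the plane names, the one-character "pixels" and the binary masks are hypothetical, and real alpha planes can carry more than one bit per pixel.

```python
def compose(planes, width, height, background="."):
    """Paint planes back to front; mask[y][x] == 1 where the plane covers the pixel."""
    frame = [[background] * width for _ in range(height)]
    for _name, mask, symbol in planes:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    frame[y][x] = symbol
    return ["".join(row) for row in frame]


def plane_at(planes, x, y):
    """Name of the front-most plane covering pixel (x, y), or None."""
    for name, mask, _symbol in reversed(planes):
        if mask[y][x]:
            return name
    return None


# Two planes of the car advertisement example: the video and the clickable car area.
video = ("video", [[1, 1, 1, 1], [1, 1, 1, 1]], "v")
car = ("car", [[0, 1, 1, 0], [0, 0, 0, 0]], "c")
planes = [video, car]  # listed back to front

frame = compose(planes, width=4, height=2)
clicked = plane_at(planes, x=1, y=0)
```

In this toy scene a click on the second pixel of the first row lands on the car plane, which is where an application would attach the link to more information.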
Another important concept introduced in the early stages of MPEG-4
is MSDL, the MPEG-4 Syntactic Description Language.
The MPEG-4 representation of audiovisual content, which we have described
above in a very simplified way, requires a powerful language for the
description of the scenes, the algorithms for coding and decoding objects,
for compositing the scene, for interrelations, and for interaction. This
language, named MSDL, was introduced with the intention of making it a key part of the standard.
The language included shape, texture, motion,
... In the end, MSDL did not become part of the standard, and was replaced
by the Binary Format for Scene description (BIFS), which has a more
limited scope. Recently, XMT has been introduced, and could become an
MSDL replacement.
A final remark concerns the interesting evolution of MPEG-4, which started as an audiovisual standard and ended up focused on multimedia, due to the need to address pressing existing problems.
SGML: a path from electronic text
SGML stands for Standard
Generalized Markup Language; formally, it is the
standard ISO 8879:1986. In principle, the goal of the standard was to
cater for the definition of device-independent, system-independent
methods of representing texts in electronic form; the approach
which was accepted as best was the use of marked-up text. SGML was not intended for
multimedia representation. Nevertheless, some of its characteristics (probably
the fact that it is a metalanguage, i.e. a language for formally
describing a language, in this case a markup language) have made it
generally used for multimedia representation in some way.
First we start by clarifying what a markup language is and then we go on to SGML.
The well-known HTML (Hypertext Markup Language) is an example of a markup language, which is in fact a by-product of SGML. In a markup language we need to be able to differentiate what is text and what is markup (also known as tags), and to know the meaning of the markup, what is required and what is allowed. An HTML example will help us recall some aspects:
<HTML>
<HEAD><TITLE>Example</TITLE></HEAD>
<BODY BGCOLOR="#FFFFFF">
<H1>HTML example</H1>
This is a <b>HTML</b> page.<br>
We can provide a link to <a href="http://www.upf.es">UPF</a>
And we could also insert an image <img src="image1.gif">
</BODY>
</HTML>
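The distinction between text and markup can be made concrete with Python's standard `html.parser` module. The sketch below is only illustrative: the class name `MarkupSplitter` is ours, and it simply collects start tags and text fragments from a piece of the page above.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collects start tags (markup) and non-empty text fragments separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


splitter = MarkupSplitter()
splitter.feed('This is a <b>HTML</b> page.<br>'
              'We can provide a link to <a href="http://www.upf.es">UPF</a>')
splitter.close()

print("markup:", splitter.tags)
print("text:  ", splitter.text)
```

The parser, not the author, decides what counts as a tag; this is possible only because the rules of the markup language are fixed in advance.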
The power of SGML is that it is a metalanguage, allowing the definition of markup languages. The key concept that allows this is the DTD, or Document Type Definition. A DTD defines a class of documents, indicating the types of elements one can find in a document and giving each of them a specific name. These elements can have attributes (also named). With the DTD, a rich semantic structure amongst the elements can be established. After a DTD has been defined, a document is an instance of this DTD and, with the help of the DTD, can be understood. HTML is a by-product of SGML in the sense that it is a class of documents; HTML pages are instances of this type of document. Let us remark that, while SGML is an official standard, HTML is not, in the sense that this document type has not followed a process of standardisation, although there is a very wide consensus about the different versions.
The SGML description which follows is derived from Balpe et al. and from A gentle introduction to SGML. As HTML is quite a complex example of a document type, we start with a deliberately simpler one: a poem anthology (taken from A gentle ...):
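<anthology>
<poem><title>The SICK ROSE</title>
<stanza>
<line>O Rose thou art sick.</line>
<line>The invisible worm,</line>
<line>That flies in the night</line>
<line>In the howling storm:</line>
</stanza>
<stanza>
<line>Has found out thy bed</line>
<line>Of crimson joy:</line>
<line>And his dark secret love</line>
<line>Does thy life destroy.</line>
</stanza>
</poem></anthology>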
<!DOCTYPE anthology [
<!ELEMENT anthology - - (poem+)>
<!ELEMENT poem - - (title?, stanza+)>
<!ELEMENT title - O (#PCDATA) >
<!ELEMENT stanza - O (line+) >
<!ELEMENT line O O (#PCDATA) >
]>
1. An anthology contains a number of poems and nothing else.
2. A poem always has a single title element which precedes the first stanza and contains no other elements.
3. Apart from the title, a poem consists only of stanzas.
4. Stanzas consist only of lines and every line is contained by a stanza.
5. Nothing can follow a stanza except another stanza or the end of a poem.
6. Nothing can follow a line except another line or the start of a new stanza.
In the declaration of poem, we see that a poem is made of two kinds of elements: one or more stanza elements (compulsory, as indicated by +), preceded by a title, which is optional, as indicated by ?. For elements which are both optional and repeatable, an asterisk * is used.
In the declaration of title, we indicate that title is a "final" element, taking its value from character data (it will be actual "text", not "tag"); the same thing happens with line in the last declaration of the DTD.
The sign | indicates that elements are alternatives, chained by an exclusive or. The other connectors are the comma , which indicates an ordered sequence of elements, and the sign & which indicates an and.
A finer detail is in the signs - -, - O, O O, which indicate whether the start tag and the end tag (in this order) can be omitted. Because of the structure defined for the document, the end of a line does not need to be indicated: a line is followed by another one (unless the stanza ends), so the end-of-line tag can be omitted. - - means that both tags are compulsory, - O that only the end tag can be omitted, and O O that both are optional.
The opening line of the DTD, <!DOCTYPE anthology [, indicates that we are defining a document type called anthology.
These aspects are the basic ones; structures (content models) can be more complex, and SGML allows this through concurrent structures or exceptions (variants). Also, named pieces of content, called entities, can be defined once and recalled by name, as an indirect way of addressing.
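To see how content models constrain a document, here is a small sketch that checks an instance against the anthology DTD. It is not an SGML parser: it assumes a fully tagged instance (no omitted tags), so that Python's XML parser can read it, and it rewrites each content model by hand as a regular expression over the sequence of child element names. The sample documents are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content models of the anthology DTD, as regular expressions over the
# space-terminated names of the child elements.
CONTENT_MODELS = {
    "anthology": r"(poem )+",
    "poem": r"(title )?(stanza )+",
    "stanza": r"(line )+",
    "title": r"",  # #PCDATA only: no child elements allowed
    "line": r"",
}


def is_valid(element) -> bool:
    """Check an element, and recursively its children, against the content models."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # element type not declared in the DTD
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(is_valid(child) for child in element)


good = ET.fromstring(
    "<anthology><poem><title>A title</title>"
    "<stanza><line>First line</line><line>Second line</line></stanza>"
    "</poem></anthology>")
bad = ET.fromstring(
    "<anthology><poem><stanza><line>First line</line></stanza>"
    "<title>A title that comes too late</title></poem></anthology>")

print(is_valid(good), is_valid(bad))
```

The second document is rejected because the title follows a stanza, which the ordered sequence (title?, stanza+) does not allow.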
DC, RDF: Identifying and describing resources
First of all, let us remark that this part is very relevant for the Semantic Web. DC stands for Dublin Core, a proposed standard for metadata. RDF stands for Resource Description Framework. In this part we discuss some of the issues related to resources and metadata, which are not exactly multimedia standards but are very relevant for the hypermedia field. Informally, by resource we mean a book, a document, an image, ... To deal with resources it is important to identify them (in a unique, persistent way), to locate them (which is associated with identifying them, but might proceed in an indirect way) and to describe them (this is where metadata, data about data, comes into play).
Let us start with a known example. URL, introduced together with HTML and HTTP (Hypertext Transfer Protocol), stands for Uniform Resource Locator: it is, in a way, an approximation to an identifier/locator. Together with the Domain Name System (DNS), a URL gives a path for locating a resource: the DNS maps the domain name to the physical IP address of a machine, and the rest of the URL gives a path within that machine, and thus leads to the resource.
Let us imagine a different situation: the resource is an "official" e-book, which has an e-ISBN. This electronic ISBN would be the identifier, because it defines the resource in a unique and persistent way. There might be many copies of the e-book, in different "physical-digital" places. If we have a way of designating them, we have the locators (unique for each copy in a location). This is a better way of achieving uniqueness than the URL, which by mixing identification and location creates duplication, and might lead to inconsistency. It should be clearer now that the URL is a mix of identifier and locator.
Continuing with the example, URLs do not help to understand what the resource is, and the same happens with identifiers or locators. An e-book has a title, an author, contents, etc., which might help in understanding the resource. In general terms, we need descriptors of the resource, and the title, author and table of contents can be some of them. Again by way of example, the META tag in HTML can be used to describe the resource and its contents; it is a hidden part of the page, and "meta" indicates "metadata", data about data. But, unlike URLs, metadata are currently less universally accepted.
URLs are unique in the sense that IP numbers are unique and a path is unambiguous within a machine; but they are far from persistent, as "URL not found" is a very common message when browsing the Web. An alternative which has been proposed is the PURL (Persistent URL), developed by the Online Computer Library Center (OCLC): a URL pointing to an intermediate resolution service, which retrieves the actual URL, which might have changed. An example of a PURL could be: http://purl.oclc.org/OCLC/PURL/FAQ.
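The two ideas above, the parts of a URL and the indirection offered by a resolution service, can be sketched in a few lines. The URL splitting uses Python's standard `urllib.parse`; the resolution table, its paths and the `example.org` servers are purely hypothetical stand-ins for what a real resolution service would store.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Separate the protocol, the domain name (resolved through DNS) and the path."""
    parts = urlsplit(url)
    return {"protocol": parts.scheme, "domain_name": parts.netloc, "path": parts.path}


# Hypothetical resolution table: persistent path -> current actual URL.
resolution_table = {
    "/docs/report": "http://server-a.example.org/2001/report.html",
}


def resolve(persistent_path: str) -> str:
    """Look up the current location of a resource from its persistent path."""
    return resolution_table[persistent_path]


def relocate(persistent_path: str, new_url: str) -> None:
    """The resource moved: only the table changes, the persistent path stays valid."""
    resolution_table[persistent_path] = new_url


purl_parts = split_url("http://purl.oclc.org/OCLC/PURL/FAQ")
relocate("/docs/report", "http://server-b.example.org/archive/report.html")
current_location = resolve("/docs/report")
```

Anyone who had stored the persistent path keeps a working reference after the move, which is exactly what a plain URL cannot guarantee.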
Dublin Core
Metadata are a means to describe a resource; they should help us in two ways: to use resources and to find resources. The most notable current effort for consensus on metadata has the name of Dublin Core (DC), or Dublin Metadata Core Element Set, which takes its name from a workshop held in Dublin, Ohio, in 1995. The goal of DC is to define a minimal, but sufficient, core set of attributes, or elements, which can be used to provide a basic description of a resource. The current DC metadata set comprises 15 elements, grouped into three categories:
Resource Description Framework (RDF)
RDF is a language for expressing metadata, which should provide for the exchange of machine-understandable information about Web resources and provide facilities for the automatic processing of these resources. It is based on XML, and is work in progress at the W3C.
The metadata activity of W3C started with PICS, Platform for Internet Content Selection, with the original objective of providing a form of rating system for Internet content (as in films, but with other uses as well); technically, it is based on ‘labelling’ the content, and, thus, all labels have to be defined in advance. The work on PICS is complete with PICS-1.1, and the focus has now moved to RDF.
In RDF each vocabulary is uniquely identified for a target metadata
application through the definition of schemas, composed of the elements,
the values these elements can have and their semantics.
RDF data consists of nodes and attached attribute/value pairs. Nodes can
be any web resources (pages, servers, ...), or other instances of
metadata. Attributes are named properties of the nodes, and their values
are either atomic (text strings, numbers, etc.) or other resources or
metadata instances. In order to store instances of this model into files
or to communicate these instances from one agent to another, a graph
serialization syntax is needed. The particular language used is XML, but
other languages could be used. Again, RDF in itself does not contain
any predefined vocabularies for authoring metadata: PICS, a rating
architecture, and the Dublin Core, a digital library vocabulary, could
both be used.
We show here some examples taken from The idiot’s guide, which is highly recommended for learning more. As is usual in XML, we can specify where the vocabulary is defined, through a namespace declaration:
<RDF xmlns = "http://w3.org/TR/1999/PR-rdf-syntax-19990105#" >
And then, a description would take the form
<?xml version="1.0" ?>
<DC:Creator> Jacky Crystal </DC:Creator>
<DC:Date> 1998-01-01 </DC:Date>
<DC:Subject> Metadata, RDF, Dublin Core </DC:Subject>
</Description>
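The data model described above, nodes with attached attribute/value pairs, can be held in memory independently of any syntax, and serialised only when needed. The sketch below is a simplification of ours: the node URL is hypothetical, the output only imitates the shape of the example above, and it has not been checked against the RDF syntax specification.

```python
from xml.sax.saxutils import escape, quoteattr

# Each statement attaches an attribute/value pair to a node (here, a Web resource).
NODE = "http://example.org/report.html"  # hypothetical resource
statements = [
    (NODE, "DC:Creator", "Jacky Crystal"),
    (NODE, "DC:Date", "1998-01-01"),
    (NODE, "DC:Subject", "Metadata, RDF, Dublin Core"),
]


def describe(node, statements):
    """All attribute/value pairs attached to a node."""
    return {attribute: value for subject, attribute, value in statements if subject == node}


def to_xml(node, statements):
    """Serialise the description of one node in an XML form (illustrative only)."""
    lines = [f"<Description about={quoteattr(node)}>"]
    for attribute, value in describe(node, statements).items():
        lines.append(f"  <{attribute}>{escape(value)}</{attribute}>")
    lines.append("</Description>")
    return "\n".join(lines)


description = describe(NODE, statements)
serialised = to_xml(NODE, statements)
print(serialised)
```

Keeping the statements separate from their serialisation reflects the point made above: XML is the syntax in use, but the model itself does not depend on it.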
The World Wide Web (and the W3C)
According to their creators, the World Wide Web (WWW or W3 as it is
also designated) covers a variety of things:
- The idea of a boundless information world in which all items have a
reference by which they can be retrieved;
- The address system (URI initially, URL as it is called now)
implemented to make this world possible, despite many different
protocols (note: URLs have two parts, the first one giving the server
location according to the DNS, the Domain Name System, which maps names to IP addresses; and the
second part giving the location of the file on this machine);
- A network protocol (HTTP) used by native W3 servers giving
performance and features not otherwise available;
- A markup language (HTML) which every W3 client is required to
understand, and is used for the transmission of basic things such as
text, menus and simple on-line help information across the net;
- The body of data available on the Internet using all or some of the
preceding listed items.
W3 had some precedents (WAIS, Gopher) for connecting Internet content. W3 succeeded both because it provided all the key elements in an easy way, and because of its policy of open source rather than proprietary schemes.
The W3 Consortium (or W3C, as it is also called) was created to continue this open policy, and to promote W3 standards. It is not a classical standards body such as ISO or ITU; its creation was needed because of the fast pace of W3 development.
More specifically, MPEG-7 is seeking to provide a multimedia content description interface. In order to do that, it intends to standardize (hierarchically) descriptors, description schemes and a Description Definition Language (DDL), as well as to provide some system tools. A typical application to be enabled by MPEG-7 is retrieving music similar to a melody hummed by a user. In order to enable this type of application, the raw data has to be described through its features, which are distinctive characteristics of the data; descriptors are specific representations of features, which can be integrated into description schemes specifying structure and semantics. The DDL is a language (based on, or related to, XML) for the creation of description schemes.
To give a nearby example, a Timbre Description Scheme was presented by
CUIDAD (UPF/IRCAM) for isolated instrument notes, based on
psycho-acoustical studies, with descriptors extracted from the signal
(such as Spectral Centroid, Harmonic Deviation ...), and with the Euclidean
distance yielding timbre similarity. The procedure usually involves
submitting a proposal with evidence of merit and performing core experiments,
which also have to receive third-party validation; if accepted, the descriptors have
to be coded in DDL, and are integrated into the XM (Experimental Model,
with a set of C++ classes, a core created from UML, and strict coding
rules to be observed).
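The use of a Euclidean distance between descriptor vectors as a similarity measure can be sketched as follows. The descriptor names echo those mentioned above, but the instruments and all numeric values are invented for illustration, and no normalisation or weighting of the descriptors is attempted.

```python
import math


def timbre_distance(a: dict, b: dict) -> float:
    """Euclidean distance between two descriptor vectors with the same keys."""
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in a))


def most_similar(query: dict, notes: dict) -> str:
    """Name of the stored note whose descriptors are closest to the query."""
    return min(notes, key=lambda name: timbre_distance(query, notes[name]))


# Hypothetical, made-up descriptor values for three isolated notes.
notes = {
    "flute": {"spectral_centroid": 0.30, "harmonic_deviation": 0.10},
    "oboe": {"spectral_centroid": 0.55, "harmonic_deviation": 0.25},
    "trumpet": {"spectral_centroid": 0.80, "harmonic_deviation": 0.40},
}
query = {"spectral_centroid": 0.50, "harmonic_deviation": 0.20}

closest = most_similar(query, notes)
```

A smaller distance means a more similar timbre; retrieval then amounts to sorting the stored notes by their distance to the query.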
References
Further information about multimedia standards and related topics can be found in a collection of links which is no longer updated: http://viswiz.gmd.de:8080/MultimediaInfo/ (I have saved a local version just in case).
Further information on JPEG:
- The official MHEG site: http://www.mhegcentre.com/, where one can also find an MHEG tour.
- A classical paper about MHEG: Kretz F, Colaitis F: Standardizing Hypermedia Information Objects, IEEE Communications Magazine, p 60, May 1992.
Annex: HyTime
HyTime is a direct precedent of XML. XML has learnt a few lessons
from HyTime, and it is quite convenient to understand a few concepts
through this earlier implementation.
HyTime is formally the standard ISO/IEC 10744:1992 (a second edition
was published in August 1997); it is also called the Hypermedia/Time-based
Structuring Language. Like SGML, it is promoted by ISO (International
Organization for Standardization) and belongs to a family of standards different
from MPEG and JPEG, which are developed in association with ITU, a telecommunications body.
Unlike the *PEG family, HyTime started with the goal of standardising hypermedia;
and it is based on SGML. It is also important to consider
the relation with another ISO/IEC standard, DSSSL, the Document Style
Semantics and Specification Language, with reference 10179:1996.
Entities and linking in SGML
As we said earlier, entities in SGML give possibilities for developing linking, as they provide an indirect way of addressing elements as a whole. In a DTD the entities can be named with the syntax illustrated by the example:
<!ENTITY Chapter-One SYSTEM "chap1.sgm" >
Then, the entity can be called (and a replacement or substitution will take place) as in:
<MyDoc>
&Chapter-One;
</MyDoc>
Chapter-One is the entity name; then we have an external system identifier for the entity, in the form of chap1.sgm; we could have defined an external public identifier too, as illustrated by the definition:
<!ENTITY Chapter-Two PUBLIC "-//drmacro//TEXT Chapter 2//EN" >
We can also use the attributes ID and IDREF: with ID we give an element a unique identifier; with IDREF we can reference that element.
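The replacement of entity references described above can be imitated with a short sketch. It is not an SGML parser: a dictionary stands in for the entity declarations, and the replacement text for Chapter-One is hypothetical (a real system would read it from the file chap1.sgm).

```python
import re

# An entity reference has the form &name; (simplified pattern for entity names).
ENTITY_REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text: str, entities: dict) -> str:
    """Replace every known entity reference by its replacement text."""
    def replace(match):
        name = match.group(1)
        # Unknown references are left untouched rather than guessed.
        return entities.get(name, match.group(0))

    return ENTITY_REFERENCE.sub(replace, text)


# Hypothetical content standing in for the external file chap1.sgm.
entities = {"Chapter-One": "<chapter>Text of chapter one</chapter>"}
document = "<MyDoc>\n&Chapter-One;\n</MyDoc>"

expanded = expand_entities(document, entities)
print(expanded)
```

The document itself only names the entity; where its content actually lives is decided by the declaration, which is what makes entities an indirect way of addressing.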
We are going to see next how HyTime gives a much more structured way of referencing (linking) than the basic constructs provided by SGML; it also provides other hypermedia functionalities, such as location addressing, and event scheduling.
Meta-DTD and architectural forms
HyTime is based on the concept of architectural forms, which are a sort of meta-DTDs, or templates to define elements in proper DTDs. The HyTime standard defines, using SGML syntax, a set of meta-element types which are called architectural forms, each having a unique name. This name is later used to associate real elements with the corresponding architectural forms by specifying the architectural form name as the value of the HyTime attribute. The HyTime attribute lets a HyTime-aware application recognize and apply to the element any particular processing associated with the specific HyTime construct. Architectural forms are thus similar to superclasses in object-oriented programming, where elements inherit the processing associated with the corresponding architectural form. However, SGML is not a programming language and this inheritance is only conceptual, although HyTime parsers can be based on OO programming with a high benefit, because of this trait.
Hyperlinking in HyTime (and how to introduce the constructs)
To overcome the limitations of SGML linking, HyTime defines several link constructs. We describe two of them in some detail: clink (contextual link) and ilink (independent link).
The clink is the simpler of the two. The origin anchor ("the context") is a part of one document while the endpoint can be in the same or another document. The clink markup is defined as follows:
<!ATTLIST clink
HyTime NAME "clink"
linkend IDREF #REQUIRED >
An instance using this markup might read:
<para>This <clink linkend=id17>clink</clink> is
linked to an element whose id is the value assigned to
the linkend attribute.
...
<para id=id17>This paragraph is a target of a clink from
the previous paragraph.
As we said earlier, existing SGML constructs can be made into HyTime constructs by adding attributes. Let us see how to do that to turn an element type xref into a clink:
<!ATTLIST xref
HyTime NAME "clink"
HyNames NAMES "linkend ref" >
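To make the mapping concrete, here is a minimal Python sketch of how a HyTime-aware application might view an xref element as a clink. The function to_architectural is hypothetical, and the sketch assumes that the pair list "linkend ref" is read as "architectural name, then local name"; it is an illustration of the idea, not of any real HyTime engine.

```python
def to_architectural(element_attrs, form_name, renames):
    """Return (form, attrs) as a HyTime-aware application would see the element.

    renames is the flat pair list from the declaration, read here as
    "architectural-name local-name" pairs (an assumption of this sketch).
    """
    tokens = renames.split()
    # Pair up tokens 0,2,4,... (architectural) with 1,3,5,... (local).
    local_to_arch = {local: arch for arch, local in zip(tokens[0::2], tokens[1::2])}
    mapped = {local_to_arch.get(name, name): value
              for name, value in element_attrs.items()}
    return form_name, mapped


# An instance such as <xref ref=id17> is seen as a clink with linkend=id17.
form, attrs = to_architectural({"ref": "id17"}, "clink", "linkend ref")
```

The application only needs to know how to process clink. Any element type that declares itself a clink is handled by the same code, which is the superclass analogy made earlier.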
The ilink permits the link data to be stored externally, separate from the document(s) that the link markup connects. A simple example of ilink would be:
<ilink linkends="XYZ ABC">Politicians like Smith do not keep their promises </ilink>
...
<address id=XYZ doc=100 local=Smith06>
<address id=ABC doc=200 local=Smith08>
The last two elements are indirect addresses: the first points to a promise made by Smith, and the second to a later declaration that contradicts it.
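The separation between the link and its anchors can be sketched in Python as a lookup table. The Address class and the function resolve_linkends are hypothetical names chosen for this illustration; the values are those of the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    doc: str    # identifier of the document holding the anchor
    local: str  # identifier of the anchor inside that document


# The address elements, kept apart from the text that the link connects.
addresses = {
    "XYZ": Address(doc="100", local="Smith06"),
    "ABC": Address(doc="200", local="Smith08"),
}


def resolve_linkends(linkends, table):
    """Turn the space-separated linkends value into a list of addresses."""
    return [table[name] for name in linkends.split()]


anchors = resolve_linkends("XYZ ABC", addresses)
```

Because the ilink names only XYZ and ABC, an anchor can be retargeted by editing the address table alone, without touching the documents being linked.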
These definitions show that HyTime clearly separates links from anchors, as it should, and that it provides indirect addressing; together these make for better-structured linking. Clink is a specialised version of the hylink construct; it is self-anchoring and contextual, because the clink element itself can be used to initiate link traversal. Other HyTime constructs for hyperlinking are agglink (aggregation link) and varlink (variable link). HyTime also offers ample support for defining link traversal: the linktrav attribute can take the values External, Internal, Return, Departure and Prohibited, as well as combinations of them.
Location Addressing
Another important aspect of HyTime is its location addressing mechanisms, which are related to indirect addressing. We introduce here some basic methods, organized into four categories. Name space locations are references to names in defined name spaces, typically element IDs and entity names. Node locations treat the information as lists or trees of nodes; for instance, a document (like other data types) can be viewed as a tree in which each element and each portion of character data is a separate node. This structure allows addressing by position within a structure rather than by name. Coordinate locations address regions within coordinate spaces, for instance to position objects within nodes in lists. Semantic locations address objects based on their properties, which facilitates queries. HyTime defines a query language (HyQ), which was later superseded by the DSSSL query language (SDQL).
HyTime supports indirection, so a link target is usually resolved in a series of steps. The named location address element type form, nameloc, contains one or more nmlist elements, and each nmlist element contains one or more names. The names are either entity names or element names (actually, ID attribute values). The nameloc element locates all of the named objects. Links to entities are treated as links to the root element of the entity's document. Typical declarations take the following form:
<!ATTLIST nameloc
id ID #IMPLIED >
<!ATTLIST nmlist
nametype (entity|element) element
docorsub ENTITY #IMPLIED >
The tree location structure is organised around the HyTime construct treeloc, whose content describes a path through the tree. The data location address, or dataloc, architectural form is used to refer to content within elements by counting tokens (such as words): typically, a position within the element and a number of tokens counted from that position. The example below creates a link to the word "HERE" by addressing the fifth word and then counting one word.
...
<para id=target>
The dataloc link ends HERE
</para>
...
<dataloc id=dataloc.demo locsrc=target>
<dimlist>5 1</dimlist>
</dataloc>
...
This is a <clink linkend=dataloc.demo>link that uses dataloc addressing</clink>
to transcend the markup boundaries.
...
A dataloc element contains a dimlist element, which holds two integers. The first gives the position of the first token of the target, counting from one; the second gives the size of the target in tokens. Typical declarations have the form:
<!ATTLIST dataloc
locsrc IDREF #IMPLIED
quantum (norm) norm
HyTime NAME "dataloc" >
<!ATTLIST dimlist
HyTime NAME "dimlist" >
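The token counting in the dataloc example can be checked with a short Python sketch. The function dataloc below is a hypothetical stand-in, not part of any HyTime software; it assumes that tokens are whitespace-separated words and that the first integer is a 1-based position, which is the reading under which "5 1" selects the word HERE.

```python
def dataloc(text, first, count):
    """Select count whitespace-separated tokens starting at 1-based position first."""
    tokens = text.split()
    if first < 1 or count < 1 or first + count - 1 > len(tokens):
        raise ValueError("dimension lies outside the addressed content")
    return tokens[first - 1:first - 1 + count]


target = "The dataloc link ends HERE"            # content of <para id=target>
first, count = (int(n) for n in "5 1".split())   # content of <dimlist>
selected = dataloc(target, first, count)
```

With these values, selected holds the single token HERE. A dimlist of "2 2" would select the two tokens "dataloc" and "link" instead.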
As HyTime linking and addressing are based on indirection, one can use a sequence of locators to create durable links. A typical such location ladder might consist of a clink referring to a nameloc, addressing a treeloc, addressing a dataloc.
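Such a ladder can be sketched in Python as three small functions applied in turn. Everything here is an assumption made for illustration: the Node class is a toy document model, the function names merely echo the HyTime constructs, and the path given to treeloc lists child positions only, which simplifies the way a real treeloc describes its path.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    text: str = ""
    id: str = ""
    children: list = field(default_factory=list)


def nameloc(root, wanted_id):
    """Depth-first search for the node whose id equals wanted_id."""
    if root.id == wanted_id:
        return root
    for child in root.children:
        found = nameloc(child, wanted_id)
        if found is not None:
            return found
    return None


def treeloc(node, path):
    """Follow a path of 1-based child positions downwards from node."""
    for position in path:
        node = node.children[position - 1]
    return node


def dataloc(node, first, count):
    """Select count tokens of the node's text, starting at 1-based position first."""
    tokens = node.text.split()
    return tokens[first - 1:first - 1 + count]


doc = Node("doc", children=[
    Node("chapter", id="ch1", children=[
        Node("para", text="An opening paragraph"),
        Node("para", text="The dataloc link ends HERE"),
    ]),
])

chapter = nameloc(doc, "ch1")   # step 1: by name
para = treeloc(chapter, [2])    # step 2: by position in the tree
words = dataloc(para, 5, 1)     # step 3: by token count
```

Each rung depends only on the result of the previous one, so a change in the document usually requires updating a single locator rather than every link that passes through it.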
Scheduling
HyTime defines an abstract and system-independent method of specifying spatial and temporal information separately from the content to be presented: event schedules within coordinate spaces. This should provide a robust way to define the presentation of multimedia content, so that changes to the content objects or to the presentation system will not necessarily require reworking the presentation itself. In most existing multimedia authoring systems, the spatial and temporal specification is tightly bound to the data, typically through a programming or scripting language. This binding is usually very sensitive to the details of the data and of the presentation system and environment.
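The idea of a schedule kept apart from the content can be illustrated with a small Python sketch. It is not HyTime syntax: the Event class, the function active_at, the content names and the single time axis measured in abstract units are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    content: str  # name of the content object; the data itself lives elsewhere
    start: int    # first unit occupied on the axis (1-based)
    extent: int   # number of units occupied


def active_at(schedule, position):
    """Return the names of the content objects scheduled at the given position."""
    return [event.content for event in schedule
            if event.start <= position < event.start + event.extent]


schedule = [
    Event("title-card", start=1, extent=5),
    Event("narration", start=3, extent=10),
    Event("closing-music", start=13, extent=4),
]

overlapping = active_at(schedule, 4)
```

The schedule refers to content only by name, so replacing the narration recording, or mapping the abstract units to seconds on a particular player, leaves the schedule itself unchanged.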
We do not go into detail on this aspect, nor do we address the more sophisticated rendition aspects of the presentation.
Organisation of the standard
HyTime architectural forms are organized into five modules, each
defining a number of facilities. These modules constitute clauses
6 through 10 of the standard. The Base module defines element and
attribute forms common to the other modules as well as concepts and
mechanisms for coordinate addressing, which is used both for location
addressing and event scheduling and rendition. The Location address
module defines the location addressing methods. The Hyperlinks
module defines the hyperlinking architectural forms. The Scheduling
module defines the architectural forms for describing finite
coordinate spaces and event schedules within those spaces. The Rendition
module defines the architectural forms used to describe the
rendition of events within event schedules.