PROGENETOR

L. Breure


The use of structured mark-up, such as XML, increases precision in locating and
retrieving content. However, if content is simply copied for reuse, the problem of
redundancy remains unsolved. Advanced methods of hyperlinking allow the composition
of virtual documents, generated on demand in response to user input. Like views on a
database, they are instantiated when required. Ideally, these documents do not contain
any content of their own and consist only of references to underlying digital sources.
This paper discusses a strategy and a tool (PROGENETOR) for creating virtual documents
on the basis of existing XML source texts in a desktop authoring environment, targeted
at publishing through software for dynamic websites, such as Cocoon and eXist.
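
The idea of a virtual document as a view can be illustrated with a minimal sketch. It is not PROGENETOR's implementation: the in-memory collection, the file names, the (source, path) reference format and the function instantiate are all assumptions made for this example. The virtual document holds references only, and its content is resolved at request time.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory collection; real sources would be XML files on disk.
SOURCES = {
    "report.xml": "<report><sec id='s1'><p>First finding.</p></sec>"
                  "<sec id='s2'><p>Second finding.</p></sec></report>",
    "notes.xml": "<notes><note id='n1'>A side remark.</note></notes>",
}

# The virtual document has no content of its own, only (source, path) references.
# The reference format is an assumption made for this sketch.
VIRTUAL_DOC = [
    ("report.xml", ".//sec[@id='s2']/p"),
    ("notes.xml", ".//note[@id='n1']"),
]

def instantiate(refs, sources):
    """Resolve every reference against its source at request time."""
    root = ET.Element("virtual")
    for name, path in refs:
        fragment = ET.fromstring(sources[name]).find(path)
        if fragment is None:
            raise LookupError(f"{path} not found in {name}")
        root.append(fragment)
    return root

first = [el.text for el in instantiate(VIRTUAL_DOC, SOURCES)]

# Editing a source changes the next instantiation; nothing was copied.
SOURCES["notes.xml"] = "<notes><note id='n1'>A revised remark.</note></notes>"
second = [el.text for el in instantiate(VIRTUAL_DOC, SOURCES)]
```

Because the fragments are looked up on each call, a correction in a source text appears in every virtual document that refers to it, which is the point of avoiding copies.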

Research on virtual documents and the reuse of content covers a wide range of
strategies. Some rely on highly controlled authoring procedures (as for technical
manuals or training courseware), while others are model-driven or based on
‘copy-by-reference’. The strategy discussed here belongs to the last category and is
close to our daily practice of copy-and-paste. It is intended for material not explicitly
created for reuse that contains fragments worth integrating into another context,
perhaps with minor changes, while avoiding data replication and the redundancy
that results from it.

PROGENETOR is not another XML editor, but a stand-alone editorial application with
an HTML user interface, suitable for working with text collections on disk. It comes
with several utilities, among them tools to analyze text (e.g. to explore tag patterns
and calculate word frequencies) and to select and rearrange fragments for reuse. Once
selected and edited, text can be reduced to a ‘skeleton’, for example in the form of an
XInclude file. PROGENETOR is a toolkit, based on templates and a special tag set,
acting as a thin layer of ‘glue’ between standard XML, XSLT and JavaScript code,
and is therefore easy to customize in detail.
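
What a skeleton in XInclude form might look like, and how it expands back into full text, can be sketched as follows. The sketch uses Python's standard xml.etree.ElementInclude module rather than PROGENETOR itself; the file names, the sample content and the in-memory loader are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical source texts, keyed by the href used in the skeleton.
SOURCES = {
    "intro.xml": "<section><title>Intro</title><p>Reusable text.</p></section>",
    "method.xml": "<section><title>Method</title><p>More text.</p></section>",
}

# The skeleton carries structure and references, but no content of its own.
SKELETON = """\
<document xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="intro.xml"/>
  <xi:include href="method.xml"/>
</document>
"""

def load_from_memory(href, parse, encoding=None):
    """Loader for ElementInclude that reads from SOURCES instead of disk."""
    if parse == "xml":
        return ET.fromstring(SOURCES[href])
    return SOURCES[href]  # parse="text": return the raw string

def expand(skeleton_xml):
    """Replace every xi:include element by the content it refers to."""
    root = ET.fromstring(skeleton_xml)
    ElementInclude.include(root, loader=load_from_memory)
    return root

expanded = expand(SKELETON)
titles = [t.text for t in expanded.iter("title")]
```

The standard-library module includes whole documents by href; addressing a fragment inside a source would need additional tooling, so this sketch keeps each reusable fragment in its own source.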

 


Last modified: 16-09-2005 08:48