NAME

kasykdocseqxml - description of the Kasyk document sequence XML (<kasyk:docseq>)


DESCRIPTION

The Kasyk document sequence XML specifies the documents that are to be added to a Kasyk index, so that their contents can be searched.


<kasyk:docseq>...</kasyk:docseq>

A simplified overview of the document sequence XML looks like this:

 <kasyk:docseq xmlns:kasyk="http://www.kasyk.org/1.0">
  <document>
   <properties>
    <!-- containers with contents of properties for this document -->
   </properties>
   <text>
    <!-- text to index, possibly with texttype containers, for this document -->
   </text>
  </document>
  <!-- repeat document containers here for all other documents -->
 </kasyk:docseq>
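As an illustration of the structure above, here is a minimal Python sketch that generates a document sequence with the standard library's xml.etree.ElementTree. The function name build_docseq and its input shape (a list of property dictionaries paired with text strings) are hypothetical and not part of Kasyk; the property names used in the usage example are taken from the test-suite example further down.

```python
import xml.etree.ElementTree as ET

KASYK_NS = "http://www.kasyk.org/1.0"  # namespace URI from the overview above

def build_docseq(documents):
    """Build a <kasyk:docseq> string from (properties, text) pairs.

    Hypothetical helper: `documents` is a list of (dict, str) tuples, where
    the dict maps property container names to values and the str is the
    text to index.
    """
    # Make ElementTree write the "kasyk" prefix instead of "ns0".
    ET.register_namespace("kasyk", KASYK_NS)
    root = ET.Element(f"{{{KASYK_NS}}}docseq")
    for properties, text in documents:
        doc = ET.SubElement(root, "document")
        if properties:  # <properties> is optional
            props = ET.SubElement(doc, "properties")
            for name, value in properties.items():
                ET.SubElement(props, name).text = str(value)
        ET.SubElement(doc, "text").text = text
    return ET.tostring(root, encoding="unicode")
```

For example, build_docseq([({"filename": "AUTHORS", "changed": 1045865282}, "Contributing Authors")]) yields a single <document> with a <properties> and a <text> container. Note that the inner containers carry no namespace prefix, just as in the overview.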

<kasyk:docseq xmlns:kasyk="http://www.kasyk.org/1.0">...</kasyk:docseq>

The outer <kasyk:docseq> container indicates that this is an XML specification of a document sequence for indexing documents into a Kasyk index. The namespace declaration is required: it is used to verify that the version of the XML being processed is compatible with the version of Kasyk. No other attributes can be specified in the <kasyk:docseq> container.
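To show what the namespace makes checkable, here is a short Python sketch that accepts a document sequence only if its root element is docseq in the http://www.kasyk.org/1.0 namespace and carries no other attributes. How Kasyk itself performs this check is not described on this page; the function is_compatible_docseq is hypothetical.

```python
import xml.etree.ElementTree as ET

EXPECTED_NS = "http://www.kasyk.org/1.0"  # the namespace this page documents

def is_compatible_docseq(xml_text):
    """Return True if the root is <docseq> in the expected namespace.

    ElementTree reports namespaced tags as "{uri}local" and does not list
    xmlns declarations in .attrib, so any remaining attribute is an extra one.
    """
    root = ET.fromstring(xml_text)
    return root.tag == f"{{{EXPECTED_NS}}}docseq" and not root.attrib
```

Because the check compares the full namespace URI, a sequence written for a different namespace version, such as a hypothetical http://www.kasyk.org/2.0, is rejected even though its prefix is still kasyk.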


<document>

The <document> container contains all the data that makes up a single document in a Kasyk index. It can optionally contain a <properties> container, which specifies the properties associated with this document. It usually also contains a <text> container with the searchable text of the document; without text, adding a document to the index generally makes little sense.

No attributes can be specified in the <document> container.
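Since a <document> without <text> is permitted but rarely useful, a program that generates document sequences might want to flag such documents. The following Python sketch does that; the helper documents_without_text is hypothetical and not part of Kasyk.

```python
import xml.etree.ElementTree as ET

def documents_without_text(xml_text):
    """Return the 0-based positions of <document> containers lacking <text>."""
    root = ET.fromstring(xml_text)
    return [i for i, doc in enumerate(root.findall("document"))
            if doc.find("text") is None]
```

A generator could print a warning for each returned position rather than reject the sequence, since such documents are still allowed.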


<properties>

The <properties> container contains zero or more containers whose names are defined in the <creation> container of the Kasyk configuration XML. Which properties are specified, and with which values, is entirely up to whoever defines the Kasyk configuration XML and decides which property information should be associated with each document being indexed.

No attributes can be specified in the <properties> container.
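Because the allowed property names come from the configuration, a generator can check its output against them before indexing. In this Python sketch the set allowed stands in for the names defined in the <creation> container of the Kasyk configuration XML; reading them from that file is not shown, and unknown_properties is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def unknown_properties(document, allowed):
    """Return property container names not in `allowed`, in document order.

    `document` is a parsed <document> element; `allowed` is an assumed set
    of property names taken from the configuration's <creation> container.
    """
    props = document.find("properties")
    if props is None:  # <properties> is optional
        return []
    return [child.tag for child in props if child.tag not in allowed]
```

With the property names from the test-suite example, allowed would be {"filename", "changed"}, and any other container inside <properties> would be reported.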


<text>

The <text> container contains the text of the document being indexed. It may also contain further containers (texttype containers) whose names are defined in the <creation> container of the Kasyk configuration XML.

No attributes can be specified in the <text> container.
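A <text> container holds mixed content: plain text interleaved with containers such as <title> in the test-suite example below. This Python sketch builds and flattens such a container, assuming (as that example suggests) that the configuration defines a <title> container; make_text and plain_text are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

def make_text(title, body):
    """Build a <text> container that starts with a <title> container.

    In ElementTree, character data following </title> is stored as the
    tail of the <title> element, not as text of <text>.
    """
    text = ET.Element("text")
    title_elem = ET.SubElement(text, "title")
    title_elem.text = title
    title_elem.tail = body
    return text

def plain_text(text):
    """Concatenate all character data inside a <text> container."""
    return "".join(text.itertext())
```

Serializing make_text("Contributing Authors", "\nThis is an alphabetical list") produces the same shape as the first document in the example.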


EXAMPLES

Here is an example of Kasyk document sequence XML.


from the test-suite

This document sequence is an excerpt of the one that the test-suite generates from the files in the main distribution directory. Large parts of the text have been removed for clarity.

 <kasyk:docseq xmlns:kasyk="http://www.kasyk.org/1.0">
  <document>
   <properties>
    <filename>AUTHORS </filename>
    <changed>1045865282</changed>
   </properties>
   <text>
    <title>Contributing Authors</title>
 ====================
 This is an alphabetical list of authors who have contributed
 to KASYK with their email address (if they want to have that
 listed):
 <!-- list of responsible people removed here -->
   </text>
  </document>
  <document>
   <properties>
    <filename>COPYING </filename>
    <changed>1044632527</changed>
   </properties>
   <text>
    <title>KASYK SEARCH ENGINE SOFTWARE </title>
 Copyright (C) 2003  Dijkmat BV
 <!-- legalese stuff removed here -->
   </text>
  </document>
  <document>
   <properties>
    <filename>GNUGPL </filename>
    <changed>1044290571</changed>
   </properties>
   <text>
    <title>		    GNU GENERAL PUBLIC LICENSE </title>
		       Version 2, June 1991
 <!-- legalese stuff removed here -->
   </text>
  </document>
  <document>
   <properties>
    <filename>HISTORY </filename>
    <changed>1049058254</changed>
   </properties>
   <text>
    <title>A short history of KASYK (or: More than you ever
    wanted to know about KASYK)</title>
 Although KASYK is now released as version 1.0.0, it is a
 completely proven system.  Approximately 10 man years have
 already been put into the development
 <!-- some otherwise interesting stuff removed here -->
   </text>
  </document>
  <!-- more documents here -->
 </kasyk:docseq>
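The following Python sketch reads a sequence like the one above and lists each document's filename and change time. It assumes, as the values suggest but this page does not state, that <changed> holds a Unix timestamp (seconds since 1970-01-01 UTC); the helper list_documents is hypothetical. Note that the trailing space inside <filename> is part of the generated data, so the sketch strips it.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def list_documents(xml_text):
    """Return (filename, changed) pairs for every <document>.

    ASSUMPTION: <changed> is a Unix timestamp; it is converted to an aware
    UTC datetime, or None when the property is absent.
    """
    root = ET.fromstring(xml_text)
    result = []
    for doc in root.findall("document"):
        name = doc.findtext("properties/filename", default="").strip()
        changed = doc.findtext("properties/changed")
        when = (datetime.fromtimestamp(int(changed), tz=timezone.utc)
                if changed else None)
        result.append((name, when))
    return result
```

Under that assumption, the AUTHORS entry (changed 1045865282) corresponds to 21 February 2003, UTC.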

DTD

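This is an attempt at a Document Type Definition (DTD) for the Kasyk document sequence XML. The containers inside <properties> and <text> are not declared, because their names are defined by the Kasyk configuration XML.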

 <!DOCTYPE kasyk:docseq [
   <!ELEMENT kasyk:docseq (document*)>
   <!ATTLIST kasyk:docseq xmlns:kasyk CDATA #FIXED "http://www.kasyk.org/1.0">
     <!ELEMENT document (properties?, text?)>
       <!ELEMENT properties ANY>
       <!ELEMENT text ANY>
 ]>
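The parsers in Python's standard library do not validate against a DTD, so this sketch re-implements by hand only the content models (document*) and (properties?, text?) from the DTD above, plus the rule that <document> takes no attributes. It does not look inside <properties> or <text>. The function structure_errors is hypothetical.

```python
import xml.etree.ElementTree as ET

KASYK_DOCSEQ = "{http://www.kasyk.org/1.0}docseq"

def structure_errors(xml_text):
    """Return a list of structural problems; an empty list means none found."""
    root = ET.fromstring(xml_text)  # XML comments are dropped by default
    errors = []
    if root.tag != KASYK_DOCSEQ:
        errors.append(f"unexpected root {root.tag}")
    for i, doc in enumerate(root):
        if doc.tag != "document":
            errors.append(f"child {i}: expected <document>, got <{doc.tag}>")
            continue
        if doc.attrib:
            errors.append(f"document {i}: attributes are not allowed")
        tags = [child.tag for child in doc]
        # (properties?, text?) allows exactly these four child sequences.
        if tags not in ([], ["properties"], ["text"], ["properties", "text"]):
            errors.append(f"document {i}: invalid children {tags}")
    return errors
```

A full validating parser would be needed to check the DTD itself; this sketch only covers the order and multiplicity of the fixed containers.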

SEE ALSO

Kasyk home, Kasyk configuration XML, Kasyk query XML, Kasyk hitlist XML, Kasyk initializer (kasyknew), Kasyk indexer (kasykindex), Kasyk searcher (kasyk), Kasyk server (kasykd), Kasyk caching query server (kasykcqd), Kasyk configuration handler (kasykconfig).

See http://www.kasyk.nl/xml/kasykdocseqxml.html for the most up-to-date version of this information.


COPYRIGHT

Copyright © 2003 Dijkmat BV

This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Kasyk XML Information: Kasyk version 1.0.0, XML version http://www.kasyk.org/1.0, generated on Tue Nov 25 12:09:47 2003.