NAME

kasykhitlistxml - description of the Kasyk hitlist XML (<kasyk:hitlist>)


DESCRIPTION

The Kasyk hitlist XML specifies the result of a search operation on a Kasyk index. As such, it is the output created by Kasyk searcher (kasyk), Kasyk server (kasykd) or the Kasyk caching query server (kasykcqd) executables. A query is always in the form of Kasyk query XML.

This is a simplified and beautified example of hitlist XML:

 <kasyk:hitlist xmlns:kasyk="http://www.kasyk.org/1.0">
  <header type="fuzzy" hits="43" first="1" last="10" documents="3011"/>
  <hit ordinal="1">
   <preview><b>matched</b> words and more</preview>
   <properties>
    <filename>nameoffile</filename>
    <title>title of document</title>
   </properties>
  </hit>
  <hit ordinal="2">
   <preview>other <b>matched</b> words</preview>
   <properties>
    <filename>otherfilename</filename>
    <title>title of other document</title>
   </properties>
  </hit>
  <!-- hits 3 .. 10 omitted -->
 </kasyk:hitlist>

The outer <kasyk:hitlist> container indicates that this XML is the result of a query that searched the contents of a Kasyk index.

Please note that the above example has been beautified: in the actual output there is no whitespace between containers, except for newlines after <kasyk:hitlist>, </header> (or <header .../> if the <header> container has no containers inside it), </hit> and </kasyk:hitlist>.


<kasyk:hitlist xmlns:kasyk="http://www.kasyk.org/1.0">...</kasyk:hitlist>

The outer <kasyk:hitlist> container indicates that this is an XML specification of a hitlist that is the result of a query. The namespace specification is necessary to verify that the version of the XML being processed is compatible with the version of Kasyk. The container has no further attributes.

The <kasyk:hitlist> container always contains one <header> container and any number of <hit> containers.
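
As an illustration of this structure, the following is a minimal sketch of reading a hitlist with Python's standard xml.etree.ElementTree module. The function name parse_hitlist and the SAMPLE text are assumptions made for this example and are not part of Kasyk; the sample is a shortened version of the simplified hitlist shown earlier. Note that only the outer container carries the kasyk namespace prefix, so the <header> and <hit> containers are looked up without a namespace.

```python
import xml.etree.ElementTree as ET

KASYK_NS = "http://www.kasyk.org/1.0"

# Hypothetical sample, shortened from the simplified example in this document.
SAMPLE = """<kasyk:hitlist xmlns:kasyk="http://www.kasyk.org/1.0">
<header type="fuzzy" hits="43" first="1" last="2" documents="3011"/>
<hit ordinal="1"><preview><b>matched</b> words and more</preview></hit>
<hit ordinal="2"><preview>other <b>matched</b> words</preview></hit>
</kasyk:hitlist>"""


def parse_hitlist(xml_text):
    """Return (header element, list of hit elements) for a hitlist."""
    root = ET.fromstring(xml_text)
    # The namespace identifies the version of the hitlist XML.
    if root.tag != "{%s}hitlist" % KASYK_NS:
        raise ValueError("not a Kasyk hitlist: %r" % root.tag)
    header = root.find("header")
    if header is None:
        raise ValueError("hitlist has no <header> container")
    return header, root.findall("hit")


header, hits = parse_hitlist(SAMPLE)
print(header.get("type"), [hit.get("ordinal") for hit in hits])
```

Checking the namespace first gives an early failure when the XML comes from an incompatible version, rather than a confusing error further on.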


<header id="" type="" hits="" first="" last="" pass1hits="" updated="" documents="" timing="">...</header>

The <header> container is not optional: it is always present. It may contain further optional <note> containers, as well as a single <provider> container.

Almost all of the attributes of the <header> container are always specified. They are:

id="string"
The id attribute is optional. Its value is the same as the value of the id attribute specified with the Kasyk query XML. It should be specified if you are querying a Kasyk provider asynchronously, that is, if you are not waiting for the result of a specific query before putting another query to the Kasyk provider. Specifically, the Kasyk server (kasykd) and Kasyk caching query server (kasykcqd) can return results of queries in a different order than the queries were sent, so you need to be able to match each result to the right query. The id attribute can be used for that.
type="fuzzy|exact"
The type attribute indicates which type of search was actually performed. Its value is either fuzzy or exact. The returned type matters when no explicit type attribute was specified in the Kasyk query XML, or when the type of search was altered because the Kasyk index did not support the type of search requested (in which case a <note> container, conforming to Kasyk notes, will also have been added to the <header> container).
hits="number"

The hits attribute indicates the number of hits that are available in the final result set of the hitlist. It does not always indicate the number of <hit> containers: the actual number of <hit> containers is specified implicitly by the first and last attributes.

The value of the hits attribute is never more than the value (implicitly) specified with the maxhits attribute of the Kasyk query XML. The value "0" may be specified if an error has occurred in the query.

first="number" last="number"

The first attribute specifies the ordinal number of the first <hit> container that may be part of this hitlist. The last attribute specifies the ordinal number of the last <hit> container that may be part of this hitlist.

Please note that the values of the first and last attributes are initially taken from the values (implicitly) specified with the first and last attributes in the Kasyk query XML. The value of the first attribute always remains the same. The value of the last attribute may be altered in the hitlist: it will never be more than the value of the hits attribute.

pass1hits="number"
The pass1hits attribute specifies the maximum number of hits that could become part of the hitlist if the value of the maxhits attribute in the Kasyk query XML had been made large enough. It is never more than the value of the maxpass1hits attribute specified with the Kasyk query XML and it is never less than the value of the hits attribute.
updated="seconds"
The updated attribute specifies the time at which the queried Kasyk index was last updated. The time is specified in seconds since midnight GMT on Jan. 1, 1970 (often referred to as Unix time). The value "-1" may be specified if an error has occurred in the query.
documents="number"
The documents attribute specifies the number of documents that are available for searching in the selected Kasyk index. The value "0" may be specified if an error has occurred in the query.
timing="microsecs+microsecs+microsecs"

The timing attribute is optional. It is only specified if the showinternal attribute of the Kasyk query XML had the string "timing" specified in it. It is therefore mainly intended for developers.

The value of the timing attribute is a string that consists of at least 3 numbers, separated by "+", each representing an elapsed time in microseconds. By adding these numbers and dividing the sum by 1 million, you roughly get the amount of time spent on the query (in fractional seconds).

Please note that the timing information always relates to the time of the original query, regardless of whether the hitlist was generated out of a cache or not (as indicated by the setting of the cached attribute in the optional <provider> container).
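
The relationships between the header attributes described above can be sketched in a few lines of Python. This is only an illustration: the function name interpret_header and the keys of the returned dictionary are assumptions made for this example, and the attribute values in the usage lines are invented.

```python
from datetime import datetime, timezone


def interpret_header(attrs):
    """Derive a few values from a mapping of <header> attribute strings."""
    first = int(attrs["first"])
    last = int(attrs["last"])
    hits = int(attrs["hits"])
    info = {
        # The number of <hit> containers is implied by first and last.
        "hit_containers": max(0, last - first + 1),
        # last never exceeds hits; if it is lower, more hits can be requested.
        "more_available": last < hits,
    }
    # "-1" may be reported for updated when an error occurred in the query.
    updated = int(attrs.get("updated", "-1"))
    info["updated"] = (
        None if updated < 0
        else datetime.fromtimestamp(updated, tz=timezone.utc)
    )
    # timing is optional: add the "+"-separated microseconds, then
    # divide by 1 million to get fractional seconds.
    timing = attrs.get("timing")
    info["seconds"] = (
        None if timing is None
        else sum(int(part) for part in timing.split("+")) / 1_000_000
    )
    return info


info = interpret_header({
    "hits": "43", "first": "1", "last": "10",
    "updated": "1069762187", "timing": "1500+250000+48500",
})
print(info)
```

Missing optional attributes are reported as None here rather than as zero, so that a caller can tell "not requested" apart from "took no time".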

The containers of the <header> container are optional. They are:


<note id="" class="">...</note>

The <note> container is optional. When specified, it indicates some sort of error or warning that is relevant to the processing of the Kasyk query XML into the hitlist. The content of the <note> container is an English language text describing the note. The id and class attributes describe the note in a way that can easily be checked programmatically (either in XSLT or any other XML processor).

id="(name)"
The id attribute uniquely identifies the note. It can be used as a key in a set of alternate messages for the notes. Please see Kasyk notes for a complete list of possible notes.
class="Constraint|Info|Internal|Parse|Query"

The class attribute specifies to which class of notes this <note> container belongs. If any of the <note> containers in the <header> container has a class attribute with a value other than "Info", then an error has occurred in the query: the hits attribute of the <header> container will be "0" and no <hit> containers will have been created.

The following classes of notes are currently defined:

"Constraint"
An error has occurred in the processing of the <constraint> container in the Kasyk query XML, which is probably a user error in the specification of the <constraint>. For instance, using a misspelled property in the text of the <constraint> might cause this type of note.
"Info"
Something noteworthy happened during the processing of the Kasyk query XML. For instance, a word that was specified in an exact type search was not found in the dictionary of the Kasyk index. This is the only class of notes that does not indicate an error of some kind.
"Internal"
An error has occurred in the processing of the Kasyk query XML, which is probably due to some programming error on the side of the developers of Kasyk. Please report these types of errors to bugs@kasyk.org.
"Parse"
An error has occurred in the parsing of the Kasyk query XML, which is probably a user error in the specification of the Kasyk query XML. An example could be a misspelled container or attribute name, or simply malformed XML, such as opening a container but not closing it.
"Query"
An error has occurred in the logical construction of the Kasyk query XML, which is probably a user error. An example of this would be specifying a <constraint> container more than once.
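
The rule that any note with a class other than "Info" signals a failed query can be checked programmatically. The sketch below uses Python's standard xml.etree.ElementTree module; the function name query_errors and the note ids in the sample are hypothetical and chosen for this example only (see Kasyk notes for the ids that actually occur).

```python
import xml.etree.ElementTree as ET


def query_errors(header):
    """Return (id, class, text) for every note that indicates an error."""
    return [
        (note.get("id"), note.get("class"), note.text or "")
        for note in header.findall("note")
        # "Info" is the only class that does not indicate an error.
        if note.get("class") != "Info"
    ]


# Hypothetical header with one informational note and one error note.
sample_header = ET.fromstring(
    '<header type="exact" hits="0" first="1" last="0" documents="3011">'
    '<note id="example-info" class="Info">word not in dictionary</note>'
    '<note id="example-parse" class="Parse">container not closed</note>'
    '</header>'
)
errors = query_errors(sample_header)
print(errors)
```

An empty list means the query was processed without errors, although informational notes may still be present.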

<provider location="" cached=""/>

The <provider> container is optional. It is only specified if the showinternal attribute of the Kasyk query XML had the string "provider" specified in it. It is therefore mainly intended for developers.

Currently, at most one <provider> container may occur. In the future, this will change when distributed searching becomes available.

location="string"
The location attribute specifies the location of the Kasyk provider from which this hitlist originally came. If the Kasyk provider was a Kasyk searcher (kasyk) or Kasyk server (kasykd), then the value is the index directory as it was specified at startup. If the Kasyk provider was a Kasyk caching query server (kasykcqd), then the value will be the same as the location attribute of the <provider> container in the Kasyk caching query server configuration XML that was selected to handle the query.
cached="0|1"
The cached attribute specifies whether the hitlist was the result of an actual query, or whether it was created out of a cached copy of a hitlist (such as is done by Kasyk caching query server (kasykcqd)). The value "0" indicates the hitlist was freshly queried, the value "1" indicates the hitlist was created out of a cached copy.
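
A short sketch of reading the <provider> container in Python follows. The function name provider_info and the location value in the sample are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def provider_info(header):
    """Return (location, cached) or None if no <provider> is present."""
    provider = header.find("provider")
    if provider is None:
        # Only present when "provider" was requested with showinternal.
        return None
    # "1" means the hitlist came out of a cache, "0" means freshly queried.
    return provider.get("location"), provider.get("cached") == "1"


# Hypothetical header; the location value is invented for this example.
sample_header = ET.fromstring(
    '<header type="fuzzy" hits="43" first="1" last="10" documents="3011">'
    '<provider location="/hypothetical/index" cached="1"/>'
    '</header>'
)
print(provider_info(sample_header))
```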

<hit ordinal="" docnum="" score="" percentage="">...</hit>

Each <hit> container corresponds to a document in the Kasyk index that has met all constraints imposed on it (meeting the expression in the <constraint> container, and in case of an "exact" type search, having all the required words of the <all> container and not having any of the words of the <not> container).

The <hit> container has one obligatory attribute (ordinal) and three optional attributes (docnum, score and percentage). Its containers, <preview> and <properties>, are also optional.

ordinal="number"
The ordinal attribute specifies the ordinal number of the hit in the total hitlist. The value of the ordinal attribute of the first <hit> container always equals the value of the first attribute in the <header> container. The value of the ordinal attribute of the last <hit> container always equals the value of the last attribute in the <header> container.
docnum="number"

The docnum attribute is optional. It is only specified if the showinternal attribute of the Kasyk query XML had the string "docnum" specified in it.

The docnum attribute is a number, internal to the Kasyk index, associated with the most recent version of the document. It currently has no meaning outside of the Kasyk index. It is therefore mainly intended for developers.

score="float+float"

The score attribute is optional. It is only specified if the showinternal attribute of the Kasyk query XML had the string "score" specified in it.

The score attribute is a string, consisting of at least 2 floating point numbers separated by "+". By adding the numbers, you get a rough indication of the relevance of the <hit> container compared to other <hit> containers. It currently has no meaning outside of the Kasyk index. It is therefore mainly intended for developers.

percentage="number"

The percentage attribute is optional. It is only specified if the showinternal attribute of the Kasyk query XML had the string "percentage" specified in it and the type of search was "fuzzy".

It gives a rough indication of the relevance of the <hit> container compared to other <hit> containers. It currently has no meaning outside of the Kasyk index. It is therefore mainly intended for developers.
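
Two of the statements above lend themselves to a small check in Python: the ordinal attributes of the first and last <hit> containers should agree with the <header> container, and the parts of a score attribute can be added up. The function names and the sample values below are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def total_score(score):
    """Add up the "+"-separated floating point numbers of a score."""
    return sum(float(part) for part in score.split("+"))


def ordinals_match_header(header, hits):
    """Check the first and last hit ordinals against the header."""
    if not hits:
        return True  # nothing to compare, for instance after an error
    return (
        hits[0].get("ordinal") == header.get("first")
        and hits[-1].get("ordinal") == header.get("last")
    )


# Hypothetical hitlist fragment with invented score values.
root = ET.fromstring(
    '<hitlist>'
    '<header type="fuzzy" hits="43" first="1" last="2" documents="3011"/>'
    '<hit ordinal="1" score="0.5+0.25"/>'
    '<hit ordinal="2" score="0.25+0.125"/>'
    '</hitlist>'
)
hits = root.findall("hit")
print(ordinals_match_header(root.find("header"), hits))
print([total_score(hit.get("score")) for hit in hits])
```

Since the score is described as having no meaning outside of the Kasyk index, the sum is best used only to compare hits within one hitlist.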


<preview>...</preview>

The <preview> container is optional. It is only specified if the showpreview attribute of the Kasyk query XML has been (implicitly) specified with "yes".

The <preview> container contains the part of the text of the document that has been deemed most relevant to its relative position in the hitlist. Words that have been deemed relevant inside this text are placed in a highlighting container, usually <b>..</b> (this can be altered with the highlight attribute of the <searching> container in the Kasyk configuration XML).
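
Because the highlighted words are nested containers inside the preview text, a consumer usually wants either the highlighted words or the text without markup. The Python sketch below assumes the default <b> highlighting container; the function names are chosen for this example, and the tag parameter would have to be changed if another highlighting container has been configured.

```python
import xml.etree.ElementTree as ET


def highlighted_words(preview, tag="b"):
    """Return the text of every highlighting container in a preview."""
    return [element.text or "" for element in preview.iter(tag)]


def plain_text(preview):
    """Return the preview text with the highlighting markup removed."""
    return "".join(preview.itertext())


preview = ET.fromstring("<preview>other <b>matched</b> words</preview>")
print(highlighted_words(preview))
print(plain_text(preview))
```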


<properties>...</properties>

The <properties> container is optional. It is only specified if the showproperties attribute of the Kasyk query XML has been (implicitly) specified with "yes".

The <properties> container contains all of the containers of properties and texttypes that have had the hitlist attribute (implicitly) set to "yes" in the <creation> container of the Kasyk configuration XML.
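
Since the names of the property containers depend on the Kasyk configuration XML, a generic consumer can simply collect whatever it finds. The following Python sketch does that; the function name properties_dict is an assumption made for this example, and a property that occurs more than once would keep only its last value in this simple form.

```python
import xml.etree.ElementTree as ET


def properties_dict(hit):
    """Map each property name of a hit to its text."""
    properties = hit.find("properties")
    if properties is None:
        # showproperties was not (implicitly) set to "yes" in the query.
        return {}
    return {child.tag: child.text or "" for child in properties}


hit = ET.fromstring(
    '<hit ordinal="1"><properties>'
    '<filename>nameoffile</filename><title>title of document</title>'
    '</properties></hit>'
)
print(properties_dict(hit))
```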


EXAMPLES

The directory test/hitlist contains a number of subdirectories, each holding files that contain a single hitlist each. Most of the hitlists are the result of legal queries; some are the result of queries with errors in them. They are used to test the functioning of the Kasyk search engine.


DTD

This is an attempt at a Document Type Definition for the Kasyk hitlist XML.

 <!DOCTYPE kasyk:hitlist [
   <!ELEMENT kasyk:hitlist (header, hit*)>
     <!ELEMENT header (note*, provider?)>
       <!ATTLIST header id        CDATA #IMPLIED
                        type      CDATA #REQUIRED
                        hits      CDATA #REQUIRED
                        first     CDATA #REQUIRED
                        last      CDATA #REQUIRED
                        pass1hits CDATA #REQUIRED
                        updated   CDATA #REQUIRED
                        documents CDATA #REQUIRED
                        timing    CDATA #IMPLIED>
       <!ELEMENT note (#PCDATA)>
         <!ATTLIST note id    CDATA #REQUIRED
                        class CDATA #REQUIRED>
       <!ELEMENT provider (#PCDATA)>
         <!ATTLIST provider cached   CDATA #REQUIRED
                            location CDATA #REQUIRED>
     <!ELEMENT hit (preview?, properties?)>
       <!ATTLIST hit ordinal    CDATA #REQUIRED
                     docnum     CDATA #IMPLIED
                     score      CDATA #IMPLIED
                     percentage CDATA #IMPLIED>
       <!ELEMENT preview ANY>
       <!ELEMENT properties ANY>
 ]>

SEE ALSO

Kasyk home, Kasyk notes, Kasyk query XML, Kasyk configuration XML, Kasyk document sequence XML, Kasyk initializer (kasyknew), Kasyk indexer (kasykindex), Kasyk searcher (kasyk), Kasyk server (kasykd), Kasyk caching query server (kasykcqd), Kasyk configuration handler (kasykconfig).

See http://www.kasyk.nl/xml/kasykhitlistxml.html for the most up-to-date version of this information.


COPYRIGHT

Copyright © 2003 Dijkmat BV

This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Kasyk XML Information: Kasyk version 1.0.0, XML version http://www.kasyk.org/1.0, generated on Tue Nov 25 12:09:47 2003.