kasykhitlistxml - description of the Kasyk hitlist XML (<kasyk:hitlist>)
The Kasyk hitlist XML specifies the result of a search operation on a Kasyk index. As such, it is the output created by Kasyk searcher (kasyk), Kasyk server (kasykd) or the Kasyk caching query server (kasykcqd) executables. A query is always in the form of Kasyk query XML.
This is a simplified and beautified example of hitlist XML:
<kasyk:hitlist xmlns:kasyk="http://www.kasyk.org/1.0">
  <header type="fuzzy" hits="43" first="1" last="10" documents="3011"/>
  <hit ordinal="1">
    <preview><b>matched</b> words and more</preview>
    <properties>
      <filename>nameoffile</filename>
      <title>title of document</title>
    </properties>
  </hit>
  <hit ordinal="2">
    <preview>other <b>matched</b> words</preview>
    <properties>
      <filename>otherfilename</filename>
      <title>title of other document</title>
    </properties>
  </hit>
  <!-- hits 3 .. 10 omitted -->
</kasyk:hitlist>
The outer <kasyk:hitlist> container indicates that this is XML of the result of a query searching the contents of a Kasyk index.
Please note that the above example has been beautified: in reality, there is no whitespace between any of the containers, except for newlines after <kasyk:hitlist>, </header> (or <header .../> if there are no containers inside the <header> container), </hit> and </kasyk:hitlist>.
The outer <kasyk:hitlist> container indicates that this is an XML specification of a hitlist that is the result of a query. The namespace specification is necessary to verify that the version of the XML being processed is compatible with the version of Kasyk. The container does not have any further attributes.
The <kasyk:hitlist> container always contains one <header> container and any number of <hit> containers.
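The structure described above can be read with any XML processor. What follows is a minimal sketch in Python, assuming the standard xml.etree.ElementTree module and the namespace URI from the example; the function name split_hitlist and the SAMPLE text are illustrative only and not part of Kasyk.

```python
import xml.etree.ElementTree as ET

KASYK_NS = "http://www.kasyk.org/1.0"  # namespace URI taken from the example

# Shortened copy of the example hitlist, in its non-beautified form.
SAMPLE = """<kasyk:hitlist xmlns:kasyk="http://www.kasyk.org/1.0">
<header type="fuzzy" hits="43" first="1" last="10" documents="3011"/>
<hit ordinal="1"><preview><b>matched</b> words and more</preview><properties><filename>nameoffile</filename><title>title of document</title></properties></hit>
<hit ordinal="2"><preview>other <b>matched</b> words</preview><properties><filename>otherfilename</filename><title>title of other document</title></properties></hit>
</kasyk:hitlist>"""


def split_hitlist(xml_text):
    """Return (header element, list of hit elements) for a hitlist document."""
    root = ET.fromstring(xml_text)
    # Only the outer container carries the kasyk: prefix; its children do not.
    if root.tag != "{%s}hitlist" % KASYK_NS:
        raise ValueError("not a Kasyk hitlist: %r" % root.tag)
    header = root.find("header")
    if header is None:
        raise ValueError("hitlist without a <header> container")
    return header, root.findall("hit")


header, hits = split_hitlist(SAMPLE)
print(header.get("type"), header.get("hits"), len(hits))  # prints: fuzzy 43 2
```

Checking the namespace-qualified tag of the outer container is one simple way to reject XML written for an incompatible version.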
The <header> container is not optional: it is always present. It may contain further optional <note> containers, as well as a single <provider> container.
Almost all of the attributes of the <header> container are always specified. They are:
The hits attribute indicates the number of hits that are available in the final result set of the hitlist. It does not always indicate the number of <hit> containers: the actual number of <hit> containers is specified implicitly by the first and last attributes.
The value of the hits attribute is never more than the value (implicitly) specified with the maxhits attribute of the Kasyk query XML. The value "0" may be specified if an error has occurred in the query.
The first attribute specifies the ordinal number of the first <hit> container that may be part of this hitlist. The last attribute specifies the ordinal number of the last <hit> container that may be part of this hitlist.
Please note that the values of the first and last attributes are initially taken from the values (implicitly) specified with the first and last attributes in the Kasyk query XML. The value of the first attribute always remains the same. The value of the last attribute may be altered in the hitlist: it will never be more than the value of the hits attribute.
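The relation between the hits, first and last attributes can be sketched as follows. This is an illustration only: the function name expected_hit_count is hypothetical, and the result is treated as an upper bound because the text says a <hit> container "may" be part of the hitlist.

```python
def expected_hit_count(header_attrs):
    """Upper bound on the number of <hit> containers implied by a <header>.

    header_attrs is a mapping with the string values of the hits, first
    and last attributes.
    """
    first = int(header_attrs["first"])
    last = int(header_attrs["last"])
    hits = int(header_attrs["hits"])
    # last should never exceed hits; clamp anyway as a safeguard.
    last = min(last, hits)
    return max(0, last - first + 1)


print(expected_hit_count({"hits": "43", "first": "1", "last": "10"}))  # prints: 10
```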
The timing attribute is optional. It is only specified if the showinternal attribute of the Kasyk query XML had the string "timing" specified in it. It is therefore mainly intended for developers only.
The value of the timing attribute is a string that consists of at least 3 numbers, separated by "+", each representing an elapsed time in microseconds. By adding these numbers and dividing the sum by 1 million, you roughly get the amount of time spent on the query (in fractional seconds).
Please note that the timing information always relates to the time of the original query, regardless of whether the hitlist was generated out of a cache or not (as indicated by the setting of the cached attribute in the optional <provider> container).
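As a small worked example of the arithmetic described for the timing attribute, the sketch below adds the parts and converts the sum to seconds. The function name total_seconds and the sample value "1200+300+4500" are made up for illustration; real values will differ.

```python
def total_seconds(timing):
    """Convert a timing string such as "1200+300+4500" to seconds."""
    parts = timing.split("+")
    if len(parts) < 3:
        # The description states that there are at least 3 numbers.
        raise ValueError("expected at least 3 numbers, got %r" % timing)
    microseconds = sum(float(part) for part in parts)
    return microseconds / 1_000_000


# 1200 + 300 + 4500 = 6000 microseconds, i.e. roughly 0.006 seconds.
print(total_seconds("1200+300+4500"))
```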
The containers of the <header> container are optional. They are:
The <note> container is optional. When specified, it indicates some sort of error or warning that is relevant to the processing of the Kasyk query XML into the hitlist. The content of the <note> container is an English language text describing the note. The id and class attributes describe the note in a way that can easily be checked programmatically (either in XSLT or any other XML processor).
The class attribute specifies to which class of notes this <note> container belongs. If any of the <note> containers in the <header> container has a class attribute with a value other than "Info", then an error has occurred in the query: the hits attribute of the <header> container will be "0" and no <hit> containers will have been created.
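The rule for detecting a failed query can be checked programmatically along these lines. This is a sketch only: the function name query_failed is hypothetical, and the note ids, the class "Error" and the note texts in the two sample headers are invented for the example (only the class "Info" is taken from the text above).

```python
import xml.etree.ElementTree as ET

# Hypothetical headers: ids, the class "Error" and the texts are invented.
FAILED = ('<header type="exact" hits="0" first="1" last="0" documents="3011">'
          '<note id="example-1" class="Error">hypothetical error text</note>'
          '</header>')
SUCCEEDED = ('<header type="fuzzy" hits="43" first="1" last="10" documents="3011">'
             '<note id="example-2" class="Info">hypothetical informational text</note>'
             '</header>')


def query_failed(header):
    """True if any <note> in the header has a class other than "Info"."""
    # A <note> without a class attribute is also counted as a failure here.
    return any(note.get("class") != "Info" for note in header.findall("note"))


print(query_failed(ET.fromstring(FAILED)))     # prints: True
print(query_failed(ET.fromstring(SUCCEEDED)))  # prints: False
```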
The following classes of notes are currently defined:
The <provider> container is optional. It is only specified if the showinternal attribute of the Kasyk query XML had the string "provider" specified in it. It is therefore mainly intended for developers only.
Currently, at most one <provider> container may occur. In the future, this will change when distributed searching becomes available.
Each <hit> container corresponds to a document in the Kasyk index that has met all constraints imposed on it (meeting the expression in the <constraint> container, and in case of an "exact" type search, having all the required words of the <all> container and not having any of the words of the <not> container).
The <hit> container has one obligatory attribute (ordinal) and three optional attributes (docnum, score and percentage). Its containers are also optional: <preview> and <properties>.
The docnum attribute is optional. It is only specified if the showinternal attribute of the Kasyk query XML had the string "docnum" specified in it.
The docnum attribute is a number, internal to the Kasyk index, associated with the most recent version of the document. It currently has no meaning outside of the Kasyk index. It is therefore mainly intended for developers only.
The score attribute is optional. It is only specified if the showinternal attribute of the Kasyk query XML had the string "score" specified in it.
The score attribute is a string, consisting of at least 2 floating point numbers separated by "+". By adding the numbers, you get a rough indication of the relevance of the <hit> container compared to other <hit> containers. It currently has no meaning outside of the Kasyk index. It is therefore mainly intended for developers only.
The percentage attribute is optional. It is only specified if the showinternal attribute of the Kasyk query XML had the string "percentage" specified in it and the type of search was "fuzzy".
It gives a rough indication of the relevance of the <hit> container compared to other <hit> containers. It currently has no meaning outside of the Kasyk index. It is therefore mainly intended for developers only.
The <preview> container is optional. It is only specified if the showpreview attribute of the Kasyk query XML has been (implicitly) specified with "yes".
The <preview> container contains the part of the text of the document that has been deemed most relevant to its relative position in the hitlist. Words that have been deemed relevant inside this text are placed in a highlighting container, usually <b>..</b> (but this can be altered with the highlight attribute of the <searching> container in the Kasyk configuration XML).
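To illustrate how a <preview> container might be consumed, the sketch below separates the plain text from the highlighted words. It assumes the default <b> highlighting container; the function name preview_parts and its highlight_tag parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def preview_parts(preview, highlight_tag="b"):
    """Return (plain text, list of highlighted words) for a <preview> element."""
    # itertext() walks all text, including text inside highlighting containers.
    text = "".join(preview.itertext())
    highlighted = ["".join(element.itertext())
                   for element in preview.iter(highlight_tag)]
    return text, highlighted


preview = ET.fromstring("<preview>other <b>matched</b> words</preview>")
print(preview_parts(preview))  # prints: ('other matched words', ['matched'])
```

If the highlight attribute in the configuration has been changed, the same function can be called with the other container name as highlight_tag.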
The <properties> container is optional. It is only specified if the showproperties attribute of the Kasyk query XML has been (implicitly) specified with "yes".
The <properties> container contains all of the containers of properties and texttypes that have had the hitlist attribute (implicitly) set to "yes" in the <creation> container of the Kasyk configuration XML.
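A <properties> container can be turned into a simple mapping, as in the following sketch. The function name properties_dict is hypothetical, and the sample <hit> is taken from the example at the top of this document. The sketch assumes that each property name occurs at most once per hit; if a name is repeated, the last value wins.

```python
import xml.etree.ElementTree as ET

HIT = ('<hit ordinal="1"><preview><b>matched</b> words and more</preview>'
       '<properties><filename>nameoffile</filename>'
       '<title>title of document</title></properties></hit>')


def properties_dict(hit):
    """Map each container name inside <properties> to its text content."""
    properties = hit.find("properties")
    if properties is None:
        # <properties> is optional, so a hit may not have one.
        return {}
    return {child.tag: "".join(child.itertext()) for child in properties}


print(properties_dict(ET.fromstring(HIT)))
```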
The directory test/hitlist contains a number of subdirectories, each of which contains files that each hold a single hitlist. Most of the hitlists are the result of legal queries; some of them are the result of queries with errors in them. They are used to test the functioning of the Kasyk search engine.
This is an attempt at a Document Type Definition for the Kasyk hitlist XML.
<!DOCTYPE kasyk:hitlist [
<!ELEMENT kasyk:hitlist (header, hit*)>
<!ELEMENT header (note*, provider?)>
<!ATTLIST header id CDATA #IMPLIED
                 type CDATA #REQUIRED
                 hits CDATA #REQUIRED
                 first CDATA #REQUIRED
                 last CDATA #REQUIRED
                 pass1hits CDATA #REQUIRED
                 updated CDATA #REQUIRED
                 documents CDATA #REQUIRED
                 timing CDATA #IMPLIED>
<!ELEMENT note (#PCDATA)>
<!ATTLIST note id CDATA #REQUIRED
               class CDATA #REQUIRED>
<!ELEMENT provider (#PCDATA)>
<!ATTLIST provider cached CDATA #REQUIRED
                   provider CDATA #REQUIRED>
<!ELEMENT hit (properties?, preview?)>
<!ATTLIST hit ordinal CDATA #REQUIRED
              docnum CDATA #IMPLIED
              score CDATA #IMPLIED
              percentage CDATA #IMPLIED>
<!ELEMENT preview ANY>
<!ELEMENT properties ANY>
]>
Kasyk home, Kasyk notes, Kasyk query XML, Kasyk configuration XML, Kasyk document sequence XML, Kasyk initializer (kasyknew), Kasyk indexer (kasykindex), Kasyk searcher (kasyk), Kasyk server (kasykd), Kasyk caching query server (kasykcqd), Kasyk configuration handler (kasykconfig).
See http://www.kasyk.nl/xml/kasykhitlistxml.html for the most up-to-date version of this information.
Copyright © 2003 Dijkmat BV
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Kasyk XML Information: Kasyk version 1.0.0, XML version http://www.kasyk.org/1.0, generated on Tue Nov 25 12:09:47 2003.