NAME

kasykcqdconfigxml - description of the Kasyk caching query configuration XML (<kasyk:cqdconfig>)


DESCRIPTION

The Kasyk caching query server configuration XML specifies how the Kasyk caching query server (kasykcqd) accepts queries, selects a Kasyk provider for each query, has that Kasyk provider process the query, caches the result, and returns the resulting Kasyk hitlist XML to the client.

A very simple case of the caching query server configuration XML looks like this:

 <kasyk:cqdconfig xmlns:kasyk="http://www.kasyk.org/1.0">
  <messagelog name="filename"/>
  <searching location="3334"/>
  <resource name="all" default="yes">
   <provider location="3333"/>
   <index name="2003">
    <constraint>year = 2003</constraint>
   </index>
  </resource>
 </kasyk:cqdconfig>

<kasyk:cqdconfig xmlns:kasyk="http://www.kasyk.org/1.0">...</kasyk:cqdconfig>

The outer <kasyk:cqdconfig> container indicates that this is an XML specification of the configuration of a Kasyk caching query server. The namespace specification is necessary to verify that the version of the XML being processed is compatible with the version of Kasyk. The <kasyk:cqdconfig> container does not take any further attributes.


<messagelog name="filename"/>

The <messagelog> container specifies the name of the file in which messages will be logged. If it is not specified, then either the KASYK_MESSAGELOG environment variable must be set with the filename, or the --messagelog option must be specified, to cause messages to be saved to a file.

The <messagelog> container must have one attribute, name, which specifies the name of the file to which messages will be logged. If the filename is relative, i.e. does not contain any slashes, the file will be opened in the index directory. Prefix the filename with "./" if you want the logfile to be opened in your "own" current directory.
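
For illustration, the two forms below differ only in where the logfile is opened. The filename "cqd.log" is a made-up example, not a default, and only one <messagelog> container would appear in an actual configuration.

 <!-- relative name without slashes: opened in the index directory -->
 <messagelog name="cqd.log"/>

 <!-- prefixed with "./": opened in the current directory -->
 <messagelog name="./cqd.log"/>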


<searching location="" cache="">...</searching>

The <searching> container (and the containers within it) specifies the actions to be performed when certain events occur while searching the Kasyk index (the process of transforming a Kasyk query XML into a Kasyk hitlist XML).

The <searching> container itself can have two (optional) attributes:

location="host:port|port"

The location attribute specifies the default location on which the caching query server will be listening.

The value specified here can either be a combination of "hostname:portnumber", or if there is no colon (":") found, only a portnumber (assuming "localhost" for the hostname in that case).

Any setting here can be overridden with the --location option of Kasyk caching query server (kasykcqd). If there is no default location specified, the --location option must be specified when starting Kasyk caching query server (kasykcqd).

cache="cachesize"

A memory cache is used for keeping the resulting Kasyk hitlist XML. The cache attribute (roughly) specifies the amount of memory to allocate for that cache. The "cachesize" is a number which can be suffixed with K, M or G indicating kilobytes, megabytes or gigabytes. If only a number is used for "cachesize", it is taken to be the size in bytes.

By default a value of "10M" (10 megabytes) is used, but normally a much larger value is specified, depending on how large the index is and how much physical memory you want to be available to the searching process. Values of "100M" are not uncommon.
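
As a sketch, the fragment below would listen on port 3334 of a specific host and reserve roughly 100 megabytes for the hitlist cache. The hostname "search.example.com" is a placeholder.

 <searching location="search.example.com:3334" cache="100M"/>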


<providers retry=""/>

The <providers> container specifies settings related to connections of the Kasyk caching query server (kasykcqd) with other Kasyk providers.

retry="timespec"

The retry attribute indicates the number of seconds the Kasyk caching query server (kasykcqd) should wait before attempting to connect to a Kasyk provider again after a connection to that Kasyk provider has failed. If the retry attribute has not been specified, then a value of 2 (seconds) will be assumed.

Any numeric value may be postfixed by "s" to indicate seconds, "m" to indicate minutes, "h" to indicate hours and "d" to indicate days.
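
A minimal sketch, assuming (as the DTD near the end of this document suggests) that <providers> is nested inside the <searching> container. With this fragment a failed Kasyk provider connection would be retried after 30 seconds instead of the default 2. The value of 30 seconds is arbitrary.

 <searching location="3334">
  <providers retry="30s"/>
 </searching>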


<clients connected="" requests="" timeout="" pending=""/>

The <clients> container specifies settings related to client connections of the Kasyk caching query server (kasykcqd).

connected="number"
The connected attribute specifies the maximum number of clients that can be connected to this Kasyk caching query server (kasykcqd) at the same time. The default is 128.
requests="number"
The requests attribute specifies the maximum number of requests that can be outstanding by clients to this Kasyk caching query server (kasykcqd) at the same time. The default is 128.
timeout="timespec|now|never"

The timeout attribute specifies the maximum time a client can be connected to this Kasyk caching query server (kasykcqd) without any requests coming in, before being disconnected by the Kasyk caching query server (kasykcqd). There are 3 ways to specify the timeout:

"timespec"
A number representing the number of seconds of inactivity allowed. Any numeric value may be postfixed by "s" to indicate seconds, "m" to indicate minutes, "h" to indicate hours and "d" to indicate days.
"now"
The word "now" indicating that the connection should always be broken after the Kasyk hitlist XML of a Kasyk query XML has been sent to the client.
"never"
The word "never" indicating that the connection should never be broken after the Kasyk hitlist XML of a Kasyk query XML has been sent to the client.

If the timeout attribute is not specified, "never" is assumed. Please note that this only applies to Kasyk query XML in which the id attribute has been specified. If no id attribute has been specified, then the connection is always broken after the client has received the Kasyk hitlist XML.

pending="number"
The pending attribute specifies how many connections can be pending before being accepted by Kasyk caching query server (kasykcqd). If the pending attribute is not specified, a value of 128 will be assumed. The --pending option of Kasyk caching query server (kasykcqd) can be used to override the (implicitly) specified value in the pending attribute.

The number of client connections allowed is printed to the message logfile as the first message after startup of the Kasyk caching query server (kasykcqd).
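
The attributes above could be combined as in the following sketch, which assumes (following the DTD near the end of this document) that <clients> is nested inside <searching>. All values are arbitrary illustrations: up to 256 connected clients, at most 64 outstanding requests, idle clients disconnected after five minutes, and up to 32 pending connections.

 <searching location="3334">
  <clients connected="256" requests="64" timeout="5m" pending="32"/>
 </searching>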


<resource name="" default="">...</resource>

A <resource> container specifies all of the resources that are related to one physical Kasyk index. It indicates where the Kasyk provider for this Kasyk index is located, whether there is more than one redundant Kasyk provider to allow for load balancing, and whether there are "virtual" indexes that are defined by a specific constraint.

There are two attributes that can be specified:

name="indexname"
The name attribute must be specified. It indicates the name with which the Kasyk index (defined by this <resource> container) should be identified by clients (who need to specify that name in the <index> container in the Kasyk query XML).
default="yes|no|0|1"

The default attribute specifies whether this Kasyk caching query server (kasykcqd) accepts queries that do not have an <index> container specified in the Kasyk query XML. If the value "yes" or "1" is specified, queries do not need to have an <index> container to be processed: the Kasyk index specified by this <resource> container will then be assumed.

If no default attribute is specified, then "no" will be assumed. However, if there is only one <resource> container and no <index> containers, then "yes" will be assumed.
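
As an illustration, the sketch below defines two resources, of which only "news" is marked as the default. A query without an <index> container would then be handled by "news", while "archive" must be requested by name. The resource names and port numbers are hypothetical.

 <kasyk:cqdconfig xmlns:kasyk="http://www.kasyk.org/1.0">
  <searching location="3334"/>
  <resource name="news" default="yes">
   <provider location="3333"/>
  </resource>
  <resource name="archive">
   <provider location="3335"/>
  </resource>
 </kasyk:cqdconfig>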


<provider location="" requests="" timeout=""/>

At least one <provider> container should be specified within the <resource> container. The location attribute is mandatory; the timeout and requests attributes are optional.

location="host:port|port|indexdir"

The location attribute specifies which Kasyk provider will be handling the requests for the Kasyk index(es) of this <resource>.

The value specified here can either be a combination of "hostname:portnumber", or, if there is no colon (":") found and the value consists of digits only, then it is assumed to be a portnumber (using "localhost" for the hostname in that case).

Finally, the value can be a Kasyk index directory specification. In that case, a two-way pipe will be opened with Kasyk searcher (kasyk) to this index directory, effectively creating a single-threaded server that only runs if there are queries specified for it and which shuts down automatically if the pipes are closed. This is usually referred to as an "ad hoc" Kasyk server.

requests="number"
The requests attribute specifies the maximum number of requests that can be outstanding by this Kasyk caching query server (kasykcqd) to this Kasyk provider at the same time. A default value of 1 is assumed if the requests attribute is not specified. Higher values can be used to increase throughput. Higher values should be used on machines that are more powerful than others.
timeout="timespec|now|never"

The timeout attribute specifies the maximum time this Kasyk caching query server (kasykcqd) will be connected to this Kasyk provider without any requests being sent to it. There are 3 ways to specify the timeout:

"timespec"
A number representing the number of seconds of inactivity allowed. Any numeric value may be postfixed by "s" to indicate seconds, "m" to indicate minutes, "h" to indicate hours and "d" to indicate days.
"now"
The word "now" indicating that the connection should always be broken after the Kasyk hitlist XML of a Kasyk query XML has been received from the Kasyk provider.
"never"
The word "never" indicating that the connection to this Kasyk provider should never be broken by this Kasyk caching query server (kasykcqd).

If the timeout attribute is not specified, "never" is assumed.
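
The sketch below contrasts the two kinds of location. The first resource uses a remote Kasyk provider that may have up to 4 requests outstanding and is disconnected after one hour without requests. The second uses an "ad hoc" Kasyk server on an index directory, released after ten idle minutes. The hostname, the directory "/data/kasyk/archive" and all values are assumptions for illustration only.

 <resource name="live">
  <provider location="worker1.domain.com:3333" requests="4" timeout="1h"/>
 </resource>
 <resource name="archive">
  <provider location="/data/kasyk/archive" timeout="10m"/>
 </resource>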


<index name="" default="">...</index>

The <index> container in the caching query server configuration XML specifies a "virtual" Kasyk index that operates on a subset of the complete Kasyk index of the <resource> container in which this <index> container occurs. The subset of the Kasyk index is determined by the contents of a <constraint> container, specified inside this <index> container.

name="indexname"
The name attribute must be specified. It indicates the name with which the Kasyk index (defined by this <index> container) should be identified by clients (who need to specify that name in the <index> container in the Kasyk query XML).
default="yes|no|0|1"
The default attribute specifies whether this Kasyk caching query server (kasykcqd) accepts queries that do not have an <index> container specified in the Kasyk query XML. If the value "yes" or "1" is specified, queries do not need to have an <index> container to be processed: the Kasyk index specified by this <index> container will then be assumed.
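
As a sketch, a virtual index rather than the complete Kasyk index could be made the default as shown below. Queries without an <index> container would then be limited to the "press" subset. The "pressrelease" property is borrowed from the examples later in this document and is only assumed to exist in the Kasyk index.

 <resource name="all">
  <provider location="3333"/>
  <index name="press" default="yes">
   <constraint>pressrelease</constraint>
  </index>
 </resource>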

<constraint>...</constraint>

The text of the <constraint> container specifies an expression that will be added to any <constraint> container already specified in the Kasyk query XML. Only documents that match the expression in this <constraint> container and any expression that is specified in the <constraint> container of the Kasyk query XML of the client, will be part of the initial result set of the Kasyk hitlist XML.

A property of a document in a Kasyk index is a named quantity having a flag (boolean), numeric or string typed value. The set of allowable properties in any particular Kasyk index is specified by the <property> containers in the Kasyk configuration XML of that Kasyk index.

See Kasyk constraint expressions for a complete description of <constraint> expressions.
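
To illustrate how the two expressions combine, assume the configuration defines the virtual index below and a client query for index "2003" carries its own constraint vendor = "Dijkmat". The property names are taken from other examples in this document; whether a given Kasyk index defines them depends on its Kasyk configuration XML.

 <index name="2003">
  <constraint>year = 2003</constraint>
 </index>

Under that assumption, only documents satisfying both year = 2003 and vendor = "Dijkmat" would become part of the initial result set.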


EXAMPLES

These are some examples of Kasyk caching query server configuration XML.


single index using an "ad hoc" Kasyk server

 <kasyk:cqdconfig xmlns:kasyk="http://www.kasyk.org/1.0">
  <resource name="all">
   <provider location="indexdir" timeout="5m"/>
  </resource>
 </kasyk:cqdconfig>

The caching query server serves only a single Kasyk index. The "ad hoc" server is started only when requests come in, and it is shut down again five minutes after the last request.


single index using three identical Kasyk servers on different machines

 <kasyk:cqdconfig xmlns:kasyk="http://www.kasyk.org/1.0">
  <resource name="all">
   <provider location="worker1.domain.com:3333"/>
   <provider location="worker2.domain.com:3333"/>
   <provider location="worker3.domain.com:3333"/>
  </resource>
 </kasyk:cqdconfig>

The caching query server only serves a single Kasyk index of which there are identical copies on 3 machines.
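
A possible variation, assuming that "worker1" is a more powerful machine than the other two: the requests attribute lets it have up to 4 requests outstanding at once, while the other providers keep the default of 1. The value 4 is arbitrary.

 <kasyk:cqdconfig xmlns:kasyk="http://www.kasyk.org/1.0">
  <resource name="all">
   <provider location="worker1.domain.com:3333" requests="4"/>
   <provider location="worker2.domain.com:3333"/>
   <provider location="worker3.domain.com:3333"/>
  </resource>
 </kasyk:cqdconfig>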


several virtual indexes on a "real" Kasyk server

 <kasyk:cqdconfig xmlns:kasyk="http://www.kasyk.org/1.0">
  <resource name="all" default="yes">
   <provider location="3333"/>
   <index name="press">
    <constraint>pressrelease</constraint>
   </index>
   <index name="2002">
    <constraint>
     published &gt;= 20020101 &amp; published &lt; 20030101
    </constraint>
   </index>
   <index name="Dijkmat">
    <constraint>vendor = "Dijkmat"</constraint>
   </index>
  </resource>
 </kasyk:cqdconfig>

The caching query server serves 4 distinct indexes which are all part of one physical Kasyk index. Index "all" (which is the default index) searches the entire index. Index "press" only searches press releases, index "2002" only searches documents published in the year 2002, and index "Dijkmat" only searches documents of which the vendor is "Dijkmat".


DTD

This is an attempt at a Document Type Definition for the Kasyk caching query server configuration XML.

 <!DOCTYPE kasyk:cqdconfig [
   <!ELEMENT kasyk:cqdconfig (messagelog?, searching?, resource+)>
     <!ELEMENT messagelog EMPTY>
       <!ATTLIST messagelog name CDATA #REQUIRED>
     <!ELEMENT searching (providers?, clients?)>
       <!ATTLIST searching location CDATA #IMPLIED
                           cache CDATA #IMPLIED>
       <!ELEMENT providers EMPTY>
         <!ATTLIST providers retry CDATA #IMPLIED>
       <!ELEMENT clients EMPTY>
         <!ATTLIST clients connected CDATA #IMPLIED
                           requests CDATA #IMPLIED
                           timeout CDATA #IMPLIED
                           pending CDATA #IMPLIED>
     <!ELEMENT resource (provider+, index*)>
       <!ATTLIST resource name CDATA #REQUIRED
                          default (yes|no|0|1) #IMPLIED>
       <!ELEMENT provider EMPTY>
         <!ATTLIST provider location CDATA #REQUIRED
                            timeout CDATA #IMPLIED
                            requests CDATA #IMPLIED>
       <!ELEMENT index (constraint?)>
          <!ATTLIST index name CDATA #REQUIRED
                          default (yes|no|0|1) #IMPLIED>
          <!ELEMENT constraint (#PCDATA)>
 ]>

SEE ALSO

Kasyk home, Kasyk query XML, Kasyk hitlist XML, Kasyk successful query XML, Kasyk error XML, Kasyk initializer (kasyknew), Kasyk indexer (kasykindex), Kasyk searcher (kasyk), Kasyk server (kasykd), Kasyk caching query server (kasykcqd), Kasyk configuration handler (kasykconfig).

See http://www.kasyk.nl/xml/kasykcqdconfigxml.html for the most up-to-date version of this information.


COPYRIGHT

Copyright © 2003 Dijkmat BV

This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Kasyk XML Information: Kasyk version 1.0.0, XML version http://www.kasyk.org/1.0, generated on Tue Nov 25 12:09:47 2003.