kasykintro - Introduction to Kasyk, Knowhow About Searching Your Knowledge
Kasyk is a core engine for searching information in XML.
Kasyk is a search engine capable of both fuzzy and exact searching of any data that can be represented in XML format. It reads a sequence of documents in XML format (Kasyk document sequence XML) and processes their contents into a proprietary form that allows fast fuzzy and exact searching.
Once documents have been processed in this manner (a process usually referred to as "indexing"), you can search the information by specifying a query in XML format (Kasyk query XML). In this query you can specify whether the search should be for exactly matching words or for fuzzily matching patterns. You can also limit your search to documents that meet conditions beyond matching words or patterns in the text, by means of Kasyk constraint expressions.
A search query produces another document in XML format, the so-called hitlist (Kasyk hitlist XML). This hitlist can be converted to HTML with tools such as XSLT and presented as a web page, sent as text in an email, or used in any other manner you see fit.
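As a rough illustration of turning a hitlist into HTML, the sketch below uses Python's standard XML parser instead of XSLT. The element and attribute names (hitlist, hit, title, url, score) are assumptions made up for this example; the actual Kasyk hitlist XML format is not described here and may look quite different.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical hitlist: the element and attribute names are assumptions,
# not the documented Kasyk hitlist XML format.
SAMPLE_HITLIST = """<hitlist total="2">
  <hit score="0.93"><title>Annual report</title><url>http://example.org/report</url></hit>
  <hit score="0.71"><title>Q&amp;A notes</title><url>http://example.org/qa</url></hit>
</hitlist>"""


def hitlist_to_html(hitlist_xml: str) -> str:
    """Render each <hit> as a linked list item in an HTML <ul>."""
    root = ET.fromstring(hitlist_xml)
    items = []
    for hit in root.findall("hit"):
        # Escape text taken from the XML before placing it into HTML.
        title = escape(hit.findtext("title", default=""))
        url = escape(hit.findtext("url", default=""), quote=True)
        score = escape(hit.get("score", "?"))
        items.append(f'<li><a href="{url}">{title}</a> (score {score})</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(hitlist_to_html(SAMPLE_HITLIST))
```

The same mapping could be written as an XSLT stylesheet; the point is only that the hitlist is plain XML, so any XML-aware tool can reshape it.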
Kasyk has been designed for heavy-duty usage: it allows load balancing over multiple physical servers, or starting "ad hoc" servers that run only when they are really needed. It can also cache hitlists, so that paging through a huge hitlist with only 10 hits displayed at a time becomes an operation that uses few resources (Kasyk caching query server (kasykcqd)).
Kasyk has also been designed to handle huge amounts of data. There is basically no limit to the size of the data that Kasyk can handle (although practical considerations may cause you to compartmentalize your data). This has been achieved by making the amount of RAM that Kasyk needs in order to run as independent as possible of the size of the data and of the size of any single document.
Whatever limitations Kasyk currently has may become moot once its distributed searching capabilities have been developed (future of Kasyk).
Kasyk is not a relational database engine, although it has some features in common with one. The main difference between a Kasyk index and a relational database is that you cannot extract the exact data that you have put into it. If you must, you could consider Kasyk a "lossy" database engine: some of the information you put into it cannot be retrieved in the form in which it was put in. You can, however, search through the information you have put in.
In this sense, Kasyk is derivative of the source of the information it has indexed: it can never replace the original. But it adds search capabilities that the original source of the data (be that a relational database system, a (local) file system or a remote website) may lack.
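To make the "lossy" idea concrete, here is a toy inverted index. It is only a sketch and says nothing about Kasyk's proprietary index format: the ToyIndex class and its methods are invented for this example, and fuzzy matching is approximated with difflib from the Python standard library. The index can answer exact and fuzzy word searches, but word order, punctuation and capitalisation are discarded, so the original text cannot be rebuilt from it.

```python
import re
from collections import defaultdict
from difflib import get_close_matches


class ToyIndex:
    """Toy inverted index: maps each lowercased word to a set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        # Only the set of words survives; order and punctuation are lost.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[word].add(doc_id)

    def exact(self, word: str) -> set:
        """Documents containing exactly this word."""
        return set(self.postings.get(word.lower(), set()))

    def fuzzy(self, pattern: str, cutoff: float = 0.75) -> set:
        """Documents containing a word that resembles the pattern."""
        similar = get_close_matches(
            pattern.lower(), list(self.postings), n=5, cutoff=cutoff
        )
        found = set()
        for word in similar:
            found |= self.postings[word]
        return found


if __name__ == "__main__":
    index = ToyIndex()
    index.add("doc1", "Searching your knowledge, fast!")
    index.add("doc2", "Knowledge about XML.")
    print(sorted(index.exact("knowledge")))
    print(sorted(index.fuzzy("knowlege")))  # misspelled on purpose
```

The source documents stay the authoritative copy; the index only records enough to find them again.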
Kasyk home, Kasyk flow of information, Kasyk history, future of Kasyk.
See http://www.kasyk.nl/intro.html for the most up-to-date version of this information.
Copyright © 2003 Dijkmat BV
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Kasyk Information: Kasyk version 1.0.0, generated on Tue Nov 25 12:09:47 2003.