| 
               
             | 
            
			
               
            
            
                  
                     
                        LDSE - local domain search engine
						This program is a result of my diploma thesis (with the same title). It is 
						a distributed search engine, consisting of multiple search nodes which report their results
                        to a master server. Each node is responsible for indexing 
						and querying its local domain (ideally a single web server). The nodes 
						are connected hierarchically. Every super node can execute a query against 
						its own index and can also query all (or a subset) of its sub-nodes. The subset is chosen
						by determining which sub-nodes can give the best results for the query.
                         
                        It is possible to query every node directly, even if it is not the top-node. It will then use
                        its own data and the data of its sub-nodes for answering the query.
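
						The following Java sketch illustrates the idea of querying any node in the hierarchy. It is not taken from the LDSE sources: the class `Node`, its methods, and the substring matching are hypothetical, and `mayAnswer` is only a naive stand-in for however LDSE actually decides which sub-nodes can give the best results.

```java
import java.util.ArrayList;
import java.util.List;

public class HierarchicalQuerySketch {

    /** Hypothetical search node: a local "index" (here just a list of texts) plus sub-nodes. */
    static class Node {
        final String name;
        final List<String> localDocuments = new ArrayList<>();
        final List<Node> subNodes = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node addDocument(String text) { localDocuments.add(text); return this; }

        Node addSubNode(Node node) { subNodes.add(node); return this; }

        /** Assumption: a sub-node is worth asking if anything at or below it matches. */
        boolean mayAnswer(String term) {
            for (String doc : localDocuments) {
                if (doc.contains(term)) return true;
            }
            for (Node sub : subNodes) {
                if (sub.mayAnswer(term)) return true;
            }
            return false;
        }

        /** Answers a query from this node's own data and from the selected sub-nodes. */
        List<String> query(String term) {
            List<String> hits = new ArrayList<>();
            for (String doc : localDocuments) {
                if (doc.contains(term)) hits.add(name + ": " + doc);
            }
            for (Node sub : subNodes) {
                if (sub.mayAnswer(term)) hits.addAll(sub.query(term));
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        Node web = new Node("web").addDocument("thesis on search engines");
        Node ftp = new Node("ftp").addDocument("zip archive listing");
        Node top = new Node("top").addDocument("search portal").addSubNode(web).addSubNode(ftp);

        // Querying the top node uses its own data and the matching sub-node.
        System.out.println(top.query("search")); // [top: search portal, web: thesis on search engines]
        // A node that is not the top node can be queried directly as well.
        System.out.println(web.query("search")); // [web: thesis on search engines]
    }
}
```

						In this sketch every node exposes the same `query` method, so a client does not need to know whether it is talking to the top node or to a node further down the hierarchy.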
						 
						Special features are: 
						- distributed search engine 
						- tolerant of spelling errors and other word forms 
						- separated data server and data gatherer 
						- can support many file formats via a plugin mechanism; currently supported are 
						  - PDF 
						  - HTML 
						  - plain text 
						  - ZIP and gzip files 
						- uses any relational database (possibly with small changes because of differences in the SQL dialect) 
						  - tested with InstantDB and Oracle Lite 
						- can gather data via HTTP or from local file system 
						- HTTP spider is resistant to loops (if a document links to itself, but under another path) 
						- HTTP spider is resistant to HTML errors (like missing "'s in parameters or non-quoted &'s) 
						
                         
						For fault tolerance, a so-called "trigram index" is used. This index takes 
						all trigrams (3-letter combinations) that a word contains and stores this 
						information in a reverse index. A second reverse index maps 
						the words to the documents.
						This gives high query speed and tolerance of misspelled words (either in the searched documents
                        or in the query). It can also find substrings within words.
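
						A minimal Java sketch of such a two-level index is shown here. It is an illustration only, not the LDSE implementation: the class and method names are hypothetical, the index lives in memory instead of a relational database, and the matching rule (a word matches if it shares at least `minShared` trigrams with the query) is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TrigramIndexSketch {
    // First reverse index: trigram -> words containing that trigram.
    private final Map<String, Set<String>> trigramToWords = new HashMap<>();
    // Second reverse index: word -> documents containing that word.
    private final Map<String, Set<String>> wordToDocuments = new HashMap<>();

    /** Returns all 3-letter combinations contained in the (lower-cased) word. */
    static Set<String> trigrams(String word) {
        Set<String> result = new HashSet<>();
        String w = word.toLowerCase();
        for (int i = 0; i + 3 <= w.length(); i++) {
            result.add(w.substring(i, i + 3));
        }
        return result;
    }

    /** Registers one word of one document in both indexes. */
    void add(String document, String word) {
        String w = word.toLowerCase();
        wordToDocuments.computeIfAbsent(w, k -> new TreeSet<>()).add(document);
        for (String t : trigrams(w)) {
            trigramToWords.computeIfAbsent(t, k -> new TreeSet<>()).add(w);
        }
    }

    /** Finds documents with a word sharing at least minShared trigrams with the query. */
    Set<String> search(String query, int minShared) {
        Map<String, Integer> sharedPerWord = new HashMap<>();
        for (String t : trigrams(query)) {
            for (String w : trigramToWords.getOrDefault(t, Collections.<String>emptySet())) {
                sharedPerWord.merge(w, 1, Integer::sum);
            }
        }
        Set<String> documents = new TreeSet<>();
        for (Map.Entry<String, Integer> e : sharedPerWord.entrySet()) {
            if (e.getValue() >= minShared) {
                documents.addAll(wordToDocuments.get(e.getKey()));
            }
        }
        return documents;
    }

    public static void main(String[] args) {
        TrigramIndexSketch index = new TrigramIndexSketch();
        index.add("a.txt", "distributed");
        index.add("b.txt", "engine");

        // Misspelled query: still shares six trigrams with "distributed".
        System.out.println(index.search("distibuted", 3)); // [a.txt]
        // Substring query: "tribut" shares four trigrams with "distributed".
        System.out.println(index.search("tribut", 3));     // [a.txt]
    }
}
```

						Because a single wrong or missing letter changes only the few trigrams that overlap it, most trigrams of a misspelled word still point to the intended word. The threshold used above is arbitrary; a real scoring scheme would probably weigh the number of shared trigrams against the word length.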
						 | 
                   
                        
                  
                     
                        Download
The current version is 0.5.4 (source, all)
(each file is about 2.0 MB)
 | 
                   
                  
                     
                        Change History
                        
- the default gatherer.xml specified a wrong class name for the HTMLDocument
  (the leading "de." was missing)
 
- the HTTPSpider no longer pops up windows for authorization and cookies
 
- DataServer now checks for null-strings read from the input
 
- build.xml now references Acme.zip for the server classpath
 
                        
- changed DataClient
 
  - now the input reader is not emptied when issuing a command to the server
 
  - this is because with JDK 1.3 the read blocks even though ready() returns true :(
 
- changed the query string in query.xml to "document"
 
- added the directory 'testdata' containing 9 simple files, changed 'gatherer.xml' to use this directory
 
- corrected the start script for the queryclient (tstuder.zip was missing)
 
- corrected the start script for the adminclient (wrong JAR file name, wrong class name)
 
- corrected the client JARs (resource bundles were missing)
 
- renamed resource bundles so that English is now the default (instead of German)
 
- corrected config files for the servlet (used wrong paths)
 
- included the missing source for the tstuder table component
 
- better error handling in Gatherer
 
  - now an exception is printed if a spider class cannot be found
 
- changed gatherer.xml to contain the correct class names for the document handler classes
 
- corrected more occurrences of wrong column names (SIZE instead of DOC_SIZE, ID instead of DOC_ID) :(
 
- the last-changed date is now stored as a string (milliseconds since 1/1/1970), because with JDK 1.3
 
  InstantDB stored only the day, but not the time, when using DATE as the data type
 
- corrected the buildfile to exclude unnecessary files from the distributions
 
                        
(not released) 
- Ant 1.2 is now used as the build tool
 
- reworked the build.xml
 
  - introduced filesets and patternsets to define the file sets for the individual programs
 
  - introduced configuration variables (e.g. for the database) to be able to change
 
    settings quickly
 
  - an up-to-date check is done before building the JARs
 
  - for building the JARs no temporary files are used
 
- moved some common classes to the new package de.hendriklipka.ldse.server.common
 
- translated the description of the communication protocols into English
 
- bugfix: there were wrong column names (MIMETYPE instead of MIME_TYPE) in the class RDBMSDBConnection
 
  - NOTE to myself: test ALL functions before releasing code :(
 
                        
- document server and gatherer can now read their config files from another directory
 
  (so they could be read from the cfg subdirectory...)
 
- corrected the startup scripts to use the correct class names
 
- build.xml was changed to include all necessary classes in the JAR files
 
- build.xml was changed to create correct binary and source distributions
 
- documentserver.xml was changed to use InstantDB as default database
- translated more documentation into English (config)
 
- new email address for contact: ldse@hendriklipka.de
 
                        
- initial release
 
                      | 
                   
                
             |