Introduction


The Match Server and shell are part of the SemanticHacker Tools, which can be downloaded from the Downloads Page.

The SemanticHacker Match Server is a server tool that lets users add, delete, retrieve and query (match) Semantic Signatures®. It can store a large number of Semantic Signatures® and answer queries with a list of similar signatures. Each signature in the resulting Match List has a weight (Relevance Score) indicating how closely it matched the signature used in the query.

Getting Started

The first thing to do is start the Match Server. As with the other tools provided with the API, running sh-tools.jar with "server" as the only parameter will display usage and help text for the Match Server (see below).

Usage: java com.semantichacker.match.server.bin.MatchServerMain [OPTIONS]
Start a match server
Normal usage is just to pass a port with -p and an index file with -f

Option Summary:
        -p, --port PORT                 Port number to start the server on.
        -a, --addr ADDRESS              Address to bind server to
        -s, --socket FILE               Socket file to receive session on
        -i, --infile FILE               Input file to receive commands on
        -o, --outfile FILE              Output file to send responses on
        -f, --file FILE                 Path to the matcher index
        -r, --read-only                 Do not write to the index file
        -t, --sync-interval INTERVAL    Interval to sync changes to the
                                        matcher to disk (milliseconds, 60   
                                        seconds default)                    
        -M, --no-mmap                   Do not use memory-mapped files for
                                        load and save                       
        -R, --result-factor FACTOR      Set the result factor (default
                                        100,000,000)                        
        -I, --index-dims DIMS           Set the number of dimensions to
                                        index (default 10)                  
        -h, --help                      Print this help
	

Running the server

The minimum requirement to start the server is to provide a means to communicate with clients. The following command will start the server bound to port 1400.

java -jar sh-tools.jar server -p 1400

Using an Index File

The command above does not include the --file option, so the index exists only in memory. If the server stops, all of the stored signatures in the index are lost. To persist the index, pass a file name with -f:

java -jar sh-tools.jar server -p 1400 -f mySigIndex

The server still keeps the index in memory, but it saves it to the file mySigIndex periodically during runtime and again when the server is stopped under normal conditions. If the file mySigIndex already exists at startup, the server first loads the existing index into memory.

Quick Test

You can check that the server is working by connecting to it with telnet or netcat. More information on the server's protocol can be found on the protocol page. Here is a simple example to make sure the server is up. The < and > symbols are not part of the exchange; they have been added to mark what is sent to the server and how the server responds, respectively.

$ nc localhost 1400
< TX
< 200803
> 200803
< begin
> OK
< end
> OK
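
The same check can be scripted. The following is a minimal sketch, not part of the SemanticHacker Tools: the function name quick_test is hypothetical, and it assumes the server exchanges newline-terminated ASCII lines and answers with exactly one line per exchange, as the transcript above suggests. The protocol page remains the authoritative description.

```python
import socket


def quick_test(host="localhost", port=1400, timeout=5.0):
    """Replay the transcript above and return the server's reply lines.

    Assumption: every command and every reply is a single line
    terminated by a newline character.
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rw", encoding="ascii", newline="\n") as stream:
            # "TX" followed by the version line; the transcript shows
            # one reply line (the version) for this pair.
            stream.write("TX\n200803\n")
            stream.flush()
            replies.append(stream.readline().strip())

            # "begin" and "end" each receive one reply line.
            for command in ("begin", "end"):
                stream.write(command + "\n")
                stream.flush()
                replies.append(stream.readline().strip())
    return replies
```

If these assumptions hold, calling quick_test("localhost", 1400) against the server started earlier should return the three reply lines from the transcript: the version line followed by two OK lines.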

Disk Synchronization

While the Match Server is running, it saves the Master Signature List to disk every 60 seconds by default. You can change this interval with the "--sync-interval" server option, which takes a value in milliseconds.
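
Because the interval is given in milliseconds, it is easy to be off by a factor of 1000 when launching the server from a script. The sketch below builds the command line from a value in seconds. The helper name build_server_command is hypothetical, the jar path is assumed to be sh-tools.jar in the current directory, and the option value is assumed to follow the option as a separate argument, as the usage text suggests.

```python
def build_server_command(port, index_file=None, sync_interval_seconds=None,
                         jar="sh-tools.jar"):
    """Return the argument list for starting the Match Server.

    Only options listed in the usage text are used. The sync interval
    is converted from seconds to the milliseconds the server expects.
    """
    command = ["java", "-jar", jar, "server", "-p", str(port)]
    if index_file is not None:
        command += ["-f", index_file]
    if sync_interval_seconds is not None:
        milliseconds = int(round(sync_interval_seconds * 1000))
        command += ["--sync-interval", str(milliseconds)]
    return command


# Example: save the index every 30 seconds instead of every 60.
# The resulting list can be passed to subprocess.Popen(...).
command = build_server_command(1400, "mySigIndex", sync_interval_seconds=30)
print(" ".join(command))
```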

Using Memory Mapped Files

By default, the server uses memory-mapped files to load and save the Master Signature List. Memory-mapped files may require extra memory at load and save time, but they are faster than standard I/O. If memory is a concern, or if memory-mapped files cause problems for any reason, the server option "--no-mmap" turns them off. When memory-mapped files are disabled, standard I/O is used.

Result Factor

The signatures returned by the SemanticHacker API include dimension weights represented as normalized floating point values between 0 and 1. The Match Server does not use floating point values, so the dimension weights must be converted to whole numbers before they are submitted to the server. The Match Server maintains a value to help with this conversion, called the Result Factor. This value can be retrieved via the result_factor command.

To convert a decimal dimension weight into a value the server accepts, multiply the weight by the Result Factor and keep only the whole number part of the result. For example, with a Result Factor of 100,000,000, a weight of 0.25 becomes 25,000,000.
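
The conversion can be sketched as follows. The function names and the dictionary layout (dimension identifier mapped to weight) are hypothetical and chosen only for illustration. The sketch goes through Python's Decimal type, which is a choice made here and not something the Match Server requires, so that a weight such as 0.29 is not turned into 28,999,999 by binary floating point rounding.

```python
from decimal import Decimal

# Default Result Factor of the Match Server.
DEFAULT_RESULT_FACTOR = 100_000_000


def to_server_weight(weight, result_factor=DEFAULT_RESULT_FACTOR):
    """Convert a normalized weight (0 to 1) to the server's whole number.

    The weight is multiplied by the Result Factor and only the whole
    number part of the product is kept.
    """
    value = Decimal(str(weight))
    if not (Decimal(0) <= value <= Decimal(1)):
        raise ValueError("weight must be between 0 and 1: %r" % (weight,))
    return int(value * result_factor)  # int() drops the fractional part


def convert_weights(weights, result_factor=DEFAULT_RESULT_FACTOR):
    """Convert a {dimension: weight} mapping (hypothetical layout)."""
    return {dim: to_server_weight(w, result_factor)
            for dim, w in weights.items()}


print(to_server_weight(0.25))
print(convert_weights({"dim_a": 0.5, "dim_b": 0.125}, result_factor=1000))
```

In practice the Result Factor passed to these functions should be the value reported by the server's result_factor command, since a server may have been started with a non-default value.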

The default Result Factor value is 100,000,000 (one hundred million). The default value can be changed with the server option "--result-factor". Reducing the Result Factor reduces the precision of the weight values, while increasing it increases the precision. The Result Factor is stored in the index file, so changes are preserved from one session to the next.

Note: The result factor should not be changed once the match server has been initialized.