Similarity Tool
The similarity tool computes the similarity (also called the match score or relevance) between two Semantic Signatures®. The similarity score is a value between 0 and 1. Apart from floating-point error, a signature matches itself with a score of exactly 1. As a rule of thumb, treat a score below 0.4 as suspicious and a score above 0.8 as very good.
Pseudocode
The following is pseudocode for computing the similarity between two signatures.
float score = 0;
for (dimension in signature1) {
    if (dimension in signature2) {
        score += signature1[dimension] * signature2[dimension];
    }
}
return score;
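In other words, the score is the sum of the products of the weights of the intersecting dimensions of the two signatures (the dimensions they have in common). This is the dot product of the two signatures when a dimension missing from a signature is treated as having weight 0.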
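The pseudocode can be sketched in Java as follows. This is a minimal illustration, not the tool's actual implementation: the class name SimilaritySketch and the representation of a signature as a Map from dimension ID to weight are assumptions made for the example. The sample weights are chosen so that each signature has unit length, which is consistent with a signature matching itself with a score of 1.

```java
import java.util.Map;

// Hypothetical sketch: a signature is modeled as dimension ID -> weight.
final class SimilaritySketch {

    // Sum the products of the weights of the dimensions present in both signatures.
    static double similarity(Map<Integer, Double> sig1, Map<Integer, Double> sig2) {
        double score = 0.0;
        for (Map.Entry<Integer, Double> entry : sig1.entrySet()) {
            Double otherWeight = sig2.get(entry.getKey());
            if (otherWeight != null) {          // dimension exists in both signatures
                score += entry.getValue() * otherWeight;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // Made-up signatures; each has squared weights summing to 1.
        Map<Integer, Double> a = Map.of(1, 0.6, 2, 0.8);
        Map<Integer, Double> b = Map.of(2, 0.8, 3, 0.6);

        System.out.println(similarity(a, a)); // approximately 1: a signature against itself
        System.out.println(similarity(a, b)); // approximately 0.64: only dimension 2 is shared
    }
}
```

Dimensions that appear in only one signature contribute nothing, so two signatures with no dimensions in common score 0.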
Help Text for similarity
$ java -jar sh-tools.jar similarity
Usage: java com.semantichacker.api.tools.Similarity [OPTIONS] file1 file2

Compute the similarity score of two Semantic Signatures

Option Summary:
  file1           The first signature (XML from API) to compute the similarity of
  file2           The second signature (XML from API) to compute the similarity of
  -v, --verbose   Print each matching dimension and rank the score
  --labels FILE   The list of labels from the SemanticHacker Datafile, this allows labels without needing to get them from the API
  --nolabels      Do not show labels in dimension printout.
  -h, --help      Display this help

Homepage: http://www.semantichacker.com
Examples
Here is a sample of getting two signatures from the API and computing their similarity score.
$ java -jar sh-tools.jar signature -t TOKEN -c java --xmlout --outfile java.xml
$ java -jar sh-tools.jar signature -t TOKEN -c jdk --xmlout --outfile jdk.xml
$ java -jar sh-tools.jar similarity -v java.xml jdk.xml
Dim ID  Sig1      Sig2      Weight    Label
9442    0.301207  0.287189  0.086503  Computers/Programming/Languages/Java/Resources
9465    0.291356  0.142665  0.041566  Computers/Programming/Languages/Java/News_and_Media/Books
9443    0.234836  0.087548  0.020559  Computers/Programming/Languages/Java/Resources/Certification
9422    0.233443  0.126003  0.029415  Computers/Programming/Languages/Java
9467    0.209163  0.225452  0.047156  Computers/Programming/Languages/Java/Official_Documentation
9427    0.201700  0.160814  0.032436  Computers/Programming/Languages/Java/Development_Tools/Performance_and_Testing
9440    0.200207  0.398539  0.079791  Computers/Programming/Languages/Java/Implementations
9446    0.187670  0.242634  0.045535  Computers/Programming/Languages/Java/FAQs,_Help,_and_Tutorials/Tutorials
9445    0.185879  0.288379  0.053604  Computers/Programming/Languages/Java/FAQs,_Help,_and_Tutorials/FAQs
9423    0.173341  0.316496  0.054862  Computers/Programming/Languages/Java/Development_Tools
9441    0.172246  0.115069  0.019820  Computers/Programming/Languages/Java/Personal_Pages
9474    0.162494  0.087845  0.014274  Computers/Programming/Languages/Java/Applications
9456    0.156624  0.105102  0.016461  Computers/Programming/Languages/Java/Class_Libraries/Data_Formats
9449    0.152345  0.103168  0.015717  Computers/Programming/Languages/Java/Mailing_Lists
9453    0.151250  0.148467  0.022456  Computers/Programming/Languages/Java/Class_Libraries/Graphics
9452    0.150852  0.148467  0.022397  Computers/Programming/Languages/Java/Class_Libraries
9466    0.139011  0.081895  0.011384  Computers/Programming/Languages/Java/News_and_Media/Magazines_and_E-zines/Articles
9755    0.138812  0.181120  0.025142  Computers/Programming/Threads/Java
9433    0.133239  0.090449  0.012051  Computers/Programming/Languages/Java/Server-Side/JavaServer_Pages
0.6511292
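In the verbose output above, each Weight value is the product of that row's Sig1 and Sig2 values, and the Weight column adds up to the final score. The following Java sketch checks this using the Sig1 and Sig2 columns copied from the sample output. The class name VerboseOutputCheck is hypothetical and is not part of sh-tools.jar. Because the printed values are rounded to six decimal places, the recomputed sum should agree with 0.6511292 only to within rounding.

```java
// Hypothetical check: recompute the score from the verbose printout.
final class VerboseOutputCheck {

    // Sig1 column from the sample verbose output, in row order.
    static final double[] SIG1 = {
        0.301207, 0.291356, 0.234836, 0.233443, 0.209163, 0.201700, 0.200207,
        0.187670, 0.185879, 0.173341, 0.172246, 0.162494, 0.156624, 0.152345,
        0.151250, 0.150852, 0.139011, 0.138812, 0.133239
    };

    // Sig2 column from the sample verbose output, in the same row order.
    static final double[] SIG2 = {
        0.287189, 0.142665, 0.087548, 0.126003, 0.225452, 0.160814, 0.398539,
        0.242634, 0.288379, 0.316496, 0.115069, 0.087845, 0.105102, 0.103168,
        0.148467, 0.148467, 0.081895, 0.181120, 0.090449
    };

    // Multiply the two columns row by row and add up the products.
    static double sumOfProducts(double[] a, double[] b) {
        double total = 0.0;
        for (int i = 0; i < a.length; i++) {
            total += a[i] * b[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Should be close to the reported score of 0.6511292.
        System.out.printf("%.4f%n", sumOfProducts(SIG1, SIG2));
    }
}
```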
The following shows the output without the -v parameter.
$ java -jar sh-tools.jar similarity java.xml jdk.xml
0.6511292