Category Tool
The category tool computes the category for a signature. The algorithm used here is deliberately simple and is not necessarily the best one; which algorithm works best for you depends on your application.
Algorithm
The algorithm is shown in the pseudocode below. Here 'blocked' refers to the blocked item list described under Removing Categories, and maxLen is the value of the --max-len option.
    //First determine the label to use
    for each dimension in the signature
        label = the label of the dimension
        parts = label.split("/")
        if (parts[0] is blocked)
            parts = null
            continue to the next label
        else
            stop
    end

    if parts is null
        //All of the labels start with a blocked name
        return "Other"
    else
        //Now build the label
        category = parts[0]
        for (i = 1; i < parts.length && i < maxLen; i++)
            if parts[i] in blocked
                stop
            else
                category += "/" + parts[i]
        end
        print category
    end
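The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual source: the function name `categorize`, its parameters, and the two sample labels (taken from the example outputs elsewhere on this page) are assumptions made for the sketch.

```python
from typing import Iterable, List, Optional, Set


def categorize(labels: Iterable[str], blocked: Set[str], max_len: int = 4) -> str:
    """Return a category for dimension labels given in signature order."""
    # Step 1: pick the first label whose leading part is not blocked.
    parts: Optional[List[str]] = None
    for label in labels:
        candidate = label.split("/")
        if candidate[0] not in blocked:
            parts = candidate
            break

    # Every label starts with a blocked part (or there were no labels).
    if parts is None:
        return "Other"

    # Step 2: keep parts until max_len is reached or a blocked part appears.
    kept = [parts[0]]
    for part in parts[1:max_len]:
        if part in blocked:
            break
        kept.append(part)
    return "/".join(kept)


# Hypothetical label list, assumed from the "hot apple pie" example outputs.
labels = [
    "Business/Consumer_Goods_and_Services/Home_and_Garden/Decor_and_Design",
    "Home/Cooking/Baking_and_Confections/Pies_and_Pastry",
]

print(categorize(labels, set(), max_len=2))
# Business/Consumer_Goods_and_Services
print(categorize(labels, {"Home_and_Garden", "Business"}))
# Home/Cooking/Baking_and_Confections/Pies_and_Pastry
```

Using a set for the blocked parts keeps each membership check cheap, and slicing with `parts[1:max_len]` mirrors the `i < parts.length && i < maxLen` loop bound in the pseudocode.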
Help Text for category
    $ java -jar sh-tools.jar category
    Usage: java com.semantichacker.api.tools.Categorize [OPTIONS] signature
    Compute the category of a Semantic Signature

    Option Summary:
      signature        XML Signature file from the SemanticHacker API
      --labels FILE    Labels file from the SemanticHacker datafile
      --max-len MAX    Maximum number of elements in output category
      --blocked FILE   File containing a list of dimension components which are not to be considered
      -h, --help       Display this help

    Homepage: http://www.semantichacker.com
The --max-len option
The --max-len option sets the maximum number of elements (label parts) in the returned category. The default is 4.
    $ java -jar sh-tools.jar signature -t TOKEN -c "hot apple pie" --xmlout > hotapplepie.xml
    $ java -jar sh-tools.jar category hotapplepie.xml
    Business/Consumer_Goods_and_Services/Home_and_Garden/Decor_and_Design
    $ java -jar sh-tools.jar category --max-len 2 hotapplepie.xml
    Business/Consumer_Goods_and_Services
Removing Categories
For some applications, certain category names (such as News and Media) do not make sense or result in poor categories. The category tool accepts a file, passed via the --blocked option, that lists the label parts (the values delimited by / characters in a label) to block. If the first part of a label is blocked, the whole label is skipped and the next label (in the order of the signature) is tried instead. If a later part of a label is blocked, the category is truncated just before that part.
The blocked file should contain a list of label parts to be removed, one part per line.
    $ java -jar sh-tools.jar category hotapplepie.xml
    Business/Consumer_Goods_and_Services/Home_and_Garden/Decor_and_Design
    $ echo Home_and_Garden > blocked
    $ java -jar sh-tools.jar category hotapplepie.xml --blocked blocked
    Business/Consumer_Goods_and_Services
    $ echo Business >> blocked
    $ java -jar sh-tools.jar category hotapplepie.xml --blocked blocked
    Home/Cooking/Baking_and_Confections/Pies_and_Pastry
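For applications that want to read the same blocked file outside the tool, the one-part-per-line format can be loaded with a few lines of Python. This is only a sketch: the helper name `load_blocked` is hypothetical, and trimming whitespace and skipping blank lines are assumptions, since the format description above does not say how the tool itself treats them.

```python
import os
import tempfile
from pathlib import Path
from typing import Set


def load_blocked(path: str) -> Set[str]:
    """Read a blocked file containing one label part per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Assumption: surrounding whitespace is ignored and blank lines are skipped.
    return {line.strip() for line in lines if line.strip()}


# Recreate the file built by the two echo commands above, then load it.
with tempfile.TemporaryDirectory() as tmp:
    blocked_path = os.path.join(tmp, "blocked")
    with open(blocked_path, "w", encoding="utf-8") as handle:
        handle.write("Home_and_Garden\nBusiness\n")
    print(sorted(load_blocked(blocked_path)))
    # ['Business', 'Home_and_Garden']
```

Returning a set reflects that the order of lines in the blocked file does not matter for the algorithm; only membership is checked.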