HTTP Request Reference


The SemanticHacker API uses HTTP as its transport and follows a REST pattern for web services. The API accepts several HTTP request methods and body formats, so any standard HTTP client should be able to work with it.

Each call to the API will use the following base URL:

http://api.semantichacker.com/sh/api

API Parameters

Name | Value | Required (Yes/No) | Usage Notes
token | your access token | Yes | Must be provided with every request to the API.
uri | a valid URI | No | The location of the content for the API to fetch and process.
type | 'html', 'text' or 'wp' | No | How the text should be processed. Defaults to 'html'. See Text Types below.
content | some content | No | Content sent directly to the API for processing.
showLabels | 'true' or 'false' | No | A debugging aid that adds labels to the XML response, giving a textual representation of the Semantic Signature® dimensions. Defaults to 'false'. Labels are not needed for signature manipulation, but they give a sense of what the Semantic Signature® is about.
Although the API can return labels, doing so is inefficient and wastes bandwidth. For this reason, SemanticHacker distributes a Datafile that contains the labels for the dimensions used by the API.
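
The parameter table can be captured in a small client-side helper. This is a minimal sketch, assuming Python's standard urllib; the function name build_params and its keyword names are hypothetical, not part of the API. Optional parameters left as None are simply omitted so the documented defaults (type=html, showLabels=false) apply.

```python
from urllib.parse import urlencode

# Accepted values for the `type` parameter, per the parameter table.
VALID_TYPES = ("html", "text", "wp")


def build_params(token, uri=None, content=None, type_=None, show_labels=None):
    """Assemble query/form parameters for a SemanticHacker request.

    Hypothetical helper: parameters left as None are omitted so that the
    API's documented defaults apply.
    """
    if not token:
        raise ValueError("token must be provided with each request")
    params = {"token": token}
    if show_labels is not None:
        # The API expects the literal strings 'true' or 'false'.
        params["showLabels"] = "true" if show_labels else "false"
    if uri is not None:
        params["uri"] = uri
    if content is not None:
        params["content"] = content
    if type_ is not None:
        if type_ not in VALID_TYPES:
            raise ValueError(f"type must be one of {VALID_TYPES}")
        params["type"] = type_
    return params


if __name__ == "__main__":
    params = build_params("TOKEN", uri="http://en.wikipedia.org/wiki/Neil_young", show_labels=True)
    # token=TOKEN&showLabels=true&uri=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FNeil_young
    print(urlencode(params))
```

Validating type locally catches typos before a round trip to the server; the server remains the authority on what it accepts.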

Text Types

There are three ways to have text processed once it reaches the API, and the type chosen can greatly affect the quality of the result. The html type makes the API strip out all HTML tags so that only the document's non-markup text remains. The text type tells the API not to process the input in any special way, because it is already plain text. The wp type indicates that the provided content is MediaWiki source, which should be stripped of tags and wiki formatting. The wp scraper was optimized for the English Wikipedia but may work on other wikis, particularly those running MediaWiki.
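
The API's own tag-stripping logic is not documented here. As a rough local approximation of what the html type leaves behind, the sketch below uses Python's standard html.parser to keep only character data. The names TextExtractor and strip_html are hypothetical, and skipping the contents of script and style elements is an assumption about sensible behavior, not a statement about the API.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data and drops tags (local approximation only)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(markup):
    """Return the visible text of `markup` with whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_html("<p>the <b>art</b> of computer science</p>"))
```

A preview like this can help decide whether html or text is the better type for a given input.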

Accepted Methods

There are six ways to send us text from which a signature is generated. Every method must include the token parameter. Replace TOKEN in the examples below with the access token provided in the email you received when you signed up for the SemanticHacker API.

  1. GET request with a URI parameter. We'll crawl the URI for text content.
  2. POST request with a URI parameter as application/x-www-form-urlencoded. We'll crawl the URI for text content.
  3. GET with a content parameter.
  4. POST with content parameter as application/x-www-form-urlencoded.
  5. POST with content as multipart/form-data.
  6. POST or PUT with content as the request body.
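
The size limits given in the method descriptions below can be summarized as a small selection helper. This is a sketch under the assumption that those limits are measured in characters as stated; suitable_methods is a hypothetical name, and methods 1 and 2 are left out because they send a URI rather than content.

```python
# Length limits (in characters) stated in the method descriptions of this reference.
CONTENT_PARAM_LIMIT = 1000   # methods 3 and 4: text must be shorter than this
UPLOAD_LIMIT = 100_000       # methods 5 and 6: text is capped at this length


def suitable_methods(content_length):
    """Return the numbers of the direct-content methods (3-6) that accept this length."""
    if content_length < 0:
        raise ValueError("content_length must be non-negative")
    if content_length < CONTENT_PARAM_LIMIT:
        return (3, 4, 5, 6)
    if content_length <= UPLOAD_LIMIT:
        return (5, 6)
    return ()  # too long for any documented method


if __name__ == "__main__":
    print(suitable_methods(len("the art of computer science")))  # (3, 4, 5, 6)
```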

1) GET request with a URI parameter

With this method, our system fetches the text from the URI behind the scenes. If the type parameter is not passed, HTML is assumed and tags are stripped away before processing. The method is easy and fast because you don't have to upload the content. Here is a simple example:

GET /sh/api?token=TOKEN&showLabels=true&uri=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FNeil_young HTTP/1.1
Host: api.semantichacker.com
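
The same request can be built with Python's standard urllib. This is a minimal sketch: uri_get_request is a hypothetical helper, and it only constructs the request so the encoding can be checked; sending it (for example with urllib.request.urlopen) is left to the caller.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://api.semantichacker.com/sh/api"


def uri_get_request(token, uri, show_labels=True):
    """Build (but do not send) a GET request asking the API to crawl `uri`."""
    # urlencode percent-encodes ':' and '/' in the URI, as in the raw example.
    query = urlencode({
        "token": token,
        "showLabels": "true" if show_labels else "false",
        "uri": uri,
    })
    return Request(f"{BASE_URL}?{query}", method="GET")


if __name__ == "__main__":
    req = uri_get_request("TOKEN", "http://en.wikipedia.org/wiki/Neil_young")
    print(req.full_url)
    # To send it: urllib.request.urlopen(req).read() returns the XML response.
```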

2) POST request with a URI parameter as application/x-www-form-urlencoded

This is similar to #1 above, but uses POST with a form-urlencoded content type. Again, type defaults to html unless you explicitly provide a type parameter.

POST /sh/api HTTP/1.1 
Host: api.semantichacker.com
Content-Type: application/x-www-form-urlencoded

token=TOKEN&showLabels=true&uri=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FNeil_young
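
A sketch of the same POST using Python's standard urllib, assuming the form body is sent exactly as shown above. The helper name uri_post_request is hypothetical; the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://api.semantichacker.com/sh/api"


def uri_post_request(token, uri, show_labels=True):
    """Build a form-urlencoded POST that asks the API to crawl `uri`."""
    body = urlencode({
        "token": token,
        "showLabels": "true" if show_labels else "false",
        "uri": uri,
    }).encode("ascii")  # urlencode output is pure ASCII
    return Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = uri_post_request("TOKEN", "http://en.wikipedia.org/wiki/Neil_young")
    print(req.data.decode("ascii"))
```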

3) GET with a content parameter

This method can be used for text shorter than 1000 characters. Although the HTTP RFC sets no limit on the size of a GET parameter, we limit this type of request for performance reasons.

GET /sh/api?token=TOKEN&showLabels=true&content=the%20art%20of%20computer%20science HTTP/1.1 
Host: api.semantichacker.com
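A sketch of this GET in Python's standard urllib that also enforces the 1000-character limit on the client side. content_get_request is a hypothetical helper; passing quote_via=quote makes spaces encode as %20, matching the raw example.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://api.semantichacker.com/sh/api"
MAX_GET_CONTENT = 1000  # content must be shorter than this for GET


def content_get_request(token, content, show_labels=True):
    """Build a GET request that sends short `content` directly to the API."""
    if len(content) >= MAX_GET_CONTENT:
        raise ValueError("content too long for GET; use multipart or a request body")
    query = urlencode(
        {
            "token": token,
            "showLabels": "true" if show_labels else "false",
            "content": content,
        },
        quote_via=quote,  # spaces become %20 rather than '+'
    )
    return Request(f"{BASE_URL}?{query}", method="GET")


if __name__ == "__main__":
    print(content_get_request("TOKEN", "the art of computer science").full_url)
```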


4) POST with content parameter as application/x-www-form-urlencoded

This method is similar to #3 and is likewise limited to text shorter than 1000 characters for performance reasons.

POST /sh/api HTTP/1.1 
Host: api.semantichacker.com
Content-Type: application/x-www-form-urlencoded

token=TOKEN&showLabels=true&content=the%20art%20of%20computer%20science
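
The form-encoded variant in Python's standard urllib, as a minimal sketch. content_post_request is a hypothetical helper; quote_via=quote is used only so the body matches the raw example byte for byte.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://api.semantichacker.com/sh/api"
MAX_FORM_CONTENT = 1000  # content must be shorter than this for this method


def content_post_request(token, content, show_labels=True):
    """Build a form-urlencoded POST carrying short `content`."""
    if len(content) >= MAX_FORM_CONTENT:
        raise ValueError("content too long; use multipart or a request body")
    body = urlencode(
        {
            "token": token,
            "showLabels": "true" if show_labels else "false",
            "content": content,
        },
        quote_via=quote,
    ).encode("ascii")  # non-ASCII text is already percent-encoded as UTF-8
    return Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```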

5) POST with content as multipart/form-data

This method can, and should, be used to upload larger content. It also integrates easily with existing tools that upload files using multipart forms. Content length is capped at 100,000 characters, again mostly for performance reasons. Note that the Content-Type of the uploaded part does not affect how the API processes the text: by default, all incoming text is treated as HTML and any markup tags are removed. Use the type parameter to change this behavior.

POST /sh/api HTTP/1.1 
Host: api.semantichacker.com
Content-Type: multipart/form-data; boundary=x42x 

--x42x
Content-Disposition: form-data; name="token"

TOKEN
--x42x
Content-Disposition: form-data; name="file"; filename="content"
Content-Type: text/plain

the art of computer science
--x42x--
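
The multipart body above can be assembled by hand with Python's standard library. This is a sketch: multipart_request is a hypothetical helper, the part names (token and file) are copied from the example, and encoding the body as UTF-8 is an assumption. A fixed boundary is fine for illustration, but it must never occur inside the content, so the helper checks for that.

```python
from urllib.request import Request

BASE_URL = "http://api.semantichacker.com/sh/api"
MAX_UPLOAD = 100_000  # characters


def multipart_request(token, content, boundary="x42x"):
    """Build a multipart/form-data POST with a `token` field and a `file` part."""
    if len(content) > MAX_UPLOAD:
        raise ValueError("content exceeds 100,000 characters")
    if f"--{boundary}" in content:
        raise ValueError("boundary occurs in content; choose another boundary")
    lines = [
        f"--{boundary}",
        'Content-Disposition: form-data; name="token"',
        "",
        token,
        f"--{boundary}",
        'Content-Disposition: form-data; name="file"; filename="content"',
        "Content-Type: text/plain",
        "",
        content,
        f"--{boundary}--",
        "",  # ensures the body ends with CRLF after the closing boundary
    ]
    body = "\r\n".join(lines).encode("utf-8")  # multipart lines use CRLF
    return Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```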

6) POST or PUT with content as the request body

This method is also intended for larger content; up to 100,000 characters are accepted.

PUT /sh/api?token=TOKEN HTTP/1.1 
Host: api.semantichacker.com
Content-Type: text/plain

the art of computer science
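
A sketch of this request in Python's standard urllib, with the token in the query string and the raw text as the body. body_request is a hypothetical helper, and encoding the body as UTF-8 is an assumption; either PUT or POST can be chosen, as described above.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://api.semantichacker.com/sh/api"
MAX_BODY = 100_000  # characters


def body_request(token, content, method="PUT"):
    """Build a PUT or POST whose entire body is the text to analyze."""
    if method not in ("PUT", "POST"):
        raise ValueError("method must be PUT or POST")
    if len(content) > MAX_BODY:
        raise ValueError("content exceeds 100,000 characters")
    url = f"{BASE_URL}?{urlencode({'token': token})}"
    return Request(
        url,
        data=content.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method=method,
    )


if __name__ == "__main__":
    req = body_request("TOKEN", "the art of computer science")
    print(req.get_method(), req.full_url)
```

Because the Content-Type does not change how the API processes text, a type query parameter would still be needed to override the html default.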