A growing number of sentiment analysis REST APIs are available, and potential users face a lot of choice. Accuracy of analysis is the most important factor, and the best way to see whether an analyzer will perform well on the intended task is to run several analyzers on a sample of your own data and compare their output with manually assigned sentiment labels.
To make it easier for potential users to run such experiments, we’ve released a small open-source project. The project implements clients for several sentiment analyzers: Alchemy, Bitext, Chatterbox, Datumbox, Repustate, Semantria, Skyttle, and Viralheat. As input, it takes a text file of short texts, each annotated as positive, negative or neutral. It outputs a spreadsheet recording the responses of each API, along with an accuracy rate and an error rate calculated against the manual labels.
The project is available on GitHub: https://github.com/skyttle/sentiment-evaluation. Once you clone or unpack it, you will need to install the requirements:
pip install -r requirements.txt
pip install -r requirements-testing.txt (optionally, for testing the code)
Next, obtain access keys for each API you’d like to evaluate, and put them into the config.txt file in the root folder. The file is a two-column text file, where the first column is the name of the API provider and the second is the key.
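To make the two-column layout concrete, here is a minimal sketch of how such a file could be read. It is not the project's own loader: the function name read_keys, the provider spellings and the placeholder keys are hypothetical, and the sketch assumes the columns are separated by whitespace, which the description above does not specify.

```python
import io


def read_keys(lines):
    """Parse 'provider key' rows into a dict (hypothetical helper).

    Assumption: the two columns are separated by whitespace.
    """
    keys = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError("expected two columns, got: %r" % line)
        provider, key = parts
        keys[provider] = key.strip()
    return keys


# Placeholder content; real provider names and keys will differ.
sample_config = "skyttle\tYOUR_SKYTTLE_KEY\nrepustate\tYOUR_REPUSTATE_KEY\n"
api_keys = read_keys(io.StringIO(sample_config))
print(sorted(api_keys))
```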
After that you need to create “gold standard” data by annotating a set of short documents and saving them to a text file. The text file is a two-column tab-separated file, the first column containing the document and the second the sentiment label, which can be one of “+” (positive), “-” (negative), or “0” (neutral). The root folder has an example –
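As an illustration of the gold standard format, the sketch below parses tab-separated rows and checks that each label is one of the three allowed symbols. The helper name read_gold and the sample sentences are made up for this example and are not part of the project.

```python
import io

VALID_LABELS = {"+", "-", "0"}


def read_gold(lines):
    """Parse 'document<TAB>label' rows into (text, label) pairs.

    Hypothetical helper for illustration only.
    """
    docs = []
    for row_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the last tab so the label is always the final column.
        text, sep, label = line.rpartition("\t")
        label = label.strip()
        if not sep or label not in VALID_LABELS:
            raise ValueError("row %d: expected text<TAB>label" % row_number)
        docs.append((text, label))
    return docs


# Invented sample rows, one per label.
sample = (
    "Great phone, love it\t+\n"
    "Battery died in a day\t-\n"
    "It arrived on Tuesday\t0\n"
)
gold = read_gold(io.StringIO(sample))
print(len(gold))
```

Validating labels at load time means a typo in the annotation file is caught before any API calls are made.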
Optionally, comment out the APIs that should not be included in the comparison in
To run the library:
compare.py <the name of the file with annotated texts>
The outputs are as follows. Labels assigned to each test document by each API are written to the CSV file called
To the stdout, the script prints the accuracy rate and the error rate achieved by each API.
Accuracy rate is the proportion of hits (cases when the automatically assigned label is the same as the manually assigned one) to the total number of test documents.
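The accuracy rate as defined above can be sketched in a few lines. The function name accuracy_rate and the sample labels are illustrative and not taken from the project's code.

```python
def accuracy_rate(gold_labels, predicted_labels):
    """Share of documents where the predicted label equals the gold label."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not gold_labels:
        raise ValueError("no test documents")
    hits = sum(1 for g, p in zip(gold_labels, predicted_labels) if g == p)
    return hits / len(gold_labels)


# Invented labels: two of the four predictions match the gold labels.
gold_labels = ["+", "-", "0", "+"]
predicted_labels = ["+", "0", "0", "-"]
accuracy = accuracy_rate(gold_labels, predicted_labels)
print(accuracy)  # 2 hits out of 4 documents -> 0.5
```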
Error rate is calculated taking into account whether a neutral label was confused with a positive or negative one (the error has the weight of 1), or a positive label was confused with a negative one (the error has the weight of 2). Error rate is the proportion of the sum of observed weighted errors to the maximum possible sum of weighted errors.
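The weighting can be sketched by mapping labels onto a numeric scale, so that the distance between two labels is the error weight. One point in this sketch is an assumption: the maximum possible error is taken per document from its gold label (2 for a positive or negative document, 1 for a neutral one). The project may compute the maximum differently, and the function name error_rate is hypothetical.

```python
# Position of each label on a negative-to-positive scale.
SCALE = {"-": -1, "0": 0, "+": 1}


def error_rate(gold_labels, predicted_labels):
    """Weighted errors divided by the largest weighted error possible.

    Assumption: the maximum is computed per document from its gold label.
    """
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    observed = 0
    maximum = 0
    for g, p in zip(gold_labels, predicted_labels):
        # Neutral vs. polar gives 1, positive vs. negative gives 2.
        observed += abs(SCALE[g] - SCALE[p])
        # Worst case: 1 for a neutral gold label, 2 for a polar one.
        maximum += 1 if g == "0" else 2
    if maximum == 0:
        raise ValueError("no test documents")
    return observed / maximum


# Invented labels: one weight-1 error and one weight-2 error.
gold_labels = ["+", "-", "0", "+"]
predicted_labels = ["+", "0", "0", "-"]
rate = error_rate(gold_labels, predicted_labels)
print(round(rate, 3))  # observed 3, maximum 7
```

Unlike the accuracy rate, this measure penalizes an analyzer more for flipping polarity than for hedging towards neutral.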
In addition, responses from all APIs are logged to a file.
You are welcome to fork the project and add more analyzers.
If you’ve used the tool in your experiments, please share the results with us!