ticcky/esalib
My implementation of an Explicit Semantic Analysis (ESA) library, which we used at KMi, Open University, to produce our submission to the NTCIR-9 CrossLink task.
== WARNING ==
The tool is verified to yield good results (meaning a correlation with human judgement as reported in the original ESA paper) with the provided prebuilt English Wikipedia ESA background from 2005. I have not succeeded in building an ESA background from recent Wikipedia dumps. Please let me know if you manage to.
== Changelog ==
7.12.2013
- fixed a few mistakes in the tutorial
- merged pull request fixing a problem on MacOS
15.2.2013
- found a stemming problem: the example English background is stemmed with PorterStemmer, but my library uses SnowballStemmer; this results in many OOV (out-of-vocabulary) words and therefore low similarity scores
- added an interactive mode to the analyzer: you can now pipe in pairs of texts to compare (1 line = 1 text) and ESAAnalyzer produces the similarity scores
- added wikixray scripts that were missing from the tutorial
7.10.2012
- fixed a typo in the analyzer bash script that caused only the first words to be analyzed; fixed handling of OOV words; removed the length filter (previously only words 3-100 chars long were considered)
29.9.2012
- added support for SQLite, so that the library is easier to use for fast prototyping
25.3.2012
- initial release
== Files ==
- /example - example data, including an ESA background built from a 2005 Wikipedia snapshot, which you can use directly in our tools for assessing the semantic similarity of English texts/words
- /tutorial - basic instructions for building your own background
- /lib - Java libraries required to run
== So how to get ESA running in 2 minutes for English? ==
0. Clone the repository and enter its directory
# git clone https://github.com/ticcky/esalib.git
# cd esalib
1. Create a symbolic link to the sample database
# ln -s example/esa_en.db esa_db.db
2. Get a relatedness estimate for two texts:
# ./run_analyzer "computer" "apple"
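
To score more than one pair, the two-argument call from step 2 can be wrapped in a small shell loop. The following is only a rough sketch: the function name `compare_pairs`, the `ANALYZER` variable and the tab-separated input format are made up for illustration and are not part of esalib. It relies only on the `./run_analyzer "text1" "text2"` invocation shown above and prints whatever the analyzer writes to standard output, without assuming anything about that output's format.

```shell
#!/bin/sh
# Sketch only: ANALYZER and compare_pairs are illustrative names,
# not something shipped with esalib.
ANALYZER="${ANALYZER:-./run_analyzer}"

# Reads lines of the form "text1<TAB>text2" from stdin and prints
# "text1<TAB>text2<TAB>analyzer output" for every complete pair.
compare_pairs() {
    tab=$(printf '\t')
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS="$tab" read -r first second || [ -n "$first" ]; do
        # Skip blank or incomplete lines.
        [ -n "$first" ] && [ -n "$second" ] || continue
        # Redirect stdin so the analyzer cannot consume the remaining pairs.
        score=$("$ANALYZER" "$first" "$second" </dev/null)
        printf '%s\t%s\t%s\n' "$first" "$second" "$score"
    done
}
```

With the function loaded in the esalib directory, something like `printf 'computer\tapple\n' | compare_pairs` would run the analyzer once per input line. Each call starts the analyzer anew, so for many pairs this will be slow.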
Please don't hesitate to get in touch if you want to use my library but run into trouble with it.