Limetrans can be regarded as a configuration framework for using Metafacture for library purposes. It uses a JSON configuration scheme whose overall structure is:
{
  "input" : {
    ...
  },
  "transformation-rules" : "...",
  "output": {
    ...
  },
  ...
}

Input is generally configured like this:

"input" : {
  "queue" : {
    "path" : "a/path/to/your/input/file/",
    "pattern" : "your-marc-xml-input-file.xml",
    "sort_by" : "lastmodified",
    "order" : "desc",
    "max" : 1,
    "normalize-unicode" : false,
    "processor" : "MARC21"
  }
}

MARCXML is the default value for 'processor', so 'processor' can be omitted when processing MARCXML data.
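
The queue options can be read as a file selection rule. The following Python sketch shows one plausible interpretation; it is not taken from the Limetrans source. It assumes that "pattern" is a shell-style glob, that "sort_by": "lastmodified" orders by modification time, and that "max" caps the number of files taken. The helper name select_queue_files is hypothetical.

```python
import fnmatch
import os


def select_queue_files(queue):
    """Return the input files chosen by a queue configuration (assumed semantics)."""
    directory = queue["path"]
    # Assumption: "pattern" is matched as a shell-style glob against file names.
    names = [n for n in os.listdir(directory) if fnmatch.fnmatch(n, queue["pattern"])]
    paths = [os.path.join(directory, n) for n in names]
    paths = [p for p in paths if os.path.isfile(p)]

    # Assumption: "lastmodified" means sorting by modification time;
    # any other value falls back to sorting by path.
    key = os.path.getmtime if queue.get("sort_by") == "lastmodified" else str
    paths.sort(key=key, reverse=(queue.get("order") == "desc"))

    # Assumption: "max" limits how many files are taken from the front of the queue.
    limit = queue.get("max")
    return paths if limit is None else paths[:limit]
```

Under these assumptions, the configuration above would pick only the most recently modified file that matches the pattern.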
"transformation-rules" : "a/path/to/your/transformation/metafacture/rules/file.xml"By now, Limetrans is written to be used with Elasticsearch. Therefore, the output object mainly contains an Elasticsearch configuration, besides a JSON output option.
"output": {
"json" : "a/path/to/your/jsonlines/output/file.jsonl",
"elasticsearch" : {
"cluster": "elasticsearch-01",
"host": ["localhost:9300"],
"index" : {
"type" : "title",
"name" : "choose-your-own-index-name",
"timewindow" : "yyyyMMdd",
"settings" : "a/path/to/your/elasticsearch/settings.json",
"mapping" : "a/path/to/your/elasticsearch/mapping.json",
"idKey" : "the-id-field-name-configured-in-your-metafacture-rules-file"
},
"update" : false,
"delete" : false,
"bulkAction" : "index",
"maxbulkactions" : 100000
},
"pretty-printing" : false
}"type" : "title" is a suggestion, assuming you might want to transform and store book title information.
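
To inspect what the "json" output option produced, the file can be read back line by line. The sketch below assumes the output is in JSON Lines form (one JSON object per line), which presumably requires "pretty-printing" to be false. The helper names and the "id" field used in the usage note are hypothetical; use whatever you configured as "idKey".

```python
import json


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def index_by_id(records, id_key):
    """Map each record's id value to the record; records lacking the key are skipped."""
    return {record[id_key]: record for record in records if id_key in record}
```

For example, index_by_id(read_jsonl("a/path/to/your/jsonlines/output/file.jsonl"), "id") would give a dictionary keyed by the id field, which is handy for spot-checking individual records before indexing.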
"catalogid" : "choose-your-own-catalog-id",
"collection" : "choose-your-own-collection"Please find examples for the configuration of Limetrans in the source code.
$ git clone git@github.com:hbz/limetrans.git

$ cd third-party
$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-2.1.1.zip
$ unzip elasticsearch-2.1.1.zip
$ cd elasticsearch-2.1.1
$ bin/elasticsearch

Check with curl -X GET http://localhost:9200/ if all is well.
Make sure you have configured the cluster name in /etc/elasticsearch/elasticsearch.yml according to your Limetrans configuration.
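
The cluster name check can also be scripted. The root endpoint of Elasticsearch returns a JSON document that includes a "cluster_name" field, which can be compared with the "cluster" value in the Limetrans output configuration. The following Python sketch assumes the node is reachable over HTTP at localhost:9200. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def cluster_name_matches(root_info, config):
    """Compare the cluster name reported by Elasticsearch with the configured one."""
    expected = config["output"]["elasticsearch"]["cluster"]
    return root_info.get("cluster_name") == expected


def fetch_root_info(url="http://localhost:9200/"):
    """Fetch and parse the JSON document served at the Elasticsearch root endpoint."""
    with urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running node, cluster_name_matches(fetch_root_info(), config) should be True when elasticsearch.yml and the Limetrans configuration agree.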
Optionally, you may want to install the head plugin:
$ cd third-party/elasticsearch-2.1.1
$ bin/plugin install mobz/elasticsearch-head

Indent blocks by four spaces and wrap lines at 100 characters. For more details, refer to the Google Java Style Guide.
Please file bugs as an issue labeled "Bug" here.