NLP-Cube

Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging, Dependency Parsing and Named Entity Recognition for more than 50 languages


Before running the server, you need the model weights. You can obtain them in one of two ways:

Installing DyNet:

            pip install cython
            mkdir dynet-base
            cd dynet-base

            git clone
            hg clone -r 2355b22  # -r NUM specifies a known working revision

            cd dynet
            mkdir build
            cd build
            cmake .. -DEIGEN3_INCLUDE_DIR=/path/to/eigen -DMKL_ROOT=/opt/intel/mkl -DPYTHON=`which python2`

            make -j 2 # replace 2 with the number of available cores
            make install

            cd python
            python2 ../../ build --build-dir=.. --skip-build install

Training the lemmatizer (example):

Use the following command to train your lemmatizer:

            python2 cube/ --train=lemmatizer --train-file=corpus/ud_treebanks/UD_Romanian/ro-ud-train.conllu --dev-file=corpus/ud_treebanks/UD_Romanian/ro-ud-dev.conllu --embeddings=corpus/ --store=corpus/trained_models/ro/lemma/lemma --test-file=corpus/ud_test/gold/conll17-ud-test-2017-05-09/ro.conllu --batch-size=1000
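
Training reads the `--train-file`, `--dev-file` and `--test-file` arguments as CoNLL-U files, so it can help to sanity-check them before starting a long run. The sketch below is not part of NLP-Cube; `conllu_stats` and the `SAMPLE` sentence are hypothetical names and data made up for illustration. It assumes only the standard CoNLL-U layout: ten tab-separated columns per token, `#` comment lines, and a blank line after each sentence.

```python
def conllu_stats(lines):
    """Count sentences, word tokens and unspecified lemmas in CoNLL-U lines.

    Hypothetical helper for illustration, not part of NLP-Cube.
    """
    sentences = 0
    tokens = 0
    missing_lemma = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comment or metadata
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError("expected 10 tab-separated columns, got %d" % len(cols))
        in_sentence = True
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens += 1
        # Column 2 is FORM, column 3 is LEMMA; "_" marks an unspecified value.
        if cols[2] == "_" and cols[1] != "_":
            missing_lemma += 1
    if in_sentence:
        sentences += 1  # file did not end with a blank line
    return {"sentences": sentences, "tokens": tokens, "missing_lemma": missing_lemma}


# Made-up sample sentence, used only to exercise the function.
SAMPLE = (
    "# text = Ana are o carte.\n"
    "1\tAna\tAna\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tare\tavea\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\to\tun\tDET\t_\t_\t4\tdet\t_\t_\n"
    "4\tcarte\tcarte\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No\n"
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    print(conllu_stats(SAMPLE.splitlines(True)))
```

To check a real file, pass an open file object instead of the sample, for example `conllu_stats(io.open(path, encoding="utf-8"))`. A large `missing_lemma` count would suggest the treebank is not suitable for training a lemmatizer.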

Running the server:

Use the following command to run the server locally:

            python2 cube/ --start-server --model-tokenization=corpus/trained_models/ro/tokenizer --model-parsing=corpus/trained_models/ro/parser --model-lemmatization=corpus/trained_models/ro/lemma --embeddings=corpus/ --server-port=8080
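
Loading the models can take a while, so a quick way to confirm that the server has come up is to test whether something accepts TCP connections on the chosen port. This is a minimal sketch: `server_is_listening` is a hypothetical helper, and it deliberately checks only the port given in `--server-port`, since the request format the server expects is not described here.

```python
import socket


def server_is_listening(host="127.0.0.1", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it does not send any request,
    it only verifies that a process is accepting connections.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # 8080 matches the --server-port value used in the command above.
    if server_is_listening(port=8080):
        print("server is accepting connections on port 8080")
    else:
        print("nothing is listening on port 8080 yet")
```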