NLP-Cube

Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging, Dependency Parsing and Named Entity Recognition for more than 50 languages

Setup:

Before running the server, you need the model weights. You can obtain them in two ways; training your own models is shown below.

Installing DyNet:

NLP-Cube is built on the DyNet neural network library, so build and install it first:
            
            # Cython is required to build the DyNet Python bindings
            pip install cython
            mkdir dynet-base
            cd dynet-base

            # fetch DyNet and the Eigen library it depends on
            git clone https://github.com/clab/dynet.git
            hg clone https://bitbucket.org/eigen/eigen -r 2355b22  # -r NUM specifies a known working revision

            # configure and compile DyNet
            cd dynet
            mkdir build
            cd build
            cmake .. -DEIGEN3_INCLUDE_DIR=/path/to/eigen -DMKL_ROOT=/opt/intel/mkl -DPYTHON=`which python2`

            make -j 2 # replace 2 with the number of available cores
            make install

            # build and install the Python bindings
            cd python
            python2 ../../setup.py build --build-dir=.. --skip-build install
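
If the build succeeded, you can confirm that the bindings import and run with a short script. This is only a sanity check against the public DyNet API, not part of NLP-Cube:

            # sanity_check.py -- verify that the DyNet Python bindings work
            import dynet as dy

            dy.renew_cg()                        # start a fresh computation graph
            model = dy.ParameterCollection()
            W = model.add_parameters((2, 3))     # a 2x3 weight matrix
            x = dy.inputVector([1.0, 2.0, 3.0])  # a length-3 input vector
            y = dy.parameter(W) * x              # matrix-vector product
            print(y.value())                     # forward pass; prints two floats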
            
            

Training the lemmatizer (example):

Use the following command to train your lemmatizer:

            
            python2 cube/main.py --train=lemmatizer \
                --train-file=corpus/ud_treebanks/UD_Romanian/ro-ud-train.conllu \
                --dev-file=corpus/ud_treebanks/UD_Romanian/ro-ud-dev.conllu \
                --embeddings=corpus/wiki.ro.vec \
                --store=corpus/trained_models/ro/lemma/lemma \
                --test-file=corpus/ud_test/gold/conll17-ud-test-2017-05-09/ro.conllu \
                --batch-size=1000
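
The --train-file, --dev-file and --test-file arguments expect Universal Dependencies treebanks in CoNLL-U format: one token per line, ten tab-separated columns, and a blank line between sentences. For reference, a hand-written example sentence (not a line from the actual treebank) looks like this:

            # ID  FORM    LEMMA   UPOS   XPOS  FEATS  HEAD  DEPREL  DEPS  MISC
            1     Acesta  acesta  PRON   _     _      4     nsubj   _     _
            2     este    fi      AUX    _     _      4     cop     _     _
            3     un      un      DET    _     _      4     det     _     _
            4     test    test    NOUN   _     _      0     root    _     _
            5     .       .       PUNCT  _     _      4     punct   _     _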
            
            

Running the server:

Use the following command to run the server locally:

            
            python2 cube/main.py --start-server \
                --model-tokenization=corpus/trained_models/ro/tokenizer \
                --model-parsing=corpus/trained_models/ro/parser \
                --model-lemmatization=corpus/trained_models/ro/lemma \
                --embeddings=corpus/wiki.ro.vec \
                --server-port=8080
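
Once the server is up, you can query it over HTTP. The sketch below uses the Python requests library; the /nlp route and the "q" parameter are illustrative assumptions, not the documented API, so check cube/main.py for the actual endpoint:

            # query_server.py -- hypothetical client sketch: the route and
            # parameter name below are assumptions, not NLP-Cube's documented API
            import requests

            resp = requests.get(
                "http://localhost:8080/nlp",           # assumed route
                params={"q": "Acesta este un test."},  # assumed query parameter
            )
            print(resp.text)  # expected output: CoNLL-U annotations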