Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging, Dependency Parsing and Named Entity Recognition for more than 50 languages
Before running the server, you need the model weights. There are two ways to get them:
pip install cython
mkdir dynet-base
cd dynet-base
git clone https://github.com/clab/dynet.git
hg clone https://bitbucket.org/eigen/eigen -r 2355b22 # -r NUM specifies a known working revision
cd dynet
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=/path/to/eigen -DMKL_ROOT=/opt/intel/mkl -DPYTHON=`which python2`
make -j 2 # replace 2 with the number of available cores
make install
cd python
python2 ../../setup.py build --build-dir=.. --skip-build install
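Once the install step finishes, it can be worth confirming that the bindings are importable before moving on. The following is a minimal sketch, not part of the project: the helper name `module_available` is made up for illustration, and it assumes the bindings are exposed under the module name `dynet`. Run it with the same interpreter that was used for the install.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: the DyNet Python bindings install a module called "dynet".
    print("dynet importable: %s" % module_available("dynet"))
```

If this reports `False`, the usual suspects are a different interpreter than the one passed to cmake through `-DPYTHON`, or an install step that did not complete.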
Use the following command to train your lemmatizer:
python2 cube/main.py --train=lemmatizer --train-file=corpus/ud_treebanks/UD_Romanian/ro-ud-train.conllu --dev-file=corpus/ud_treebanks/UD_Romanian/ro-ud-dev.conllu --embeddings=corpus/wiki.ro.vec --store=corpus/trained_models/ro/lemma/lemma --test-file=corpus/ud_test/gold/conll17-ud-test-2017-05-09/ro.conllu --batch-size=1000
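The `--train-file` and `--dev-file` arguments point at CoNLL-U treebank files, where each token line has ten tab-separated columns and the lemma is the third one. To get a feel for the form/lemma pairs in such a file, a small reader like the one below may help. It is an illustrative sketch only: the function name `read_form_lemma_pairs` and the three-token sample are made up for this example and are not taken from the project or from the Romanian treebank.

```python
def read_form_lemma_pairs(lines):
    """Yield (FORM, LEMMA) pairs from an iterable of CoNLL-U lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_id, form, lemma = cols[0], cols[1], cols[2]
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        yield form, lemma


# Hand-written toy sentence, for illustration only.
sample = [
    "# text = Copiii merg.",
    "1\tCopiii\tcopil\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tmerg\tmerge\tVERB\t_\t_\t0\troot\t_\t_",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
]

pairs = list(read_form_lemma_pairs(sample))
```

For a real treebank, pass an open file handle instead of `sample`, since the function only needs something that yields lines.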
Use the following command to run the server locally:
python2 cube/main.py --start-server --model-tokenization=corpus/trained_models/ro/tokenizer --model-parsing=corpus/trained_models/ro/parser --model-lemmatization=corpus/trained_models/ro/lemma --embeddings=corpus/wiki.ro.vec --server-port=8080
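After starting the server, a quick way to confirm that something is listening on the chosen port is a plain TCP connection attempt. The sketch below uses only the standard library and makes no assumption about the server's protocol or endpoints. The helper name `port_is_open` is hypothetical, and the host `127.0.0.1` is an assumption that the server runs on the same machine; the port matches the `--server-port=8080` value above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # Assumption: the server was started locally with --server-port=8080.
    print("server reachable: %s" % port_is_open("127.0.0.1", 8080))
```

A `False` result only means that nothing accepted the connection. Loading the models may take a while, so it can be worth retrying before concluding that startup failed.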