NLP-Cube

Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging, Dependency Parsing and Named Entity Recognition for more than 50 languages

Our Named Entity Recognition (NER) system uses Graph-Based Decoding (GBD) over a hybrid network architecture: bidirectional LSTMs for word-level encoding, a multilayer perceptron (MLP) for detecting links inside entities, and a unidirectional LSTM for label estimation.
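To illustrate the idea of decoding entities from predicted links, here is a minimal sketch. It is not the NLP-Cube implementation: it assumes that decoding amounts to thresholding pairwise link scores and taking connected components, and the function name `decode_entities`, the `link_scores` dictionary, and the 0.5 threshold are all hypothetical. Label estimation is left out.

```python
from collections import defaultdict


def decode_entities(num_tokens, link_scores, threshold=0.5):
    """Group token indices into entities from pairwise link scores.

    link_scores maps (i, j) token-index pairs to a score in [0, 1].
    This is a stand-in for the MLP output (an assumption for illustration).
    """
    parent = list(range(num_tokens))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    linked = set()
    for (i, j), score in link_scores.items():
        if score >= threshold:
            parent[find(i)] = find(j)  # merge the two components
            linked.update((i, j))

    # Only tokens that take part in at least one accepted link form entities.
    groups = defaultdict(list)
    for tok in sorted(linked):
        groups[find(tok)].append(tok)
    return sorted(groups.values())


tokens = ["He", "took", "a", "quick", "shower", "and", "gave", "up"]
scores = {(1, 4): 0.9, (6, 7): 0.8, (1, 6): 0.1}
print(decode_entities(len(tokens), scores))
```

Because grouping is done over a graph rather than over contiguous spans, this kind of decoding can return discontinuous entities such as "took ... shower" in the example.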

We are currently working on integrating the NER branch into master. Until then, you can check out the source for the GBD-NER system at this repository.

Note: During the official PARSEME evaluation, our system (GBD-NER) was affected by a bug that caused it to use only delexicalized features as input. After correcting this bug, we re-trained and re-evaluated the system. See below for details:

Cross-lingual macro-averages

General ranking

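| System | Track | #Langs | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|---|
| GDB-NER-full | closed | 19/19 | 60.36 | 53.38 | 56.65 | 1 | 66.28 | 55.78 | 60.57 | 1 |
| TRAVERSAL | closed | 19/19 | 67.58 | 44.97 | 54 | 2 | 77.41 | 48.55 | 59.67 | 2 |
| TRAPACC_S | closed | 19/19 | 62.28 | 41.4 | 49.74 | 3 | 68.54 | 42.06 | 52.13 | 5 |
| TRAPACC | closed | 19/19 | 55.68 | 44.67 | 49.57 | 4 | 62.1 | 46.37 | 53.09 | 4 |
| CRF-Seq-nocategs | closed | 19/19 | 56.13 | 39.12 | 46.11 | 5 | 73.44 | 43.49 | 54.63 | 3 |
| varIDE | closed | 19/19 | 61.49 | 36.71 | 45.97 | 6 | 64.13 | 37.63 | 47.43 | 7 |
| CRF-DepTree-categs | closed | 19/19 | 52.33 | 37.83 | 43.91 | 7 | 64.65 | 41.56 | 50.6 | 6 |
| GBD-NER-standard | closed | 19/19 | 36.56 | 48.3 | 41.62 | 8 | 41.11 | 52.21 | 46 | 8 |
| GBD-NER-resplit | closed | 19/19 | 30.26 | 52.95 | 38.51 | 9 | 33.83 | 58.03 | 42.74 | 10 |
| Veyn | closed | 19/19 | 42.76 | 32.51 | 36.94 | 10 | 58.13 | 36.57 | 44.9 | 9 |
| mumpitz | closed | 19/7 | 17.14 | 13.03 | 14.81 | 11 | 24.95 | 15.5 | 19.12 | 12 |
| Polirem-rich | closed | 19/3 | 10.9 | 2.87 | 4.54 | 12 | 13.07 | 3.89 | 6 | 13 |
| Polirem-basic | closed | 19/3 | 10.78 | 0.65 | 1.23 | 13 | 11.33 | 0.68 | 1.28 | 14 |
| MWETreeC | closed | 19/19 | 0.21 | 3.72 | 0.4 | 14 | 23.5 | 24.78 | 24.12 | 11 |
| SHOMA | open | 19/19 | 66.08 | 51.82 | 58.09 | 1 | 76.22 | 54.27 | 63.4 | 1 |
| Deep-BGT | open | 19/10 | 33.41 | 25.29 | 28.79 | 2 | 39.77 | 26.47 | 31.78 | 2 |
| Milos | open | 19/4 | 9.17 | 7.87 | 8.47 | 3 | 11.5 | 8.25 | 9.61 | 3 |
| mumpitz-preinit | open | 19/1 | 2.28 | 1.9 | 2.07 | 4 | 3.71 | 2.35 | 2.88 | 4 |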
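The cross-lingual figures can be cross-checked against the language-specific rankings. The sketch below assumes that the macro-averaged P and R are unweighted means over the 19 languages, and that the macro-averaged F1 is computed from those two averages rather than by averaging per-language F1 scores. This is our reading of the numbers, not a statement about the official PARSEME evaluation script. The input values are the MWE-based P and R of GDB-NER-full copied from the per-language tables; the helper names are illustrative.

```python
def macro_average(values):
    """Unweighted mean of one score across languages."""
    return sum(values) / len(values)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# MWE-based scores of GDB-NER-full, in table order:
# BG, DE, EL, EN, ES, EU, FA, FR, HE, HI, HR, HU, IT, LT, PL, PT, RO, SL, TR
precision_by_lang = [79.41, 44.13, 64.74, 10.52, 22.22, 82.07, 81.96, 72.76,
                     74.19, 61.90, 51.52, 85.47, 49.52, 7.11, 75.06, 66.59,
                     88.35, 61.07, 68.07]
recall_by_lang = [56.42, 33.94, 49.10, 57.68, 36.80, 71.40, 75.25, 42.37,
                  9.16, 72.80, 37.35, 76.55, 41.73, 44.80, 62.52, 55.88,
                  83.70, 52.40, 54.35]

macro_p = macro_average(precision_by_lang)
macro_r = macro_average(recall_by_lang)
macro_f1 = f1_score(macro_p, macro_r)
print(f"P={macro_p:.2f} R={macro_r:.2f} F1={macro_f1:.2f}")
```

Under these assumptions the result agrees with the GDB-NER-full row of the general ranking (60.36 / 53.38 / 56.65) to within 0.01, a difference that is expected because the per-language inputs are already rounded to two decimals.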

Language-specific system rankings

BG

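| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| GDB-NER-full | closed | 79.41 | 56.42 | 65.97 | 1 | 82.10 | 55.72 | 66.39 | 1 |
| varIDE | closed | 63.58 | 61.49 | 62.52 | 2 | 66.82 | 61.16 | 63.86 | 2 |
| TRAPACC | closed | 70.61 | 53.43 | 60.83 | 3 | 72.94 | 54.45 | 62.35 | 3 |
| CRF-Seq-nocategs | closed | 76.3 | 48.06 | 58.97 | 4 | 82.67 | 49.51 | 61.93 | 6 |
| TRAVERSAL | closed | 75.59 | 47.61 | 58.42 | 5 | 82.36 | 49.79 | 62.06 | 4 |
| GBD-NER-standard | closed | 50.49 | 69.1 | 58.35 | 6 | 52.94 | 70.48 | 60.47 | 7 |
| mumpitz | closed | 75.12 | 46.42 | 57.38 | 7 | 86.99 | 48.16 | 62 | 4 |
| CRF-DepTree-categs | closed | 73.92 | 41.04 | 52.78 | 8 | 77.9 | 41.31 | 53.99 | 8 |
| TRAPACC_S | closed | 80.8 | 38.96 | 52.57 | 9 | 84.45 | 39.12 | 53.47 | 9 |
| GBD-NER-resplit | closed | 39.01 | 71.79 | 50.55 | 10 | 41.23 | 73.87 | 52.92 | 10 |
| Veyn | closed | 0 | 0 | 0 | n/a | 0 | 0 | 0 | n/a |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 5.05 | 3.04 | 3.79 | 11 |
| Deep-BGT | open | 85.96 | 52.99 | 65.56 | 1 | 91 | 52.82 | 66.85 | 1 |
| SHOMA | open | 75.6 | 55.97 | 64.32 | 2 | 79.98 | 56.43 | 66.17 | 2 |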

DE

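| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| TRAPACC_S | closed | 53.26 | 39.36 | 45.27 | 1 | 61.13 | 42.26 | 49.97 | 4 |
| TRAPACC | closed | 53.28 | 37.55 | 44.05 | 2 | 62.4 | 39.49 | 48.37 | 5 |
| TRAVERSAL | closed | 62.93 | 32.73 | 43.06 | 3 | 76.17 | 38.36 | 51.02 | 3 |
| GDB-NER-full | closed | 44.13 | 33.94 | 38.37 | 4 | 64.24 | 43.49 | 51.87 | 1 |
| CRF-DepTree-categs | closed | 52.71 | 29.32 | 37.68 | 5 | 61.87 | 33.95 | 43.84 | 7 |
| mumpitz | closed | 32.15 | 38.35 | 34.98 | 6 | 55.91 | 48 | 51.66 | 2 |
| CRF-Seq-nocategs | closed | 40.21 | 23.09 | 29.34 | 7 | 69.68 | 33.95 | 45.66 | 6 |
| GBD-NER-resplit | closed | 27.16 | 29.72 | 28.38 | 8 | 34.33 | 41.33 | 37.51 | 10 |
| GBD-NER-standard | closed | 26.43 | 27.91 | 27.15 | 9 | 35.97 | 40.62 | 38.15 | 9 |
| Veyn | closed | 32.65 | 22.29 | 26.49 | 10 | 57.52 | 31.38 | 40.61 | 8 |
| varIDE | closed | 82.35 | 8.43 | 15.3 | 11 | 82.35 | 5.74 | 10.74 | 11 |
| MWETreeC | closed | 0.76 | 2.81 | 1.2 | 12 | 4.07 | 7.69 | 5.32 | 12 |
| Deep-BGT | open | 60.94 | 36.35 | 45.53 | 1 | 77.92 | 37.64 | 50.76 | 3 |
| Milos | open | 63.74 | 34.94 | 45.14 | 2 | 67.71 | 24.51 | 35.99 | 4 |
| SHOMA | open | 54.15 | 37.95 | 44.63 | 3 | 69.7 | 40.82 | 51.49 | 2 |
| mumpitz-preinit | open | 43.37 | 36.14 | 39.43 | 4 | 70.5 | 44.62 | 54.65 | 1 |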

EL

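| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| GDB-NER-full | closed | 64.74 | 49.10 | 55.85 | 1 | 74.02 | 52.22 | 61.24 | 1 |
| TRAPACC_S | closed | 61.65 | 41.72 | 49.76 | 2 | 70.25 | 42.68 | 53.1 | 4 |
| TRAVERSAL | closed | 65.7 | 36.33 | 46.79 | 3 | 82.16 | 42.01 | 55.59 | 2 |
| TRAPACC | closed | 64.31 | 36.33 | 46.43 | 4 | 72.86 | 37.07 | 49.14 | 6 |
| CRF-DepTree-categs | closed | 50.8 | 38.12 | 43.56 | 5 | 61.44 | 41.34 | 49.42 | 5 |
| CRF-Seq-nocategs | closed | 54.41 | 35.73 | 43.13 | 6 | 74.66 | 41.92 | 53.7 | 3 |
| GBD-NER-resplit | closed | 29.55 | 67.07 | 41.03 | 7 | 36.12 | 74.56 | 48.66 | 8 |
| GBD-NER-standard | closed | 41.68 | 39.52 | 40.57 | 8 | 51.95 | 44.52 | 47.95 | 9 |
| mumpitz | closed | 45 | 30.54 | 36.39 | 9 | 73.21 | 36.82 | 49 | 7 |
| varIDE | closed | 86.51 | 21.76 | 34.77 | 10 | 87.73 | 20.33 | 33.02 | 11 |
| Veyn | closed | 39.01 | 21.96 | 28.1 | 11 | 70.02 | 28.54 | 40.55 | 10 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 41.13 | 24.44 | 30.66 | 12 |
| SHOMA | open | 62.79 | 53.89 | 58 | 1 | 78.6 | 58.08 | 66.79 | 1 |
| Milos | open | 42.76 | 37.72 | 40.08 | 2 | 64.16 | 47.03 | 54.27 | 2 |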

EN

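| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| TRAPACC | closed | 38.4 | 28.74 | 32.88 | 1 | 42.23 | 28.98 | 34.37 | 1 |
| TRAVERSAL | closed | 55.5 | 21.16 | 30.64 | 2 | 58.31 | 20.33 | 30.15 | 3 |
| TRAPACC_S | closed | 49.77 | 21.76 | 30.28 | 3 | 53.5 | 21.07 | 30.23 | 2 |
| CRF-Seq-nocategs | closed | 49.76 | 20.36 | 28.9 | 4 | 55.95 | 20.33 | 29.82 | 4 |
| CRF-DepTree-categs | closed | 40.87 | 18.76 | 25.72 | 5 | 47.2 | 18.58 | 26.67 | 5 |
| varIDE | closed | 59.38 | 15.17 | 24.17 | 6 | 61.09 | 14.44 | 23.36 | 6 |
| GBD-NER-standard | closed | 13.69 | 32.53 | 19.27 | 7 | 14.94 | 32.84 | 20.54 | 7 |
| GDB-NER-full | closed | 10.52 | 57.68 | 17.79 | 8 | 11.86 | 60.81 | 19.85 | 8 |
| GBD-NER-resplit | closed | 9.55 | 42.32 | 15.59 | 9 | 11 | 45.17 | 17.69 | 9 |
| Veyn | closed | 27.78 | 2.99 | 5.41 | 10 | 40.66 | 3.4 | 6.28 | 11 |
| MWETreeC | closed | 0.61 | 0.6 | 0.61 | 11 | 22.4 | 10.12 | 13.94 | 10 |
| Milos | open | 33.81 | 32.73 | 33.27 | 1 | 37.32 | 31.83 | 34.36 | 1 |
| SHOMA | open | 45.67 | 11.58 | 18.47 | 2 | 56.09 | 11.87 | 19.59 | 2 |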

ES

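| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| TRAPACC_S | closed | 29.54 | 40 | 33.98 | 1 | 35.96 | 44.43 | 39.75 | 2 |
| CRF-Seq-nocategs | closed | 30.87 | 36 | 33.24 | 2 | 37.47 | 41.74 | 39.49 | 3 |
| TRAPACC | closed | 25.69 | 41.2 | 31.64 | 3 | 32.04 | 46.78 | 38.03 | 4 |
| TRAVERSAL | closed | 28.84 | 33.4 | 30.95 | 4 | 39.91 | 40.26 | 40.09 | 1 |
| GDB-NER-full | closed | 22.22 | 36.80 | 27.71 | 5 | 30.38 | 46.17 | 36.65 | 7 |
| varIDE | closed | 17.02 | 49.2 | 25.3 | 6 | 20.15 | 53.74 | 29.31 | 8 |
| CRF-DepTree-categs | closed | 15.79 | 28.2 | 20.24 | 7 | 26.59 | 35.22 | 30.3 | 5 |
| GBD-NER-standard | closed | 11.83 | 44.6 | 18.7 | 8 | 15.64 | 53.48 | 24.2 | 10 |
| Veyn | closed | 17.28 | 16.8 | 17.04 | 9 | 31.68 | 23.83 | 27.2 | 9 |
| GBD-NER-resplit | closed | 9.1 | 42.8 | 15.01 | 10 | 12.4 | 53.48 | 20.13 | 12 |
| mumpitz | closed | 9.66 | 13 | 11.08 | 11 | 31.83 | 28.87 | 30.28 | 6 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 31.03 | 17.57 | 22.43 | 11 |
| SHOMA | open | 31.65 | 48.8 | 38.39 | 1 | 38.33 | 53.57 | 44.69 | 1 |
| Deep-BGT | open | 24.5 | 34.2 | 28.55 | 2 | 33.13 | 38.61 | 35.66 | 2 |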

EU

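| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| GDB-NER-full | closed | 82.07 | 71.40 | 76.36 | 1 | 83.12 | 71.79 | 77.14 | 1 |
| TRAPACC_S | closed | 84.69 | 68.6 | 75.8 | 2 | 86.07 | 69.38 | 76.83 | 2 |
| TRAPACC | closed | 85.56 | 64 | 73.23 | 3 | 86.97 | 65.01 | 74.4 | 4 |
| CRF-Seq-nocategs | closed | 77.67 | 62.6 | 69.32 | 4 | 86.42 | 65.81 | 74.72 | 3 |
| TRAVERSAL | closed | 78.28 | 58.4 | 66.9 | 5 | 83.42 | 65.01 | 73.07 | 5 |
| CRF-DepTree-categs | closed | 69.29 | 58.2 | 63.26 | 6 | 77.04 | 61.03 | 68.11 | 6 |
| Veyn | closed | 60.15 | 64 | 62.02 | 7 | 66.6 | 68.19 | 67.39 | 7 |
| varIDE | closed | 39.87 | 76 | 52.31 | 8 | 41.61 | 78.83 | 54.46 | 8 |
| GBD-NER-standard | closed | 45.05 | 58.2 | 50.79 | 9 | 45.67 | 58.65 | 51.35 | 9 |
| GBD-NER-resplit | closed | 29.18 | 69.8 | 41.16 | 10 | 29.97 | 71.27 | 42.2 | 10 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 3.07 | 4.67 | 3.71 | 11 |
| SHOMA | open | 81.8 | 72.8 | 77.04 | 1 | 86.27 | 74.95 | 80.21 | 1 |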

FA

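| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| GDB-NER-full | closed | 81.96 | 75.25 | 78.46 | 1 | 88.60 | 77.54 | 82.70 | 1 |
| GBD-NER-resplit | closed | 78.23 | 77.45 | 77.83 | 2 | 84.13 | 78.62 | 81.28 | 2 |
| GBD-NER-standard | closed | 78.11 | 74.05 | 76.02 | 3 | 84.54 | 75.65 | 79.85 | 3 |
| TRAPACC | closed | 86.98 | 66.67 | 75.48 | 4 | 92.07 | 67.83 | 78.12 | 5 |
| TRAPACC_S | closed | 87.1 | 64.67 | 74.23 | 5 | 93.12 | 65.68 | 77.03 | 6 |
| CRF-Seq-nocategs | closed | 81.22 | 63.87 | 71.51 | 6 | 93.05 | 67.39 | 78.17 | 4 |
| CRF-DepTree-categs | closed | 77.69 | 57.68 | 66.21 | 7 | 87.98 | 59.84 | 71.23 | 8 |
| TRAVERSAL | closed | 73.8 | 58.48 | 65.26 | 8 | 90.19 | 65.23 | 75.7 | 7 |
| Veyn | closed | 80.44 | 50.9 | 62.35 | 9 | 92.47 | 54.09 | 68.25 | 9 |
| varIDE | closed | 91.98 | 29.74 | 44.95 | 10 | 96.6 | 28.12 | 43.56 | 10 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 55.49 | 35.85 | 43.56 | 10 |
| SHOMA | open | 86.12 | 71.86 | 78.35 | 1 | 93.87 | 74.3 | 82.95 | 1 |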

FR

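| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| TRAVERSAL | closed | 77.19 | 44.18 | 56.19 | 1 | 84.72 | 48.76 | 61.9 | 1 |
| GDB-NER-full | closed | 72.76 | 42.37 | 53.55 | 2 | 81.85 | 44.66 | 57.79 | 2 |
| varIDE | closed | 55.24 | 46.59 | 50.54 | 3 | 63.4 | 49.7 | 55.72 | 3 |
| TRAPACC | closed | 61.09 | 38.15 | 46.97 | 4 | 70.55 | 42.36 | 52.93 | 5 |
| CRF-DepTree-categs | closed | 61.17 | 37.95 | 46.84 | 5 | 68.15 | 40.56 | 50.86 | 7 |
| CRF-Seq-nocategs | closed | 63.6 | 36.14 | 46.09 | 6 | 82.66 | 41.93 | 55.64 | 4 |
| TRAPACC_S | closed | 72.1 | 33.73 | 45.96 | 7 | 82.26 | 36.04 | 50.12 | 8 |
| mumpitz | closed | 56.8 | 33.53 | 42.17 | 8 | 81.25 | 38.86 | 52.57 | 6 |
| GBD-NER-standard | closed | 36.93 | 34.34 | 35.59 | 9 | 46.02 | 38.51 | 41.93 | 10 |
| Veyn | closed | 39.95 | 31.12 | 34.99 | 10 | 62.26 | 39.45 | 48.3 | 9 |
| GBD-NER-resplit | closed | 22.14 | 44.38 | 29.55 | 11 | 27.62 | 51.15 | 35.87 | 11 |
| Polirem-rich | closed | 68.82 | 12.85 | 21.66 | 12 | 82.59 | 17.42 | 28.77 | 13 |
| Polirem-basic | closed | 78.05 | 6.43 | 11.87 | 13 | 83.51 | 6.92 | 12.78 | 14 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 34.68 | 26.47 | 30.02 | 12 |
| SHOMA | open | 71.86 | 52.81 | 60.88 | 1 | 79.04 | 56.02 | 65.57 | 2 |
| Deep-BGT | open | 57.81 | 49.8 | 53.51 | 2 | 78.88 | 56.45 | 65.8 | 1 |
| Milos | open | 34 | 44.18 | 38.43 | 3 | 49.29 | 53.46 | 51.29 | 3 |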

HE

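| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| TRAVERSAL | closed | 50.33 | 15.14 | 23.28 | 1 | 74.64 | 18.1 | 29.13 | 2 |
| GBD-NER-standard | closed | 15.89 | 39.64 | 22.69 | 2 | 19.83 | 44.94 | 27.52 | 3 |
| TRAPACC | closed | 36.41 | 14.14 | 20.37 | 3 | 47.59 | 16.28 | 24.26 | 4 |
| Veyn | closed | 23.39 | 17.33 | 19.91 | 4 | 39.29 | 23.81 | 29.65 | 1 |
| varIDE | closed | 69.23 | 10.76 | 18.62 | 5 | 70.3 | 10.04 | 17.58 | 8 |
| TRAPACC_S | closed | 56.82 | 9.96 | 16.95 | 6 | 69.02 | 11 | 18.97 | 6 |
| GDB-NER-full | closed | 74.19 | 9.16 | 16.31 | 7 | 75.76 | 8.66 | 15.54 | 10 |
| GBD-NER-resplit | closed | 8.55 | 48.21 | 14.52 | 8 | 10.8 | 56.1 | 18.11 | 7 |
| CRF-DepTree-categs | closed | 22.43 | 9.56 | 13.41 | 9 | 37.82 | 14.11 | 20.55 | 5 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 36.62 | 9.96 | 15.66 | 9 |
| CRF-Seq-nocategs | closed | 0 | 0 | 0 | n/a | 26.98 | 1.47 | 2.79 | 11 |
| SHOMA | open | 61.37 | 28.49 | 38.91 | 1 | 74.43 | 31.26 | 44.02 | 1 |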

HI

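| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| CRF-DepTree-categs | closed | 77.95 | 68.6 | 72.98 | 1 | 81.12 | 66.94 | 73.35 | 1 |
| CRF-Seq-nocategs | closed | 76.79 | 64.2 | 69.93 | 2 | 81.8 | 64.31 | 72.01 | 2 |
| TRAPACC | closed | 72.39 | 66.6 | 69.38 | 3 | 77.17 | 66.12 | 71.22 | 3 |
| TRAPACC_S | closed | 82.91 | 58.2 | 68.39 | 4 | 86.53 | 56.43 | 68.31 | 5 |
| GDB-NER-full | closed | 61.90 | 72.80 | 66.91 | 5 | 65.18 | 71.38 | 68.14 | 6 |
| TRAVERSAL | closed | 66.3 | 60.6 | 63.32 | 6 | 73.15 | 67.12 | 70 | 4 |
| GBD-NER-standard | closed | 52.97 | 74.8 | 62.02 | 7 | 55.95 | 73.28 | 63.45 | 7 |
| GBD-NER-resplit | closed | 52.97 | 74.8 | 62.02 | 7 | 55.95 | 73.28 | 63.45 | 7 |
| varIDE | closed | 85.2 | 42.6 | 56.8 | 8 | 86.77 | 41.58 | 56.22 | 8 |
| Veyn | closed | 0 | 0 | 0 | n/a | 0 | 0 | 0 | n/a |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 44.82 | 29.8 | 35.8 | 9 |
| SHOMA | open | 76.84 | 69 | 72.71 | 1 | 84.3 | 68.57 | 75.62 | 1 |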

HR

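| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| TRAVERSAL | closed | 68.04 | 46.59 | 55.3 | 1 | 78.14 | 50.73 | 61.52 | 1 |
| TRAPACC_S | closed | 60.42 | 34.94 | 44.27 | 2 | 68.42 | 36.9 | 47.95 | 5 |
| TRAPACC | closed | 51.37 | 37.55 | 43.39 | 3 | 61.42 | 42.12 | 49.97 | 3 |
| GDB-NER-full | closed | 51.52 | 37.35 | 43.31 | 4 | 64.41 | 42.22 | 50.69 | 2 |
| GBD-NER-resplit | closed | 36.69 | 44.58 | 40.25 | 5 | 45.29 | 51.1 | 48.02 | 4 |
| GBD-NER-standard | closed | 36.53 | 43.57 | 39.74 | 6 | 44.02 | 48.53 | 46.17 | 7 |
| CRF-DepTree-categs | closed | 40.13 | 24.1 | 30.11 | 7 | 65.93 | 32.97 | 43.96 | 8 |
| CRF-Seq-nocategs | closed | 36.98 | 23.09 | 28.43 | 8 | 75.05 | 34.71 | 47.46 | 6 |
| Veyn | closed | 34.07 | 18.67 | 24.12 | 9 | 71.67 | 27.11 | 39.34 | 9 |
| varIDE | closed | 79.07 | 6.83 | 12.57 | 10 | 83.7 | 7.05 | 13.01 | 11 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 55.52 | 15.66 | 24.43 | 10 |
| SHOMA | open | 59.58 | 39.96 | 47.84 | 1 | 79.52 | 45.88 | 58.19 | 1 |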

HU

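| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| TRAPACC | closed | 92.44 | 88.27 | 90.31 | 1 | 90.72 | 85.44 | 88 | 1 |
| TRAPACC_S | closed | 94.87 | 85.82 | 90.12 | 2 | 93.56 | 81.75 | 87.25 | 2 |
| CRF-Seq-nocategs | closed | 93.94 | 77.96 | 85.21 | 3 | 95.16 | 76.31 | 84.7 | 3 |
| TRAVERSAL | closed | 88.01 | 74.74 | 80.84 | 4 | 89.91 | 79.61 | 84.45 | 4 |
| GDB-NER-full | closed | 85.47 | 76.55 | 80.76 | 5 | 90.54 | 77.09 | 83.27 | 6 |
| CRF-DepTree-categs | closed | 88 | 73.71 | 80.22 | 6 | 90.2 | 73.3 | 80.88 | 7 |
| Veyn | closed | 83.55 | 75.26 | 79.19 | 7 | 93.81 | 75.05 | 83.39 | 5 |
| GBD-NER-resplit | closed | 67.52 | 20.36 | 31.29 | 8 | 69.66 | 31.65 | 43.52 | 8 |
| GBD-NER-standard | closed | 67.12 | 19.2 | 29.86 | 9 | 68.92 | 29.71 | 41.52 | 9 |
| varIDE | closed | 100 | 10.31 | 18.69 | 10 | 100 | 8.74 | 16.07 | 10 |
| MWETreeC | closed | 2.58 | 67.14 | 4.96 | 11 | 3.81 | 74.85 | 7.25 | 11 |
| SHOMA | open | 90.08 | 81.96 | 85.83 | 1 | 93.49 | 80.87 | 86.73 | 1 |
| Deep-BGT | open | 78 | 71.26 | 74.48 | 2 | 80.71 | 73.11 | 76.72 | 2 |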

IT

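| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| TRAVERSAL | closed | 63.09 | 40.32 | 49.2 | 1 | 74.42 | 42.11 | 53.78 | 1 |
| GDB-NER-full | closed | 49.52 | 41.73 | 45.30 | 2 | 62.46 | 45.59 | 52.71 | 2 |
| TRAPACC | closed | 52.43 | 30.44 | 38.52 | 3 | 61.54 | 30.34 | 40.64 | 5 |
| CRF-Seq-nocategs | closed | 55.14 | 27.02 | 36.27 | 4 | 78.49 | 33.05 | 46.51 | 3 |
| TRAPACC_S | closed | 55.66 | 23.79 | 33.33 | 5 | 65.42 | 22.99 | 34.02 | 9 |
| CRF-DepTree-categs | closed | 44.76 | 25.81 | 32.74 | 6 | 58.78 | 29.8 | 39.55 | 6 |
| varIDE | closed | 31.07 | 34.07 | 32.5 | 7 | 39.22 | 35.06 | 37.02 | 7 |
| Veyn | closed | 34.01 | 30.44 | 32.13 | 8 | 58.41 | 38.16 | 46.16 | 4 |
| Polirem-rich | closed | 72.36 | 17.94 | 28.76 | 9 | 86.54 | 21.9 | 34.96 | 8 |
| GBD-NER-standard | closed | 15.45 | 29.84 | 20.36 | 10 | 22.68 | 35.45 | 27.67 | 10 |
| GBD-NER-resplit | closed | 10.69 | 28.83 | 15.59 | 11 | 16.63 | 38.31 | 23.19 | 11 |
| Polirem-basic | closed | 83.33 | 4.03 | 7.69 | 12 | 81.82 | 3.48 | 6.68 | 12 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 1.45 | 6.58 | 2.38 | 13 |
| SHOMA | open | 50.37 | 41.33 | 45.4 | 1 | 67.49 | 46.59 | 55.13 | 1 |
| Deep-BGT | open | 45.52 | 25.6 | 32.77 | 2 | 70 | 27.63 | 39.62 | 2 |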

LT

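| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| TRAPACC_S | closed | 45.76 | 24.8 | 32.17 | 1 | 53.52 | 25.02 | 34.1 | 2 |
| TRAPACC | closed | 33.98 | 28.2 | 30.82 | 2 | 42.48 | 28.94 | 34.43 | 1 |
| TRAVERSAL | closed | 29.61 | 13.8 | 18.83 | 3 | 55.56 | 16.92 | 25.94 | 3 |
| GDB-NER-full | closed | 7.11 | 44.80 | 12.27 | 4 | 8.63 | 49.60 | 14.70 | 4 |
| Veyn | closed | 18.89 | 6.8 | 10 | 5 | 48.85 | 11.31 | 18.37 | 5 |
| GBD-NER-standard | closed | 3.76 | 39.6 | 6.87 | 6 | 4.76 | 45.95 | 8.62 | 6 |
| GBD-NER-resplit | closed | 3.76 | 39.6 | 6.87 | 7 | 4.76 | 45.95 | 8.62 | 6 |
| varIDE | closed | 55.56 | 1 | 1.96 | 7 | 55.56 | 1.34 | 2.61 | 8 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 0.42 | 44.52 | 0.84 | 9 |
| CRF-Seq-nocategs | closed | 0 | 0 | 0 | n/a | 35.16 | 2.85 | 5.27 | 7 |
| CRF-DepTree-categs | closed | 0 | 0 | 0 | n/a | 35.16 | 2.85 | 5.27 | 7 |
| SHOMA | open | 35.74 | 16.8 | 22.86 | 1 | 56.76 | 18.7 | 28.13 | 1 |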

PL

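| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| GDB-NER-full | closed | 75.06 | 62.52 | 68.22 | 1 | 78.30 | 62.18 | 69.32 | 1 |
| TRAVERSAL | closed | 77.02 | 59.22 | 66.96 | 2 | 81.85 | 59.03 | 68.59 | 2 |
| TRAPACC | closed | 71.62 | 52.43 | 60.54 | 3 | 78.01 | 53.79 | 63.68 | 4 |
| GBD-NER-standard | closed | 59.36 | 60.97 | 60.15 | 4 | 63.93 | 61.73 | 62.81 | 6 |
| TRAPACC_S | closed | 78.55 | 48.35 | 59.86 | 5 | 83.1 | 48.83 | 61.51 | 7 |
| Veyn | closed | 62.69 | 55.15 | 58.68 | 6 | 73.52 | 58.12 | 64.92 | 3 |
| CRF-Seq-nocategs | closed | 70.88 | 46.8 | 56.37 | 7 | 83.28 | 50.81 | 63.12 | 5 |
| GBD-NER-resplit | closed | 42.53 | 67.96 | 52.32 | 8 | 45.77 | 69.4 | 55.16 | 8 |
| mumpitz | closed | 62.07 | 38.45 | 47.48 | 9 | 80.92 | 41.34 | 54.72 | 9 |
| CRF-DepTree-categs | closed | 26.66 | 44.47 | 33.33 | 10 | 37.99 | 57.94 | 45.89 | 10 |
| varIDE | closed | 86.11 | 6.02 | 11.25 | 11 | 88.89 | 5.78 | 10.85 | 12 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 43.78 | 30.51 | 35.96 | 11 |
| SHOMA | open | 73.05 | 56.31 | 63.6 | 1 | 79.32 | 57.13 | 66.42 | 2 |
| Deep-BGT | open | 70.87 | 56.7 | 63 | 2 | 80.23 | 57.85 | 67.23 | 1 |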

PT

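| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| TRAVERSAL | closed | 76.8 | 52.08 | 62.07 | 1 | 85.14 | 54.69 | 66.6 | 1 |
| varIDE | closed | 72.8 | 52.26 | 60.84 | 2 | 76.1 | 52.85 | 62.38 | 3 |
| GDB-NER-full | closed | 66.59 | 55.88 | 60.77 | 3 | 69.78 | 57.58 | 63.09 | 2 |
| Veyn | closed | 62.55 | 52.26 | 56.95 | 4 | 71.26 | 52.69 | 60.58 | 4 |
| TRAPACC | closed | 55.01 | 50.63 | 52.73 | 5 | 61.55 | 53.01 | 56.96 | 6 |
| TRAPACC_S | closed | 65.75 | 43.4 | 52.29 | 6 | 72.63 | 43.62 | 54.51 | 8 |
| CRF-DepTree-categs | closed | 55.1 | 41.05 | 47.05 | 7 | 63.23 | 45.23 | 52.73 | 9 |
| mumpitz | closed | 44.77 | 47.2 | 45.95 | 8 | 63.96 | 52.37 | 57.58 | 5 |
| CRF-Seq-nocategs | closed | 48.51 | 35.26 | 40.84 | 9 | 78.01 | 44.67 | 56.81 | 7 |
| Polirem-rich | closed | 65.83 | 23.69 | 34.84 | 10 | 79.23 | 34.56 | 48.13 | 10 |
| GBD-NER-standard | closed | 21.08 | 49.37 | 29.55 | 11 | 23.79 | 51.24 | 32.49 | 11 |
| GBD-NER-resplit | closed | 14.83 | 63.29 | 24.03 | 12 | 16.4 | 66.96 | 26.34 | 12 |
| Polirem-basic | closed | 43.48 | 1.81 | 3.47 | 13 | 50 | 2.57 | 4.88 | 13 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 0.94 | 44.35 | 1.85 | 14 |
| SHOMA | open | 71.12 | 65.46 | 68.17 | 1 | 78.81 | 68.89 | 73.51 | 1 |
| Deep-BGT | open | 72.44 | 46.11 | 56.35 | 2 | 79.4 | 44.83 | 57.3 | 2 |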

RO

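| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| GDB-NER-full | closed | 88.35 | 83.70 | 85.96 | 1 | 89.52 | 83.52 | 86.41 | 1 |
| TRAPACC | closed | 84.5 | 86.08 | 85.28 | 2 | 85.19 | 86.2 | 85.69 | 2 |
| TRAPACC_S | closed | 89.17 | 78.27 | 83.36 | 3 | 90.05 | 77.76 | 83.45 | 6 |
| TRAVERSAL | closed | 86.06 | 79.63 | 82.72 | 4 | 88.84 | 82.26 | 85.42 | 3 |
| Veyn | closed | 80.03 | 83.7 | 81.83 | 5 | 82.79 | 84.23 | 83.5 | 5 |
| CRF-Seq-nocategs | closed | 83.45 | 79.63 | 81.49 | 6 | 87.78 | 80.99 | 84.25 | 4 |
| varIDE | closed | 58.91 | 89.81 | 71.15 | 7 | 60.24 | 89.98 | 72.17 | 7 |
| CRF-DepTree-categs | closed | 74.69 | 62.14 | 67.84 | 8 | 81.11 | 63.64 | 71.32 | 8 |
| GBD-NER-standard | closed | 39.28 | 85.57 | 53.85 | 9 | 41.22 | 86.99 | 55.93 | 9 |
| GBD-NER-resplit | closed | 28.12 | 87.95 | 42.62 | 10 | 28.85 | 89.04 | 43.57 | 10 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 57.6 | 31.39 | 40.63 | 11 |
| SHOMA | open | 87.78 | 86.59 | 87.18 | 1 | 90.21 | 87.22 | 88.69 | 1 |
| Deep-BGT | open | 79.8 | 69.1 | 74.07 | 2 | 92.11 | 73.66 | 81.86 | 2 |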

SL

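| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| TRAVERSAL | closed | 79.41 | 54 | 64.29 | 1 | 83.61 | 54.54 | 66.01 | 1 |
| GDB-NER-full | closed | 61.07 | 52.40 | 56.40 | 2 | 68.57 | 55.08 | 61.09 | 2 |
| varIDE | closed | 30.31 | 70.2 | 42.34 | 3 | 32.64 | 71.61 | 44.84 | 7 |
| GBD-NER-standard | closed | 34.56 | 48.8 | 40.46 | 4 | 40.34 | 51.39 | 45.2 | 6 |
| Veyn | closed | 57.84 | 31 | 40.36 | 5 | 72.87 | 33.06 | 45.49 | 5 |
| CRF-Seq-nocategs | closed | 55.67 | 31.4 | 40.15 | 6 | 82.14 | 38.01 | 51.97 | 3 |
| CRF-DepTree-categs | closed | 54.51 | 30.2 | 38.87 | 7 | 84.21 | 37.38 | 51.77 | 4 |
| TRAPACC_S | closed | 33.33 | 29.6 | 31.36 | 8 | 47.24 | 32.26 | 38.33 | 8 |
| GBD-NER-resplit | closed | 21.16 | 59.2 | 31.17 | 9 | 24.49 | 64.24 | 35.46 | 9 |
| TRAPACC | closed | 20.21 | 26.8 | 23.04 | 10 | 36.99 | 32.7 | 34.72 | 10 |
| MWETreeC | closed | 0 | 0.2 | 0 | n/a | 1.23 | 44.83 | 2.4 | 11 |
| SHOMA | open | 58.56 | 47.2 | 52.27 | 1 | 74.81 | 52.29 | 61.55 | 1 |
| Deep-BGT | open | 58.9 | 38.4 | 46.49 | 2 | 72.19 | 40.34 | 51.76 | 2 |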

TR

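| System | Track | MWE-based P | MWE-based R | MWE-based F1 | MWE-based Rank | Token-based P | Token-based R | Token-based F1 | Token-based Rank |
|---|---|---|---|---|---|---|---|---|---|
| GDB-NER-full | closed | 68.07 | 54.35 | 60.44 | 1 | 70.06 | 54.64 | 61.40 | 1 |
| GBD-NER-standard | closed | 44.47 | 46.05 | 45.24 | 2 | 47.9 | 48.04 | 47.97 | 5 |
| Veyn | closed | 58.07 | 36.96 | 45.17 | 3 | 70.77 | 42.39 | 53.02 | 2 |
| CRF-Seq-nocategs | closed | 71.05 | 32.02 | 44.14 | 4 | 89.04 | 36.56 | 51.83 | 3 |
| CRF-DepTree-categs | closed | 67.71 | 29.84 | 41.43 | 5 | 84.62 | 33.68 | 48.19 | 4 |
| TRAVERSAL | closed | 81.48 | 26.09 | 39.52 | 6 | 88.38 | 27.66 | 42.13 | 6 |
| GBD-NER-resplit | closed | 44.11 | 25.89 | 32.63 | 7 | 47.4 | 27.08 | 34.47 | 7 |
| varIDE | closed | 4.19 | 65.22 | 7.87 | 8 | 5.23 | 78.95 | 9.81 | 8 |
| TRAPACC | closed | 1.64 | 1.58 | 1.61 | 9 | 5.26 | 4.21 | 4.68 | 10 |
| TRAPACC_S | closed | 1.12 | 0.59 | 0.78 | 10 | 6 | 2.01 | 3.01 | 11 |
| MWETreeC | closed | 0 | 0 | 0 | n/a | 3.35 | 8.52 | 4.81 | 9 |
| SHOMA | open | 81.4 | 45.85 | 58.66 | 1 | 87.22 | 47.66 | 61.63 | 1 |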