Relation Classification via Sequence Features and Bi-Directional LSTMs
REN Yuanfang, TENG Chong, LI Fei, CHEN Bo, JI Donghong†
School of Computer, Wuhan University, Wuhan 430072, Hubei, China
Structure features need complicated pre-processing and are probably domain-dependent. To reduce the time cost of pre-processing, we propose a novel neural network architecture, a bi-directional long short-term memory recurrent neural network (Bi-LSTM-RNN) model based on low-cost sequence features such as words and part-of-speech (POS) tags, to classify the relation between two entities. First, the model performs bi-directional recurrent computation along the tokens of a sentence. Then, the sequence is divided into five parts, and standard pooling functions are applied over the token representations of each part. Finally, the resulting representations are concatenated and fed into a softmax layer for relation classification. We evaluate our model on two standard benchmark datasets in different domains, namely SemEval-2010 Task 8 and BioNLP-ST 2016 Task BB3. On SemEval-2010 Task 8, the performance of our model matches that of the state-of-the-art models, achieving an F1 of 83.0%. On BioNLP-ST 2016 Task BB3, our model obtains an F1 of 51.3%, which is comparable with that of the best system. Moreover, we find that the context between the two target entities plays an important role in relation classification and can serve as a replacement for the shortest dependency path.
Key words: Bi-LSTM-RNN; relation classification; sequence features; structure features
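The pooling and classification steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the Bi-LSTM has already produced one vector per token (random stand-ins here), that the five parts are the text before the first entity, the first entity, the context between the entities, the second entity, and the text after it, and that max pooling is the pooling function. The function names, the half-open span convention, and the zero vector used for an empty part are all hypothetical choices made for illustration.

```python
import math
import random

def split_five(n_tokens, e1, e2):
    """Return five half-open (start, end) ranges:
    before entity 1, entity 1, between, entity 2, after entity 2."""
    (s1, t1), (s2, t2) = e1, e2
    if not (0 <= s1 < t1 <= s2 < t2 <= n_tokens):
        raise ValueError("entity spans must be non-empty, ordered, and non-overlapping")
    return [(0, s1), (s1, t1), (t1, s2), (s2, t2), (t2, n_tokens)]

def max_pool(vectors, dim):
    """Element-wise max over token vectors; an empty part yields zeros (assumption)."""
    if not vectors:
        return [0.0] * dim
    return [max(v[j] for v in vectors) for j in range(dim)]

def pooled_features(H, e1, e2):
    """Pool each of the five parts and concatenate the results."""
    dim = len(H[0])
    feats = []
    for start, end in split_five(len(H), e1, e2):
        feats.extend(max_pool(H[start:end], dim))
    return feats

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(H, e1, e2, W, b):
    """Apply a linear layer followed by softmax to the concatenated features."""
    x = pooled_features(H, e1, e2)
    scores = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return softmax(scores)

# Toy usage: random vectors stand in for Bi-LSTM token representations.
random.seed(0)
n_tokens, dim, n_relations = 9, 4, 3
H = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
W = [[random.gauss(0, 1) for _ in range(5 * dim)] for _ in range(n_relations)]
b = [0.0] * n_relations
probs = classify(H, (1, 2), (5, 7), W, b)
print(len(probs), round(sum(probs), 6))
```

In this sketch the feature vector always has length five times the token dimension, regardless of sentence length, which is what lets a fixed-size softmax layer sit on top of variable-length sentences.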