Welcome To WUJNS
武汉大学学报 英文版 | Wuhan University Journal of Natural Sciences
Boundary Recognition of Slight-Pause Marks via Grammar Testing Method
Time:2018-5-25  
MO Yiwen, CHEN Bo, LEI Pei
1. College of Chinese Language and Literature, Wuhan University, Wuhan 430072, Hubei, China; 2. School of Computer, Wuhan University, Wuhan 430072, Hubei, China; 3. Department of Language & Literature, Hubei University of Art & Science, Xiangyang 441053, Hubei, China
Abstract:
Boundary recognition is an important research topic in natural language processing, and it provides a basis for applications such as Chinese word segmentation, chunk analysis, and named entity recognition. To address ambiguity in the boundary recognition of Chinese punctuation marks, this paper proposes grammar testing methods for recognizing the boundaries of slight-pause marks and then calculates the annotation consistency achieved with these methods. The statistical results show that grammar testing methods can greatly improve the annotation consistency of slight-pause mark boundary recognition. The consistency in the second round of annotation is 0.0303 higher than in the first, which will help guarantee the consistency of large-scale corpus annotation and improve its quality.
Key words: slight-pause marks boundary; grammar testing; corpus annotation; Kappa statistics
CLC number: TP 301; H 085
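The key words name Kappa statistics as the consistency measure. As a minimal illustration, and assuming the two-annotator form of the statistic (Cohen's kappa; the abstract does not say which variant the authors used), agreement on boundary labels could be computed as sketched below. The function name, the "B"/"N" label scheme, and the sample annotations are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label counts.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels (assumption): "B" = boundary at the mark, "N" = no boundary.
annotator_1 = ["B", "B", "N", "B", "N", "N", "B", "N"]
annotator_2 = ["B", "N", "N", "B", "N", "B", "B", "N"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.4f}")
```

Under this reading, the reported gain of 0.0303 would be the difference between the kappa values of two annotation rounds; that interpretation is an assumption, since the abstract does not define the consistency score.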