International publications
Nguyen, H.T. & Cao, T.H. (2008), Named Entity Disambiguation on an Ontology Enriched by Wikipedia, in: Proceedings of the 6th IEEE International Conference on Research, Innovation and Vision for the Future - in Computing and Communications Technologies (RIVF'2008), pp. 247-254.
Link: IEEE Press (2008).
Abstract: The shortage of training data is currently a problem for named entity disambiguation. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. The corpus is then enriched with new and informative features extracted from Wikipedia data. Moreover, rather than pursuing rule-based methods as in the literature, we employ a machine learning model not only to disambiguate but also to identify named entities. In addition, our method explores in detail the use of a range of features extracted from texts, a given ontology, and Wikipedia data for disambiguation. This paper also systematically analyzes the impact of the features on disambiguation accuracy by varying their combinations for representing named entities. Empirical evaluation shows that, while the ontology provides basic features of named entities, Wikipedia is a fertile source of additional features for constructing accurate and robust named entity disambiguation systems.