Recognizing Textual Entailment (RTE) is the task of discovering a semantic relation between a pair of texts, conventionally denoted T (Text) and H (Hypothesis). T is said to entail H if the meaning of H, as interpreted in the context of T, can be inferred from the meaning of T. The relation is normally directional: the meaning of one expression can entail the meaning of the other while the converse does not hold. A corollary is that if T and H are paraphrases, so that entailment holds in both directions, T is redundant because it conveys no more information than H. Recognizing textual entailment is currently one of the most demanding research challenges in computational linguistics and machine learning. It is relevant to standard Natural Language Processing tasks such as Information Extraction, Question Answering, Paraphrasing, and Document Summarization and Comprehension.
Typically, the task of recognizing textual entailment can be handled by symbolic or propositional approaches, in which T is regarded as a fact or piece of evidence and H as a hypothesis or conclusion, so that the relation between T and H can be logically inferred by well-established methods. However, this has drawbacks: not every entailment can be derived by logical reasoning when the requisite background knowledge is absent. There are also important issues concerning the feature-based representation of the text and the hypothesis, the extraction of these features from text corpora, and the relation between T and H at different levels of granularity and abstraction. For example, given two expressions T and H, the relation between them can be inferred from a lexical, syntactic, or semantic representation, or from a composite representation that accounts for different aspects of text processing. Different types of representation may lead to different conclusions about the relation, particularly at the levels of semantics and abstraction, so that a given representation may fail to support the judgement that T implies H even though the entailment actually holds.
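To make the point about representations concrete, the following is a minimal sketch of a purely lexical entailment feature: the fraction of H's word types that also occur in T. The function names (`tokens`, `lexical_overlap`, `entails_lexically`) and the 0.8 threshold are illustrative assumptions, not part of the project description, and the measure is a simple baseline rather than a proposed method.

```python
import re


def tokens(text):
    """Return the set of lower-cased alphanumeric word types in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_overlap(t, h):
    """Fraction of H's word types that also appear in T (0.0 if H is empty)."""
    h_tokens = tokens(h)
    if not h_tokens:
        return 0.0
    return len(h_tokens & tokens(t)) / len(h_tokens)


def entails_lexically(t, h, threshold=0.8):
    """Hypothetical decision rule: predict entailment when overlap is high.

    The threshold is an assumed value chosen only for illustration.
    """
    return lexical_overlap(t, h) >= threshold


# The measure is directional: H is fully covered by T, but not vice versa.
t = "A man is playing a guitar on stage"
h = "A man is playing a guitar"
forward = entails_lexically(t, h)
backward = entails_lexically(h, t)

# Word order is ignored, so a non-entailing pair can still score highly.
misleading = lexical_overlap("The cat chased the dog", "The dog chased the cat")
```

The last pair shares every word yet does not stand in an entailment relation, which illustrates why a lexical representation alone can suggest a different relation from a syntactic or semantic one.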
The proposed project aims to use open-source natural language processing systems to extract facts and hypotheses from text corpora under different feature structures (text representations), and then to apply learning models to discover semantic relations in the generated data sets. As a result, a number of models will be generated, corresponding to the different feature structures, and their outputs can be merged by information-fusion-based ensemble learning methods to improve the discovery task. The prospective candidate will be closely involved in collaborative work with our industrial partner, particularly in a real-world application area.
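As a rough illustration of merging model outputs, the sketch below combines entailment probabilities from several models, each assumed to be trained on a different feature structure, by weighted averaging. Weighted averaging is used here only as a simple stand-in for the fusion step; the description does not specify which fusion method the project will adopt. The model names, probabilities, weights, and the 0.5 threshold are all hypothetical.

```python
def fuse_weighted(probabilities, weights):
    """Weighted average of per-model entailment probabilities.

    probabilities: dict mapping a model name to P(T entails H) from that model.
    weights: dict mapping the same model names to non-negative weights,
             for example each model's accuracy on held-out data (an assumption).
    """
    if probabilities.keys() != weights.keys():
        raise ValueError("probabilities and weights must cover the same models")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(probabilities[name] * weights[name] for name in probabilities) / total


def decide(probabilities, weights, threshold=0.5):
    """Predict entailment when the fused probability reaches the threshold."""
    return fuse_weighted(probabilities, weights) >= threshold


# Hypothetical outputs for one (T, H) pair from three representation-specific models.
probabilities = {"lexical": 0.9, "syntactic": 0.4, "semantic": 0.3}
weights = {"lexical": 1.0, "syntactic": 2.0, "semantic": 2.0}

fused = fuse_weighted(probabilities, weights)
prediction = decide(probabilities, weights)
```

In this made-up case the lexical model is confident, but the more heavily weighted syntactic and semantic models pull the fused score below the threshold, so the combined decision differs from the lexical model's own.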
First Supervisor: Bi, Y Dr
Second Supervisor: Wang, H Dr
Third Supervisor: Rooney, N Dr
Collaboration: This project does not involve collaboration with another establishment