MA Thesis - Abstract

Abstract

The main goal of this MA thesis is to develop a method for automatically transforming natural language text into a formalization of a possible meaning of the text, expressed in the conceptual graphs of John Sowa. I have implemented my method in a computer program, and during the course of my thesis, I demonstrate empirically that my method works, by applying it to a specific piece of text.

My chosen text consists of parts of chapter 1 from the book of Genesis in the Old Testament of the Bible. I have chosen Hebrew as the particular natural language on which to test my method.

The method chosen is that of syntax-directed, ontology-guided, rule-based, step-wise transformation. This is but one of two competing methods described in the literature, the other being based on syntax-directed maximal joining of canonical graphs.

As input to my method, I have four classes of data. The first is the Hebrew text itself, and the second is a ready-made syntactic analysis of the text. Both are taken from the Hebrew WIVU database developed by Prof. Dr. Eep Talstra and his research group, Werkgroep Informatica, at the Free University of Amsterdam. The third class of input data is an ontology of the concepts found in the text, derived from a matching of a concise Hebrew-English lexicon with WordNet. The fourth class of input data contains the relation hierarchy, the rules, and the lexicons I have developed as part of my method.

My method runs in three stages:

First, the syntax trees from the WIVU database are refined and transformed into more traditional generative syntax trees with smaller units. This step is necessary in order to make my method workable: One of the main assumptions of my thesis is that semantics can be viewed as being compositional in nature, meaning that the semantics of a text unit (e.g., a sentence) can be derived by breaking down the meaning into ever smaller units, going right down to the level of words and perhaps morphemes. Conversely, the meaning of a unit can then be constructed back again by composing together the individual parts of the meaning, directed by the syntax. The WIVU syntax trees, by themselves, have units which are far too large for compositional semantics to be workable, hence I must transform the trees so that the units are smaller. This is done in Step 1.

Second, having obtained a more refined syntax tree, I then transform the text into "intermediate" conceptual graphs (CGs). This is done by starting at the bottom of the tree (i.e., with words) and traversing the tree upwards, composing the meaning of the higher-level units from the meaning of the lower-level units by using rule-based, syntax-tree-directed, ontology-guided joining of conceptual graphs. This process carries on right up to clause level, where a different algorithm takes over. The result is CGs which are "quite good", but which still have bits of syntax left.
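The bottom-up composition described in this stage can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation from the thesis: the `Node` and `Graph` classes, the toy `LEXICON` with English glosses, and the `RULES` table are all hypothetical, the ontology check is left out, and the separate clause-level algorithm is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                       # syntactic category, e.g. "S", "NP", "V"
    word: str = ""                   # surface form, set only on leaves
    children: list = field(default_factory=list)


@dataclass
class Graph:
    head: str                        # concept type that stands for the whole unit
    relations: list = field(default_factory=list)  # (relation, from, to) triples


# Hypothetical toy lexicon: English gloss -> concept type.
LEXICON = {"God": "Deity", "created": "Create", "heavens": "Heaven"}

# Hypothetical rules, keyed on (parent label, child labels).
# Value: (relation, index of head child, index of dependent child).
RULES = {
    ("VP", ("V", "NP")): ("Thme", 0, 1),
    ("S", ("NP", "VP")): ("Agnt", 1, 0),
}


def compose(node):
    """Build the graph of a unit from the graphs of its sub-units."""
    if not node.children:                       # a word: look up its concept
        return Graph(head=LEXICON[node.word])
    parts = [compose(child) for child in node.children]
    if len(parts) == 1:                         # unary node: pass the graph up
        return parts[0]
    key = (node.label, tuple(child.label for child in node.children))
    relation, head_i, dep_i = RULES[key]
    head, dep = parts[head_i], parts[dep_i]
    # Join the two graphs by linking their head concepts with the relation.
    joined = head.relations + dep.relations + [(relation, head.head, dep.head)]
    return Graph(head=head.head, relations=joined)


tree = Node("S", children=[
    Node("NP", children=[Node("N", word="God")]),
    Node("VP", children=[
        Node("V", word="created"),
        Node("NP", children=[Node("N", word="heavens")]),
    ]),
])
result = compose(tree)
print(result.head, result.relations)
```

In this toy setting each syntactic configuration selects one relation. An ontology-guided version would additionally check the concept types of the two sub-graphs before joining them.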

Third, the intermediate CGs are transformed into fully semantic CGs using rules. These rules have a premise-conclusion structure, and are capable of transforming concepts, relations, and structure alike. The end result is CGs which are by now "adequate", and which have no syntax left.
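A premise-conclusion rule of this kind can be sketched as a small rewrite step in Python. This is only a hedged illustration: the triple representation, the relation names `Subj` and `Obj`, and the rule table are assumptions made for the example, and the sketch rewrites relations only, whereas the rules described here also transform concepts and structure.

```python
# A graph is a list of (relation, from_concept, to_concept) triples.
# Each hypothetical rule pairs a premise pattern with a conclusion:
# None in the premise matches anything; the conclusion is the new relation name.
RULES = [
    (("Subj", "Create", None), "Agnt"),   # syntactic subject of Create -> agent
    (("Obj", None, None), "Thme"),        # any syntactic object -> theme
]


def matches(premise, triple):
    """True if every non-wildcard slot of the premise equals the triple's slot."""
    return all(p is None or p == t for p, t in zip(premise, triple))


def rewrite(graph):
    """Apply the first matching rule to each triple; keep unmatched triples."""
    out = []
    for triple in graph:
        for premise, conclusion in RULES:
            if matches(premise, triple):
                triple = (conclusion, triple[1], triple[2])
                break
        out.append(triple)
    return out


intermediate = [("Subj", "Create", "Deity"), ("Obj", "Create", "Heaven")]
semantic = rewrite(intermediate)
print(semantic)
```

Triples that match no premise pass through unchanged, so the rule set can be extended one syntactic leftover at a time.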

The thesis is divided into two parts. Part I contains background information necessary for understanding my method, whereas Part II develops and discusses the method itself.

Part I starts out with an introductory chapter (Chapter 1). After that, I introduce the three tools I have used, namely the Jython programming language, the Notio CG framework, and the Emdros text database engine (Chapter 2). In a short chapter, I then describe Hebrew as a language as well as the Hebrew WIVU database (Chapter 3). After that, I describe and discuss my ontology (Chapter 4), followed by a literature survey of the state of the art in text-to-CG transformation (Chapter 5). This concludes Part I.

Part II starts out by introducing my method from a bird's-eye perspective (Chapter 6). As mentioned, the method runs in three stages, treated in the three subsequent chapters: refinement of the syntax trees (Chapter 7), transformation of the refined syntax trees to intermediate CGs (Chapter 8), and finally transformation of the intermediate CGs to fully semantic CGs (Chapter 9). Having thus developed my method, I discuss and reflect on it (Chapter 10). Finally, I round off the thesis in a concluding chapter (Chapter 11).

What is its bibliographic reference?

Petersen, Ulrik. (2004) Creation in Graphs: Extracting Conceptual Structures from Old Testament Texts. MA thesis, University of Aalborg, Department of Communication. Published in: Impact -- an Electronic Journal on Formalisation in Media, Text and Language -- Impact Theses, http://www.impact.aau.dk/theses.html.