Jeffrey Heinz

Dept. of Linguistics and Cognitive Science.
University of Delaware
42 E. Delaware Avenue
Newark, DE 19716



Inductive Learning of Phonotactic Patterns

Dissertation [pdf] (1.5MB) (for 2-sided printing, it is better to use this pdf)

Edward P. Stabler and Kie Zuraw, advisors.



Chairs: Edward P. Stabler, Kie Zuraw

Members: Bruce Hayes, Stott Parker, Colin Wilson



This dissertation demonstrates that significant classes of phonotactic patterns---patterns found over contiguous sounds, patterns found over non-contiguous segments (i.e. long distance agreement), and stress patterns---belong to small subsets of logically possible patterns whose defining properties naturally provide inductive principles learners can use to generalize correctly from limited experience.

This result is obtained by studying the hypothesis spaces different formulations of locality in phonology naturally define in the realm of regular languages, that is, those patterns describable with finite state machines. Locality expressed as contiguity (adjacency) restrictions provides the basis for n-gram-based patterns which describe phonotactic patterns over contiguous segments. Locality expressed as precedence---where distance between segments is not measured at all---defines a hypothesis space for long distance agreement patterns. Finally, both of these formulations of locality are shown to be subsumed by a more general formulation---that each relevant phonological environment is defined `locally' and is unique---which I call neighborhood-distinctness.
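The contrast between the two locality formulations above can be made concrete with a small Python sketch. It is an illustration only, not code from the dissertation: the function names, the `#` word-boundary symbol, and the toy sibilant data (with `S` standing in for a postalveolar sibilant) are all assumptions made for the example. A word is treated as well-formed when every factor it contains was already seen in the observed words.

```python
from itertools import combinations


def bigrams(word, boundary="#"):
    """Contiguity: adjacent symbol pairs, padded with a hypothetical '#' boundary."""
    padded = [boundary] + list(word) + [boundary]
    return set(zip(padded, padded[1:]))


def precedence_pairs(word):
    """Precedence: pairs (a, b) where a occurs before b at any distance."""
    return set(combinations(word, 2))


def grammar_from(words, factor_fn):
    """Collect every factor seen in the observed words."""
    grammar = set()
    for w in words:
        grammar |= factor_fn(w)
    return grammar


def well_formed(word, grammar, factor_fn):
    """A word is accepted if all of its factors are in the grammar."""
    return factor_fn(word) <= grammar


# Toy data (assumed): sibilants agree within a word, at any distance.
observed = ["sos", "SoS", "soos"]
bigram_grammar = grammar_from(observed, bigrams)
precedence_grammar = grammar_from(observed, precedence_pairs)

# "sooS" mixes 's' and 'S' across intervening vowels.
print(well_formed("sooS", bigram_grammar, bigrams))
print(well_formed("sooS", precedence_grammar, precedence_pairs))
```

In this toy setting the bigram grammar accepts the disharmonic form "sooS", because each adjacent pair was observed somewhere, while the precedence grammar rejects it, because the pair (s, S) was never observed at any distance.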

In addition to patterns over contiguous and non-contiguous segments, it is shown that all stress patterns described in recent comprehensive typologies are, for small neighborhoods, neighborhood-distinct. In fact, it is shown that 414 out of the 422 languages in the typologies have stress patterns which are neighborhood-distinct for even smaller neighborhoods called `1-1'. Furthermore, it is shown that significant classes of logically possible unattested patterns are not. Thus, 1-1 neighborhood-distinctness is hypothesized to be a universal property of phonotactic patterns, a hypothesis confirmed for all but a few stress patterns which merit further study.

It is shown that there are learners which provably learn these hypothesis spaces in the sense of Gold (1967) and which exemplify two general classes of learners: string extension and state merging. Thus the results obtained here provide techniques which allow other hypothesis spaces possibly relevant to phonology, or other cognitive domains, to be explored. Also, the hypothesis spaces and learning procedures developed here provide a basis which can be enriched with additional, substantive phonological structure. Finally, this basis is readily transferable into a variety of statistical learning procedures.
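As a rough illustration of the string extension idea, the following Python sketch shows a learner whose grammar is simply the union of the factors of every word it has seen so far. The class name, the bigram factor function, the `#` boundary symbol, and the toy target language are assumptions made for this example; it is a minimal sketch and not the dissertation's own implementation. State merging learners are not shown.

```python
def bigram_factors(word, boundary="#"):
    """Map a word to its set of adjacent symbol pairs (hypothetical '#' boundary)."""
    padded = [boundary] + list(word) + [boundary]
    return set(zip(padded, padded[1:]))


class StringExtensionLearner:
    """Grammar = union of factor_fn(w) over all observed words w."""

    def __init__(self, factor_fn):
        self.factor_fn = factor_fn
        self.grammar = set()

    def observe(self, word):
        """Update the grammar with one positive example; return True if it changed."""
        before = len(self.grammar)
        self.grammar |= self.factor_fn(word)
        return len(self.grammar) != before

    def accepts(self, word):
        """The hypothesized language: words whose factors all lie in the grammar."""
        return self.factor_fn(word) <= self.grammar


# Toy presentation of positive data from the assumed target language (ab)+.
learner = StringExtensionLearner(bigram_factors)
changes = [learner.observe(w) for w in ["ab", "abab", "ab", "ababab"]]
print(changes)
print(learner.accepts("abababab"), learner.accepts("ba"))
```

After the second word the grammar stops changing on this data, which is the behavior identification in the limit asks for: once every factor of the target pattern has appeared in the input, further positive examples leave the hypothesis untouched. Swapping in a different factor function (for example, precedence pairs) gives a learner for a different hypothesis space without changing the learner itself.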


Stress Typology

The stress typology is online here.

The finite state acceptors in the database are designed to be used with the fsa program which is available here.

Last updated: Aug 20, 2008