Course: TuTh 09:30-10:50, Compling lab
Instructor: Jeffrey Heinz, jeffrey.heinz@stonybrook.edu
Office Hours: Mondays 2-5pm and by appointment
Today we discussed the quality of the characteristic sample for SOSFIA, as well as the OSL learning algorithm.
We briefly mentioned the Burness and McMullin method for inducing tiers in order to learn tier-based OSL functions.
Today we discussed learning tradeoffs in the context of transducer learning.
Algorithm | Class it learns | Time complexity
---|---|---
OSTIA | subsequential (DFTs) | O(n³)
ISLFLA | ISL (definite DFTs) | O(n²)
OSLFIA | OSL | O(n²)
SOSFIA | any class given by a fixed DFT | O(n)
Smaller, better-structured learning classes can make learning easier. The subregular program aims to find classes of patterns that are expressive enough to describe natural language patterns and structured enough to enable learning from small data.
OSTIA and ISLFLA are state-merging algorithms: they build a prefix tree of the sample and then collapse states. OSLFIA draws the transitions into the DFT from the start state outwards. SOSFIA exploits the fact that the state space and transitions are fixed in advance and only fills in what is unknown, namely the outputs on the transitions.
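To make the state-merging family concrete, here is a minimal sketch (not any published implementation) of the onward prefix-tree transducer that state-merging learners start from: states are named by input prefixes, and each transition emits exactly the output material that becomes certain after reading its symbol. The toy devoicing sample and the function name `onward_prefix_tree` are illustrative assumptions.

```python
from os.path import commonprefix  # longest common prefix of a list of strings

def onward_prefix_tree(sample):
    """Onward prefix-tree transducer from a finite sample of (input, output)
    pairs, assumed to describe a function (one output per input).
    Returns transitions {(prefix, symbol): (next_prefix, out_string)} and
    state outputs {input: leftover_string} emitted at end of input."""
    def lcp_through(p):
        # longest common prefix of all outputs whose input begins with p
        return commonprefix([y for x, y in sample if x.startswith(p)])

    trans, state_out = {}, {}
    for x, y in sample:
        for i, a in enumerate(x):
            p, q = x[:i], x[:i + 1]
            # transition output: what becomes certain only after reading `a`
            trans[(p, a)] = (q, lcp_through(q)[len(lcp_through(p)):])
        state_out[x] = y[len(lcp_through(x)):]
    return trans, state_out

# Toy (hypothetical) data: word-final b -> p
sample = [("ab", "ap"), ("aba", "aba"), ("b", "p"), ("ba", "ba")]
trans, state_out = onward_prefix_tree(sample)
# e.g. reading "aba" emits "a", "", "ba" along the path, plus "" at the end
```

State-merging algorithms then collapse states of this tree (subject to output consistency) to generalize beyond the sample; the sketch stops before that step.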
SOSFIA cannot be used to learn the OSL class because, unlike ISL-k, OSL is not characterized by a fixed DFT. Many OSL functions are also (tier-based) reverse definite functions, which can be represented by a fixed DFT (Lambert and Heinz 2024). Anton points out that these cannot handle blocking. Perhaps the blocking patterns are generalized definite?
We also discussed Gildea and Jurafsky’s 1996 application of OSTIA to English flapping.
On Tuesday we will study how SOSFIA works.
We continued our discussion of learning problems, and developed one related to learning strictly k-local languages. Those techniques have been generalized and developed in the papers String Extension Learning and Learning with Lattice-Structured Hypothesis Spaces. See also Typology emerges from simplicity in representations and learning.
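As a concrete illustration of the string extension idea for the strictly k-local case, here is a minimal sketch (the boundary symbol `#` and the helper names `k_factors`, `sl_learner`, and `generates` are my own, not from the papers): the learner's grammar is just the set of k-factors attested in the positive data, and a string belongs to the learned language iff all of its k-factors are licensed.

```python
def k_factors(word, k, boundary="#"):
    """k-factors of a word padded with k-1 boundary symbols on each side."""
    padded = boundary * (k - 1) + word + boundary * (k - 1)
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}

def sl_learner(sample, k):
    """String extension learner for SL-k: union the k-factors of the data."""
    grammar = set()
    for w in sample:
        grammar |= k_factors(w, k)
    return grammar

def generates(grammar, word, k):
    """A word is accepted iff every one of its k-factors is in the grammar."""
    return k_factors(word, k) <= grammar

# Usage with made-up data
G = sl_learner(["aba", "ababa"], k=2)
print(generates(G, "abababa", 2))  # True: only attested 2-factors occur
print(generates(G, "abba", 2))     # False: "bb" was never observed
```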
On November 7, Jack will present Computing and classifying reduplication with 2-way finite-state transducers. Then Anton will present Computational universals in linguistic theory: Using recursive programs for phonological analysis.
FALL BREAK – NO CLASS
Short Paper Assignment (for students enrolled for 3 credits): Please turn it in no later than Tuesday, October 22, 2024 (in class). Please print your assignment and hand it in to me.
For this assignment, choose a morpho-phonological data set and provide an analysis of it, formalized with the logical transductions we have studied in class. Chapters 7 through 14 in Part 2 of DCP/DPC are samples of the kind of short paper expected. It is not necessary to compare your formalization to some other framework. It is necessary to be clear about the representations you are positing in the underlying and surface forms.
You may not choose one of the datasets in chapters 7 through 14 (unless you are proposing a radically different analysis). You may choose a dataset you have previously analyzed using a different set of formal tools (such as with rules or OT). For example, datasets from Phonology 1 or 2 can be used, or even a dataset from an undergraduate phonology class or introductory textbook. You may choose a dataset you have collected, but such original research is not the purpose of this assignment and may be more appropriate for a final project.
A collection of tex files providing macros and examples for your convenience. I compiled the PDF in there with the command `latexmk dolatian-russian.tex`.
Ola presented Chapter 9 on Lamba.
Geonhee presented Chapter 10 on Turkish.
We developed a plan for next week. Updated DCP/DPC text is here.