Improving Morphology Induction by Learning Spelling Rules

Unsupervised learning of morphology is an important task for human learners and in natural language processing systems. Previous systems focus on segmenting words into substrings (taking ⇒, but sometimes a segmentation-only analysis is insufficient (e.g., taking may be more appropriately analyzed as take+ing, with a spelling rule accounting for the deletion of the stem-final e). In this paper, we develop a Bayesian model for simultaneously inducing both morphology and spelling rules. We show that the addition of spelling rules improves performance over the baseline morphology-only model.

Jason Naradowsky, Sharon Goldwater