A Word and Character-Cluster Hybrid Model for Thai Word Segmentation
Canasai Kruengkrai, Kiyotaka Uchimoto, Jun'ichi Kazama, Kentaro Torisawa, Hitoshi Isahara, and Chuleerat Jaruskulchai
Abstract
In this paper, we describe the system we used in the InterBEST 2009 Thai Word Segmentation Shared Task.
Our system is based on a word and character-cluster hybrid model that can effectively handle both known and unknown words.
In addition, our model can be integrated with simple strategies for reducing annotation inconsistencies.
Experimental results on in-domain and out-of-domain test data sets show the effectiveness of our system.