A Word and Character-Cluster Hybrid Model for Thai Word Segmentation

Canasai Kruengkrai, Kiyotaka Uchimoto, Jun'ichi Kazama, Kentaro Torisawa, Hitoshi Isahara, and Chuleerat Jaruskulchai

Abstract

In this paper, we describe the system we submitted to the InterBEST 2009 Thai Word Segmentation Shared Task. Our system is based on a word and character-cluster hybrid model that can effectively handle both known and unknown words. In addition, the model can be combined with simple strategies for reducing annotation inconsistencies. Experimental results on in-domain and out-of-domain test sets demonstrate the effectiveness of our system.


