太田・太田(2016)国立国語研究所論集
太田聡、太田真理、「連濁の生起率に基づく日本語複合語の分類 : 連濁データベースによる研究」、『国立国語研究所論集』、国立国語研究所、10、179–191、2016. doi: 10.15084/00000814
Abstract: Rendaku is one of the most well-known phonological phenomena in Japanese, which voices the initial obstruent of the second element of a compound. Previous studies have proposed that Japanese compound words can be classified on the basis of the frequency of rendaku (rendaku rate). However, since these studies used arbitrary criteria to determine clusters, such as 33% and 66%, as well as arbitrary numbers of clusters, it is crucial to examine the plausibility of such criteria. In this study, we examined the optimal boundary criteria as well as the optimal number of clusters using a clustering analysis based on Gaussian mixture modeling and the Rendaku Database (Irwin and Miyashita 2015). The cluster analyses clarified that the two-cluster model was optimal for classifying both compound nouns and compound verbs. The boundary values of the rendaku rate for these clusters were approximately 90% and 40% for the compound nouns and compound verbs, respectively. These results were inconsistent with the findings of previous studies. Our findings demonstrate that model-based clustering analysis is an effective method of determining optimal classification of linguistic data.