Comparing the University of South Florida Homograph Norms with Empirical Corpus Data
by: Reinhard Rapp
Data Analysis, Machine Learning and Applications (2008), pp. 611-618.
Abstract
The basis for most classification algorithms dealing with word sense induction and word sense disambiguation is the assumption that certain context words are typical of a particular sense of an ambiguous word. However, as such algorithms have been only moderately successful in the past, the question that we raise here is whether this assumption really holds. Starting with an inventory of predefined senses and sense descriptors taken from the University of South Florida Homograph Norms, we present a quantitative study of the distribution of these descriptors in a large corpus. Our focus is on the comparison of co-occurrence frequencies between descriptors belonging to the same sense versus different senses, and on the effects of considering groups of descriptors rather than single descriptors. Our findings are that descriptors belonging to the same sense co-occur significantly more often than descriptors belonging to different senses, and that considering groups of descriptors effectively reduces the otherwise serious problem of data sparseness.
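
The same-sense versus different-sense comparison described in the abstract can be illustrated with a minimal sketch. This is not the author's procedure: the sense inventory, the five-line toy corpus, the choice of one line as one context window, and the function names (cooccurrence_count, mean_pair_counts) are all assumptions made for illustration, and none of the descriptors are taken from the actual norms.

```python
from itertools import combinations

# Hypothetical sense inventory for the homograph "bank".
# These descriptors are invented for illustration, not taken from the norms.
SENSES = {
    "money": ["money", "loan", "account"],
    "river": ["river", "water", "shore"],
}

# Toy corpus; each line is treated as one context window (an assumption).
CORPUS = [
    "the bank approved the loan and moved money into the account",
    "money in the account covers the loan",
    "the river bank was covered with water",
    "water from the river reached the shore",
    "he took money to the shore",
]


def cooccurrence_count(a, b, contexts):
    """Count the contexts that contain both descriptor a and descriptor b."""
    return sum(1 for tokens in contexts if a in tokens and b in tokens)


def mean_pair_counts(senses, corpus):
    """Return the mean co-occurrence count of same-sense and different-sense pairs."""
    contexts = [set(line.split()) for line in corpus]
    # Map each descriptor to the sense it belongs to.
    sense_of = {d: s for s, ds in senses.items() for d in ds}
    same, diff = [], []
    for a, b in combinations(sorted(sense_of), 2):
        count = cooccurrence_count(a, b, contexts)
        if sense_of[a] == sense_of[b]:
            same.append(count)
        else:
            diff.append(count)
    return sum(same) / len(same), sum(diff) / len(diff)


if __name__ == "__main__":
    same_mean, diff_mean = mean_pair_counts(SENSES, CORPUS)
    print(f"same-sense mean: {same_mean:.3f}")
    print(f"different-sense mean: {diff_mean:.3f}")
```

On real data the counts would come from a large corpus with a chosen window size, and a significance test would be needed before claiming that same-sense pairs co-occur more often; the sketch only shows the shape of the comparison.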