lucene-java-user mailing list archives

From Bob Carpenter <>
Subject Re: part of speech tagger
Date Tue, 21 Nov 2006 23:10:21 GMT
zzzzz shalev wrote:
> hello all,
>     i would like to retrieve during query time, the part of speech of each word in a
>   does anyone know of an implementation of a java part of speech api?

The standard statistical POS taggers, such as
the ones recommended (Brill's, OpenNLP, LingPipe),
use syntactic context to disambiguate. (Aramorph
is the exception.) Some of them, such as ours (LingPipe),
can return multiple answers with confidence scores.

What they can't do is determine the part of speech
of words in a bag of words from a query. So whether
this will work depends on whether the queries
arrive as whole sentences.

Most of these systems are trained on newswire, too,
so they won't do as well with questions, which have
different syntactic forms in most languages.

For instance, "run home" might be a verb (run)
and noun (home), or the query might be about baseball
and it's really two nouns, "run" and "home" (not to
be confused with "home run", which is a compound
noun with an idiomatic meaning in baseball).
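
To make the point about context concrete, here is a toy sketch in Java. It is
not the API of Brill's tagger, OpenNLP, or LingPipe: the class name
ToyContextTagger, the tiny lexicon, and every probability in it are made up
for illustration, and the greedy left-to-right choice stands in for the much
richer models real taggers use. It only shows that an ambiguous word like
"run" gets a different tag depending on the word before it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy tagger; all names and numbers are illustrative assumptions.
class ToyContextTagger {
    // Made-up P(tag | word) for a handful of words.
    static final Map<String, Map<String, Double>> LEXICON = Map.of(
        "the",  Map.of("DET", 1.0),
        "i",    Map.of("PRON", 1.0),
        "run",  Map.of("VERB", 0.6, "NOUN", 0.4),
        "home", Map.of("NOUN", 0.7, "ADV", 0.3));

    // Made-up P(tag | previous tag); START marks the beginning of the input.
    static final Map<String, Map<String, Double>> TRANSITION = Map.of(
        "START", Map.of("DET", 0.4, "PRON", 0.3, "NOUN", 0.2, "VERB", 0.1),
        "DET",   Map.of("NOUN", 0.9),
        "PRON",  Map.of("VERB", 0.8, "NOUN", 0.05),
        "VERB",  Map.of("ADV", 0.5, "NOUN", 0.3),
        "NOUN",  Map.of("NOUN", 0.3, "VERB", 0.3));

    // Small score for tag sequences missing from the table above.
    static final double UNSEEN = 0.01;

    // Context-free candidates: all a tagger can say about an isolated word.
    static Map<String, Double> candidates(String word) {
        return LEXICON.getOrDefault(word.toLowerCase(), Map.of("NOUN", 1.0));
    }

    // Greedy tagging: pick the tag maximizing lexical score * transition score.
    static List<String> tag(List<String> words) {
        List<String> tags = new ArrayList<>();
        String prev = "START";
        for (String word : words) {
            String best = null;
            double bestScore = -1.0;
            for (Map.Entry<String, Double> e : candidates(word).entrySet()) {
                double trans = TRANSITION.getOrDefault(prev, Map.of())
                                         .getOrDefault(e.getKey(), UNSEEN);
                double score = e.getValue() * trans;
                if (score > bestScore) {
                    bestScore = score;
                    best = e.getKey();
                }
            }
            tags.add(best);
            prev = best;
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(tag(List.of("the", "run")));        // [DET, NOUN]
        System.out.println(tag(List.of("I", "run", "home")));  // [PRON, VERB, NOUN]
    }
}
```

With these invented numbers, "run" comes out as a noun after "the" and as a
verb after "I". For the bare query "run home" there is no such context, so
whatever tags come back rest almost entirely on the lexical priors.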

- Bob
