lucene-java-user mailing list archives

From Lance Norskog <>
Subject Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
Date Thu, 13 Dec 2012 21:45:23 GMT
Part-of-speech tagging is available now, in the indexer.

LUCENE-2899 adds OpenNLP to the Lucene/Solr codebase. It does
part-of-speech tagging, chunking, and named entity recognition. OpenNLP
is an Apache project for natural-language processing.

Some of these parts live in Solr but could be in Lucene.
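To make "recording a part-of-speech attribute in the postings" concrete, here is a minimal plain-Java toy. It is not Lucene's API and not the LUCENE-2899 code. Each (term, position) posting carries a one-byte tag, similar in spirit to a per-position payload, and a search can filter on that tag. The `PosIndex` class, the `Pos` tag set, and the `addDocument`/`search` methods are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical toy positional index: every posting stores the document id,
 * the token position, and a one-byte "payload" encoding a part-of-speech tag.
 * This only models the idea of a per-position attribute; it is not Lucene.
 */
public class PosIndex {
    /** Hypothetical tag set; each tag is stored as its ordinal in one byte. */
    public enum Pos { NOUN, VERB, ADJ, OTHER }

    /** One posting entry: where the term occurred, plus its encoded tag. */
    static final class Posting {
        final int docId;
        final int position;
        final byte payload; // Pos ordinal, standing in for a 1-byte payload

        Posting(int docId, int position, byte payload) {
            this.docId = docId;
            this.position = position;
            this.payload = payload;
        }
    }

    // term -> postings in the order they were indexed
    private final Map<String, List<Posting>> postings = new HashMap<>();

    /** Index one document from parallel arrays of tokens and their POS tags. */
    public void addDocument(int docId, String[] tokens, Pos[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags differ in length");
        }
        for (int pos = 0; pos < tokens.length; pos++) {
            String term = tokens[pos].toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, k -> new ArrayList<>())
                    .add(new Posting(docId, pos, (byte) tags[pos].ordinal()));
        }
    }

    /** Return ids of documents where the term occurs with the given tag, without duplicates. */
    public List<Integer> search(String term, Pos tag) {
        List<Integer> hits = new ArrayList<>();
        List<Posting> list = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.emptyList());
        for (Posting p : list) {
            // Decode the stored byte back into a tag and compare at search time.
            boolean tagMatches = Pos.values()[p.payload] == tag;
            if (tagMatches && !hits.contains(p.docId)) {
                hits.add(p.docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PosIndex idx = new PosIndex();
        idx.addDocument(1, new String[] {"The", "patient", "reported", "pain"},
                new Pos[] {Pos.OTHER, Pos.NOUN, Pos.VERB, Pos.NOUN});
        idx.addDocument(2, new String[] {"They", "report", "the", "results"},
                new Pos[] {Pos.OTHER, Pos.VERB, Pos.OTHER, Pos.NOUN});
        idx.addDocument(3, new String[] {"A", "report", "was", "filed"},
                new Pos[] {Pos.OTHER, Pos.NOUN, Pos.VERB, Pos.VERB});
        System.out.println(idx.search("report", Pos.VERB)); // prints [2]
        System.out.println(idx.search("report", Pos.NOUN)); // prints [3]
    }
}
```

In Lucene 4.x itself, the indexing side of this would typically be a `TokenFilter` that sets a `PayloadAttribute` on each token. Span-level annotations such as parse subtrees, semantic groups, or coreference links do not fit a single per-position byte as neatly. That is part of what the question above is probing.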

On 12/12/2012 12:02 PM, Wu, Stephen T., Ph.D. wrote:
>>> Is there any (preliminary) code checked in somewhere that I can look at,
>>> that would help me understand the practical issues that would need to be
>>> addressed?
>> Maybe we can make this more concrete: what new attribute are you
>> needing to record in the postings and access at search time?
> For example:
>   - part of speech of a token.
>   - syntactic parse subtree (over a span).
>   - semantically normalized phrase (to canonical text or ontological code).
>   - semantic group (of a span).
>   - coreference link.
> stephen
