lucene-java-user mailing list archives

From Carsten Schnober
Subject Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
Date Tue, 18 Dec 2012 19:51:23 GMT
On 18.12.2012 12:36, Michael McCandless wrote:
> On Thu, Dec 13, 2012 at 8:32 AM, Carsten Schnober wrote:

>> This is a relatively easy example, but how would you deal with e.g.
>> annotations that include multiple tokens (as in spans), such as chunks,
>> or relations between tokens (and token spans), as in the coreference
>> links example given by Steven above?
> I think you'd do something like what SynonymFilter does for
> multi-token synonyms.
> E.g. a synonym for "wireless network" -> wifi would insert a new token
> ("wifi"), overlapped on wireless.
> Lucene doesn't store the end span, but if this is really important for
> your use case, you could add a payload to that wifi token that would
> encode the number of positions that the inserted token spans (2 in
> this case), and then the information would be present in the index.
> You'd still need to do something custom at read/search time to decode
> this end position and do something interesting with it ...

Thanks for the pointer!
I'm still puzzled whether there is an optimal way to encode
(labelled) relations between tokens or even spans; the latter part would
probably lead back to the synonym-like solution.
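
For illustration, the payload idea quoted above (storing the number of
positions an inserted token spans) could be sketched roughly as follows.
This is only a minimal sketch under assumptions: SpanLengthPayload and its
methods are hypothetical names, not part of Lucene, and the variable-length
byte layout is just one possible choice. In an actual analysis chain the
resulting bytes would presumably be wrapped in a BytesRef and attached to
the token via PayloadAttribute, and the search side would still need custom
code to read them back.

```java
import java.util.Arrays;

/** Hypothetical helper (not a Lucene class): packs a span length into payload bytes. */
final class SpanLengthPayload {

    private SpanLengthPayload() {}

    /** Encodes a span length (number of positions, at least 1) using 7 bits per byte. */
    static byte[] encode(int spanLength) {
        if (spanLength < 1) {
            throw new IllegalArgumentException("span length must be >= 1: " + spanLength);
        }
        byte[] buffer = new byte[5]; // a 32-bit int needs at most 5 bytes in this layout
        int used = 0;
        int value = spanLength;
        while ((value & ~0x7F) != 0) {
            // Low 7 bits plus a continuation flag in the high bit.
            buffer[used++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buffer[used++] = (byte) value; // final byte, continuation flag clear
        return Arrays.copyOf(buffer, used);
    }

    /** Decodes a payload written by encode(int). */
    static int decode(byte[] payload) {
        int value = 0;
        int shift = 0;
        for (byte b : payload) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated payload");
    }

    /** Last position covered by a token that starts at startPosition. */
    static int endPosition(int startPosition, byte[] payload) {
        return startPosition + decode(payload) - 1;
    }
}
```

For the "wifi" example, SpanLengthPayload.encode(2) gives a one-byte
payload, and a token at position 7 would then cover positions 7 and 8.
Labelled relations would need more than this (at least a label and a
target), so the sketch only covers the span-length part.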

Institut für Deutsche Sprache |
Projekt KorAP                 |
Tel. +49-(0)621-43740789      |
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
