lucene-java-user mailing list archives

From Erick Erickson
Subject Re: Best practice for embedding extra information in an index
Date Tue, 21 Sep 2010 23:24:45 GMT
Off the top of my head...
1) is certainly easiest. This looks suspiciously like synonyms. That is, at
    index time you inject the ID as a synonym in the text and it gets indexed
    at the same position as the token. This helps because phrase queries then
    continue to work. Lucene in Action has an example of creating a synonym
    analyzer.
2) I don't see how payloads really help you here. I confess I'm not
    familiar with payloads, but what I've seen is that they're useful when
    you match the *term* and want to do something special. Uses I've seen
    are, for instance, parts of speech. So one can alter the score of, say,
    nouns to boost matches on nouns. But I don't recall seeing something that
    uses the payload data as the match itself.
3) I have no idea what an attribute is in this context <G>... Although you
    could simply create another field that contains all of the IDs for the
    document and add a SHOULD clause on that field to all your queries.
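
To make the point in 1) concrete, here is a minimal sketch in plain Java (no Lucene dependency) of why indexing the ID at the same position as the token keeps phrase queries working. The class SynonymPositionSketch, the methods analyze and phraseMatches, and the ID "entity_42" are all hypothetical names invented for illustration; this is a toy model, not Lucene's implementation. In a real Lucene TokenFilter the same effect is normally achieved by emitting the extra token with a position increment of 0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SynonymPositionSketch {

    /** A term together with the position it occupies in the field (toy model). */
    public static final class Token {
        public final String term;
        public final int position;

        public Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /**
     * Splits text on whitespace. When a word has an entry in entityIds, the ID is
     * emitted as an extra token at the SAME position as the word, which is what a
     * synonym filter does.
     */
    public static List<Token> analyze(String text, Map<String, String> entityIds) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String word : text.toLowerCase().split("\\s+")) {
            tokens.add(new Token(word, position));
            String id = entityIds.get(word);
            if (id != null) {
                tokens.add(new Token(id, position)); // stacked on the original token
            }
            position++;
        }
        return tokens;
    }

    /** Returns true if the phrase terms occur at consecutive positions. */
    public static boolean phraseMatches(List<Token> tokens, String... phrase) {
        if (phrase.length == 0) {
            return false;
        }
        // Build a tiny positional index: term -> set of positions.
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (Token t : tokens) {
            postings.computeIfAbsent(t.term, k -> new HashSet<>()).add(t.position);
        }
        Set<Integer> empty = Collections.emptySet();
        for (int start : postings.getOrDefault(phrase[0], empty)) {
            boolean allFound = true;
            for (int i = 1; i < phrase.length && allFound; i++) {
                allFound = postings.getOrDefault(phrase[i], empty).contains(start + i);
            }
            if (allFound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> ids = new HashMap<>();
        ids.put("aspirin", "entity_42"); // hypothetical entity ID

        List<Token> tokens = analyze("patients took aspirin daily", ids);

        // The original phrase and the phrase with the ID substituted both match,
        // because "aspirin" and "entity_42" share one position.
        System.out.println(phraseMatches(tokens, "took", "aspirin", "daily"));
        System.out.println(phraseMatches(tokens, "took", "entity_42", "daily"));
    }
}
```

If the ID were instead appended as a separate token after the word, it would shift every later position by one and the original phrase "took aspirin daily" would no longer line up.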


On Tue, Sep 21, 2010 at 3:11 PM, Christopher Condit wrote:

> I'm curious about embedding extra information in an index (and being able
> to search the extra information as well). In this case certain tokens
> correspond to recognized entities with ids. I'd like to get the ids into the
> index so that searching for the id of the entity will also return that
> document. I can think of three ways and I was curious if there's a preferred
> way:
> 1) Add the id as another token during filtering
> 2) Add the id as a payload
> 3) Add the id as an attribute (although I don't know how to search on the
> attribute value)
> Thanks,
> -Chris
