lucene-dev mailing list archives

From Grant Ingersoll <>
Subject Re: Token/Payload API
Date Tue, 15 May 2007 13:31:16 GMT
One thing I forgot to mention: the Payload mechanism now makes
possible something prompted by a comment in your ApacheCon EU
presentation, to the effect that we can't score binary fields.  With
payload scoring, a binary Field is essentially a document-level
payload.  It should be quite easy to implement a Query/Scorer
combination that calls back to scorePayload, if people are interested
in such a thing.  I would propose, however, that if we go this route,
we overload the scorePayload method to pass in field information,
i.e. the field name.

And, of course, I haven't looked in depth into the FunctionQuery  
capabilities, which may already provide for this possibility.
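
A minimal sketch of the proposed overload, in plain Java so it runs without an index. It assumes the existing hook has the shape `scorePayload(byte[] payload, int offset, int length)`; the classes `PayloadScoring`, `RankFieldScoring` and `BinaryFieldScorer`, and the one-byte `"rank"` field convention, are hypothetical and not Lucene's API. Documents are modeled as maps from field name to stored bytes, so each binary field value plays the role of a document-level payload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Similarity-style scoring hook; not Lucene's class.
class PayloadScoring {
    // Field-agnostic hook: neutral score, so payloads do not change ranking by default.
    public float scorePayload(byte[] payload, int offset, int length) {
        return 1.0f;
    }

    // Proposed overload: the field name lets one implementation treat fields differently.
    // By default it delegates to the field-agnostic version.
    public float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        return scorePayload(payload, offset, length);
    }
}

// Example policy (hypothetical): the "rank" field holds a one-byte weight in tenths.
class RankFieldScoring extends PayloadScoring {
    @Override
    public float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        if ("rank".equals(fieldName) && length >= 1) {
            return (payload[offset] & 0xFF) / 10.0f;
        }
        return super.scorePayload(fieldName, payload, offset, length);
    }
}

// Treats each document's binary field value as a document-level payload and
// scores it through the callback, passing the field name along.
class BinaryFieldScorer {
    private final PayloadScoring scoring;
    private final String field;

    BinaryFieldScorer(PayloadScoring scoring, String field) {
        this.scoring = scoring;
        this.field = field;
    }

    // Returns one score per document; documents lacking the field score 0.
    List<Float> scoreAll(List<Map<String, byte[]>> docs) {
        List<Float> scores = new ArrayList<>();
        for (Map<String, byte[]> doc : docs) {
            byte[] value = doc.get(field);
            scores.add(value == null ? 0.0f
                    : scoring.scorePayload(field, value, 0, value.length));
        }
        return scores;
    }
}
```

Because the four-argument overload delegates to the three-argument one by default, existing implementations would keep working while new ones can branch on the field name.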

Just thinking out loud,

On May 11, 2007, at 9:03 PM, Yonik Seeley wrote:

> On 5/11/07, Grant Ingersoll <> wrote:
>> On May 11, 2007, at 4:31 PM, Yonik Seeley wrote:
>> > I hadn't kept up with the payload discussion/patch, and just got
>> > around to looking at Token.
>> >
>> > public class Token implements Cloneable {
>> >  String termText;         // the text of the term
>> >  int startOffset;         // start in source text
>> >  int endOffset;           // end in source text
>> >  String type = "word";    // lexical type
>> >
>> >  Payload payload;
>> >
>> >
>> > It almost feels like we are going down the road of Field, adding more
>> > and more to the base class instead of using some other mechanism like
>> > inheritance.
>> So PayloadToken would be more in line with what you are thinking?
>> Then we would need an instanceof check to determine when a token
>> has payloads?
> I don't have a good answer for that one... a real inheritance solution
> would be invasive to the indexing code and probably not worth it at
> this point.  There is also the problem of mixing different (future)
> token properties... what you really want are mixins or something.
> At this point, just forget I brought it up ;-)
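
To make the inheritance alternative concrete, here is a sketch of a payload-carrying subclass and the `instanceof` check indexing code would then need. `BaseToken`, `PayloadToken` and `Indexer` are hypothetical stand-ins, not Lucene's `Token` or indexing classes.

```java
// Hypothetical minimal token illustrating the inheritance route; not Lucene's Token.
class BaseToken {
    String termText;
    int startOffset;
    int endOffset;

    BaseToken(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

// Payload support lives in a subclass instead of on the base class.
class PayloadToken extends BaseToken {
    byte[] payload;

    PayloadToken(String termText, int startOffset, int endOffset, byte[] payload) {
        super(termText, startOffset, endOffset);
        this.payload = payload;
    }
}

class Indexer {
    // Indexing code must test the runtime type to learn whether a payload exists.
    static int payloadLength(BaseToken token) {
        if (token instanceof PayloadToken) {
            byte[] p = ((PayloadToken) token).payload;
            return p == null ? 0 : p.length;
        }
        return 0; // plain tokens carry no payload
    }
}
```

With a second optional property (say, a hypothetical per-token flags field), every combination would need its own subclass, which is the mixing problem raised in the reply.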
>> > A bigger problem, however, is that payloads will be lost by filters
>> > that aren't payload-aware and that create new Tokens.  We had the same
>> > problem with position increments being lost.
>> >
>> > For this latter problem, I think the answer is to *not* create new
>> > tokens, and make all the properties of Token settable.
>> This seems reasonable.  I never quite understood the need to create
>> new tokens.  The other option may be to use a copy constructor, but
>> again, that seems wasteful.
> We have clone() when new tokens need to be created (that's needed when
> filters create more tokens, like synonym injection, etc).  Since Token
> could be subclassed, that's probably the right approach.
> -Yonik
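
A sketch of the settable-properties approach, assuming a hypothetical `MutableToken` (not Lucene's `Token`): a filter that is not payload-aware rewrites the term text in place, and synonym injection clones the token so the payload and other properties carry over. The `Filters` helpers are likewise illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical mutable token; every property is settable, so filters can reuse it.
class MutableToken implements Cloneable {
    private String termText;
    private int positionIncrement = 1;
    private byte[] payload;

    MutableToken(String termText, byte[] payload) {
        this.termText = termText;
        this.payload = payload;
    }

    String getTermText() { return termText; }
    void setTermText(String text) { termText = text; }
    int getPositionIncrement() { return positionIncrement; }
    void setPositionIncrement(int inc) { positionIncrement = inc; }
    byte[] getPayload() { return payload; }

    @Override
    public MutableToken clone() {
        try {
            // Shallow copy: the payload array is shared with the original.
            return (MutableToken) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: this class implements Cloneable
        }
    }
}

class Filters {
    // A filter that knows nothing about payloads: it edits the token in place,
    // so the payload and position increment survive untouched.
    static MutableToken lowercase(MutableToken t) {
        t.setTermText(t.getTermText().toLowerCase(Locale.ROOT));
        return t;
    }

    // Synonym injection needs an extra token, so it clones the original
    // (keeping every property) and changes only the text and position increment.
    static List<MutableToken> injectSynonym(MutableToken t, String synonym) {
        List<MutableToken> out = new ArrayList<>();
        out.add(t);
        MutableToken syn = t.clone();
        syn.setTermText(synonym);
        syn.setPositionIncrement(0); // same position as the original token
        out.add(syn);
        return out;
    }
}
```

Unlike a copy constructor, `Object.clone()` produces an instance of the object's runtime class, so a subclass of the token keeps its type when cloned; that is why clone() suits a Token that could be subclassed.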
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Grant Ingersoll
Center for Natural Language Processing

Read the Lucene Java FAQ at 

