lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Token/Payload API
Date Tue, 15 May 2007 13:31:16 GMT
One thing I forgot to mention that the Payload mechanism now makes
possible: during your ApacheCon EU presentation there was a comment
to the effect that we can't score binary fields.  With Payload
scoring, a binary Field is essentially a Document-level payload.  It
should be quite easy to implement a Query/Scorer combination that
calls back to scorePayload, if people are interested in such a
thing.  If we go this route, however, I would propose overloading
the scorePayload method so that it is also passed field information,
i.e. the field name.
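
A rough sketch of what such an overload could look like, in plain
Java.  The class names, the "boost" field, and the exact scorePayload
signatures below are assumptions made up for illustration; they are
not taken from the Lucene source.  The idea is only that a
field-aware overload can default to the field-agnostic one, so
existing subclasses keep working:

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a similarity class with a payload hook.
// Not the Lucene API; names and signatures are assumptions.
class FieldAwareSimilaritySketch {

    // Field-agnostic hook: sees only the payload bytes.
    float scorePayload(byte[] payload, int offset, int length) {
        return 1.0f; // neutral factor: the payload does not change the score
    }

    // Proposed overload: also receives the field name.
    // By default it delegates, so the field name is simply ignored.
    float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        return scorePayload(payload, offset, length);
    }
}

// Example subclass: treats the payload of a hypothetical "boost" field
// as a big-endian float and uses it as the score factor.
class BoostFieldSimilaritySketch extends FieldAwareSimilaritySketch {

    @Override
    float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        if ("boost".equals(fieldName) && length >= 4) {
            return ByteBuffer.wrap(payload, offset, 4).getFloat();
        }
        // Any other field falls back to the field-agnostic behaviour.
        return super.scorePayload(fieldName, payload, offset, length);
    }
}
```

A Scorer would then pass the name of the field it is iterating over
along with the payload bytes, and the similarity decides per field
how (or whether) to interpret them.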

And, of course, I haven't looked in depth into the FunctionQuery  
capabilities, which may already provide for this possibility.

Just thinking out loud,
Grant

On May 11, 2007, at 9:03 PM, Yonik Seeley wrote:

> On 5/11/07, Grant Ingersoll <gsingers@apache.org> wrote:
>> On May 11, 2007, at 4:31 PM, Yonik Seeley wrote:
>>
>> > I hadn't kept up with the payload discussion/patch, and just got
>> > around to looking at Token.
>> >
>> > public class Token implements Cloneable {
>> >  String termText;       // the text of the term
>> >  int startOffset;       // start in source text
>> >  int endOffset;         // end in source text
>> >  String type = "word";  // lexical type
>> >
>> >  Payload payload;
>> >
>> >
>> > It almost feels like we are going down the road of Field, adding
>> > more and more to the base class instead of using some other
>> > mechanism like inheritance.
>>
>> So PayloadToken would be more in line with what you are thinking?
>> Would we then need an instanceof check to determine when a token
>> has a payload?
>
> I don't have a good answer for that one... a real inheritance solution
> would be invasive to the indexing code and probably not worth it at
> this point.  There is also the problem of mixing different (future)
> token properties... what you really want are mixins or something.
>
> At this point, just forget I brought it up ;-)
>
>> > A bigger problem, however, is that payloads will be lost by filters
>> > that aren't payload aware, and create new Tokens.  We had the same
>> > problem with position increments being lost.
>> >
>> > For this latter problem, I think the answer is to *not* create new
>> > tokens, and make all the properties of Token settable.
>>
>> This seems reasonable.  I never quite understood the need to create
>> new tokens.  The other option may be to use a copy constructor, but
>> again, that seems wasteful.
>
> We have clone() when new tokens need to be created (that's needed when
> filters create more tokens, like synonym injection, etc).  Since Token
> could be subclassed, that's probably the right approach.
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
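
To make the point quoted above concrete, here is a minimal sketch of
why a filter that builds a new token loses the payload, while one
that mutates the token in place (or clones it when an extra token is
genuinely needed, as with synonym injection) does not.  SketchToken
and the helper methods are hypothetical and heavily simplified; this
is not the Lucene Token class or TokenFilter API:

```java
import java.util.Locale;

// Hypothetical, simplified token with settable properties.
class SketchToken implements Cloneable {
    String termText;
    int positionIncrement = 1;
    byte[] payload; // may be null

    SketchToken(String termText) {
        this.termText = termText;
    }

    @Override
    public SketchToken clone() {
        try {
            SketchToken copy = (SketchToken) super.clone();
            if (payload != null) {
                copy.payload = payload.clone(); // do not share the byte array
            }
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we are Cloneable
        }
    }
}

class TokenReuseSketch {

    // Payload-unaware style: a brand-new token silently drops the
    // payload and the position increment of the input.
    static SketchToken lowerCaseByCopy(SketchToken in) {
        return new SketchToken(in.termText.toLowerCase(Locale.ROOT));
    }

    // In-place style: only the term text changes; everything else survives.
    static SketchToken lowerCaseInPlace(SketchToken in) {
        in.termText = in.termText.toLowerCase(Locale.ROOT);
        return in;
    }

    // When an additional token is needed, clone the original and adjust
    // the copy; increment 0 stacks it on the original's position.
    static SketchToken synonymOf(SketchToken original, String synonym) {
        SketchToken extra = original.clone();
        extra.termText = synonym;
        extra.positionIncrement = 0;
        return extra;
    }
}
```

Because clone() goes through Object.clone(), a subclass that adds its
own properties would have them carried along by filters that know
nothing about that subclass.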

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp

Read the Lucene Java FAQ at
http://wiki.apache.org/jakarta-lucene/LuceneFAQ




