lucene-dev mailing list archives

From Michael Busch <busch...@gmail.com>
Subject Re: Payload API
Date Tue, 20 Nov 2007 18:06:24 GMT
Michael McCandless wrote:
> "Yonik Seeley" <yonik@apache.org> wrote:
>> On Nov 19, 2007 6:52 PM, Michael Busch <buschmic@gmail.com> wrote:
>>> Yonik Seeley wrote:
>>>> So I think we all agree to do payloads by reference (do not make a
>>>> copy of byte[] like termBuffer does), and to allow payload reuse.
>>>>
>>>> So I think we now have 3 viable options still on the table:
>>>> Token{ byte[] payload, int payloadLength, ...}
>>>> Token{ byte[] payload, int payloadOffset, int payloadLength,...}
>>>> Token{ Payload p, ... }
>>>>
>>> I'm for option 2. I agree that it is worthwhile to allow filters to
>>> modify the payloads. And I'd like to optimize for the case where lots
>>> of tokens have payloads, and option 2 seems therefore the way to go.
>> Just to play devil's advocate, it seems like adding the byte[]
>> directly to Token gains less than we might have been thinking if we
>> have reuse in any case.  A TokenFilter could reuse the same Payload
>> object for each term in a Field, so the CPU allocation savings is
>> closer to a single Payload per field using payloads.
>>
>> If we used a Payload object, it would save 8 bytes per Token for
>> fields not using payloads.
>> Besides an initial allocation per field, the additional cost to using
>> a Payload field would be an additional dereference (but that should be
>> really minor).
> 
> These are excellent points.  I guess I would lean [back] towards
> keeping the separate Payload object and extending its API to allow
> re-use and modification of its byte[]?
> 

+1

-Michael
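
For readers following the thread, here is a minimal Java sketch of the direction being agreed on: a separate payload object that holds its byte[] by reference (no copy) and can be reused and modified across tokens. The names ReusablePayload, SketchToken, and PayloadReuseDemo are hypothetical and chosen for illustration only; this is not the Lucene API, just one way the idea could look under those assumptions.

```java
// Hypothetical sketch only: these classes are NOT Lucene's Token/Payload.

/** A payload that references a caller-owned byte[] slice instead of copying it. */
final class ReusablePayload {
    private byte[] data;
    private int offset;
    private int length;

    /** Points this payload at data[offset .. offset+length) without copying. */
    void setData(byte[] data, int offset, int length) {
        if (data == null || offset < 0 || length < 0
                || offset > data.length || length > data.length - offset) {
            throw new IllegalArgumentException("invalid payload slice");
        }
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    byte[] getData() { return data; }
    int getOffset() { return offset; }
    int getLength() { return length; }

    /** Returns the byte at the given index relative to the slice start. */
    byte byteAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[offset + index];
    }
}

/** Token with a single payload reference; it stays null for fields without payloads. */
final class SketchToken {
    String term;
    ReusablePayload payload;
}

final class PayloadReuseDemo {
    /**
     * Mimics a filter that attaches a one-byte payload (the term length) to
     * every token, allocating one payload object and one buffer per field
     * rather than one per token.
     */
    static int[] payloadPerTerm(String[] terms) {
        ReusablePayload payload = new ReusablePayload(); // allocated once per field
        byte[] buffer = new byte[1];                     // reused for every token
        SketchToken token = new SketchToken();           // reused as well
        int[] seen = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            buffer[0] = (byte) terms[i].length();
            payload.setData(buffer, 0, 1);
            token.term = terms[i];
            token.payload = payload;
            // A consumer must read the payload before the next token overwrites it.
            seen[i] = token.payload.byteAt(0);
        }
        return seen;
    }

    public static void main(String[] args) {
        for (int value : payloadPerTerm(new String[] {"payload", "api"})) {
            System.out.println(value);
        }
    }
}
```

The trade-off discussed above shows up directly: a token without a payload carries only one null reference, while a field that does use payloads pays for one extra object and one extra dereference per access. Because the byte[] is shared rather than copied, any consumer has to finish with the payload before the producer moves on to the next token.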
