lucene-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Payload API
Date Tue, 20 Nov 2007 00:43:27 GMT
On Nov 19, 2007 6:52 PM, Michael Busch <buschmic@gmail.com> wrote:
> Yonik Seeley wrote:
> >
> > So I think we all agree to do payloads by reference (do not make a
> > copy of byte[] like termBuffer does), and to allow payload reuse.
> >
> > So I think we now have 3 viable options on the table:
> > Token{ byte[] payload, int payloadLength, ...}
> > Token{ byte[] payload, int payloadOffset, int payloadLength,...}
> > Token{ Payload p, ... }
> >
>
> I'm for option 2. I agree that it is worthwhile to allow filters to
> modify the payloads. And I'd like to optimize for the case where lots
> of tokens have payloads, so option 2 seems the way to go.

Just to play devil's advocate, it seems like adding the byte[]
directly to Token gains less than we might have been thinking if we
have reuse in any case.  A TokenFilter could reuse the same Payload
object for each term in a Field, so the allocation savings are
closer to a single Payload per field that uses payloads.
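
To illustrate the reuse argument, here is a minimal sketch in Java. It does not use the actual Lucene classes: Payload, Token, PositionPayloadFilter, and countDistinctPayloads are hypothetical stand-ins invented for this example, and the one-byte position payload is just an arbitrary choice of data. The sketch only shows that a filter can hand the same Payload object to every token of a field, so the per-field allocation cost is a single Payload.

```java
import java.util.ArrayList;
import java.util.List;

public class PayloadReuseSketch {
    /** Hypothetical stand-in for the Payload of option 3: a view over a shared byte[]. */
    static final class Payload {
        byte[] data;
        int offset;
        int length;

        void set(byte[] data, int offset, int length) {
            this.data = data;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Hypothetical Token that holds its payload by reference; null means no payload. */
    static final class Token {
        String term;
        Payload payload;
    }

    /** Hypothetical filter: one Payload and one buffer are allocated per field, then reused. */
    static final class PositionPayloadFilter {
        private final Payload shared = new Payload();
        private final byte[] buffer = new byte[1];
        private int position = 0;

        Token next(String term, Token reusable) {
            buffer[0] = (byte) position++;   // overwrite the shared buffer in place
            shared.set(buffer, 0, 1);        // no new Payload, no byte[] copy
            reusable.term = term;
            reusable.payload = shared;
            return reusable;
        }
    }

    /** Runs the filter over the terms and counts the distinct Payload instances handed out. */
    static int countDistinctPayloads(String[] terms) {
        PositionPayloadFilter filter = new PositionPayloadFilter();
        Token token = new Token();
        List<Payload> seen = new ArrayList<>();
        for (String term : terms) {
            Payload p = filter.next(term, token).payload;
            boolean known = false;
            for (Payload q : seen) {
                if (q == p) {                // identity comparison, not equals()
                    known = true;
                }
            }
            if (!known) {
                seen.add(p);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String[] terms = {"payload", "by", "reference"};
        System.out.println(countDistinctPayloads(terms)); // prints 1
    }
}
```

Because the payload is shared, a consumer in this sketch would have to read or copy the bytes before asking for the next token; that is the usual trade-off of by-reference reuse.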

If we used a Payload object, it would save 8 bytes per Token for
fields not using payloads.
Besides an initial allocation per field, the additional cost of using
a Payload field would be an extra dereference (but that should be
really minor).

So I'm a bit more on-the-fence...
Thoughts?

-Yonik
