lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Wang" <john.w...@gmail.com>
Subject Re: payload performance wrt fieldcache
Date Thu, 03 Apr 2008 14:36:42 GMT
Sorry, gmail was screwy and accidentally sent the msg.
Anyway,

I have a large index, about 30M docs.
I have a date field (by days) and there are about 1000 of them, every doc
has a date field filled in.

So out of curiosity I index the date field two ways:
1) using "date" as a field, and set the date value for each doc.
2) new term: "_payload:_val" and added the date (as a long or 8 byte array)
into the payload of each doc.

loading into an array long[] of length maxdoc of dates, the performance was
surprising:
using payload is 7 times slower than using fieldcache.

At first I thought it was because of the conversion between byte[8] to a
long for each doc, I changed it so it loads into byte[8*maxdoc] without
doing the conversion, and the result is the same.

I then did another experiment:
lower the number of dates down to a small number, e.g. 50, and timed field
cache load, and it took much longer than when it had 1000.

I did some profiling and the profiler is pointing to TermPositions.next
and TermPositions.nextPosition and TermPositions.getPayload as the culprit.

I would think payload would always be faster. Any ideas?

Thanks
-John

On Thu, Apr 3, 2008 at 7:27 AM, John Wang <john.wang@gmail.com> wrote:

> Hi:
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message