lucene-java-user mailing list archives

From "John Wang" <john.w...@gmail.com>
Subject Re: payload performance wrt fieldcache
Date Thu, 03 Apr 2008 16:51:47 GMT
Apparently tp.nextPosition() is needed :(
Any ideas?

-John

On Thu, Apr 3, 2008 at 8:20 AM, John Wang <john.wang@gmail.com> wrote:

> I am loading both from disk.
> But I found the culprit:
>
> My code:
>
> while (tp.next()) {
>     // assert tp.doc() < maxDoc;
>     tp.nextPosition();          // <-- this call is the problem
>     tp.getPayload(payloadBuffer, 0);
>     byter.load(_array, tp.doc(), payloadBuffer);
> }
>
> The way I stored it, there is one position per doc. After I removed the call
> to tp.nextPosition(), performance improved by a multi-digit factor.
>
> I would think this call should be free.
>
>
>
> Thanks
>
> -John
>
> On Thu, Apr 3, 2008 at 8:16 AM, Chris Lu <chris.lu@gmail.com> wrote:
>
> > As your index grows larger, the payload method will get even slower,
> > because payloads are read from the hard disk. FieldCache is held in
> > memory, which is much faster.
> >
> > Unless you are using solid-state disks, you'd be better off with
> > FieldCache for faster search.
> >
> > --
> > Chris Lu
> >
> >
> > On Thu, Apr 3, 2008 at 7:36 AM, John Wang <john.wang@gmail.com> wrote:
> > > Sorry, gmail was screwy and accidentally sent the msg.
> > >  Anyway,
> > >
> > >  I have a large index, about 30M docs.
> > >  I have a date field (by days) and there are about 1000 of them; every doc
> > >  has a date field filled in.
> > >
> > >  So out of curiosity I indexed the date field two ways:
> > >  1) using "date" as a field, and setting the date value for each doc.
> > >  2) adding a new term, "_payload:_val", and storing the date (as a long,
> > >  or 8-byte array) in the payload of each doc.
> > >
> > >  Loading the dates into a long[] array of length maxdoc, the performance
> > >  was surprising:
> > >  using payloads is 7 times slower than using FieldCache.
> > >
> > >  At first I thought it was because of the conversion from byte[8] to a
> > >  long for each doc, so I changed it to load into byte[8*maxdoc] without
> > >  doing the conversion, and the result is the same.
> > >
> > >  I then did another experiment:
> > >  I lowered the number of dates to a small number, e.g. 50, and timed the
> > >  field cache load; it took much longer than when there were 1000.
> > >
> > >  I did some profiling and the profiler is pointing to TermPositions.next,
> > >  TermPositions.nextPosition, and TermPositions.getPayload as the culprits.
> > >
> > >  I would think payload would always be faster. Any ideas?
> > >
> > >  Thanks
> > >  -John
> > >
> > >  On Thu, Apr 3, 2008 at 7:27 AM, John Wang <john.wang@gmail.com> wrote:
> > >
> > >  > Hi:
> > >  >
> > >  >
> > >
> >
> >
> >
>
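
The thread mentions storing each date in the payload as a long packed into 8 bytes, and loading the values either into a long[] or into a packed byte[8*maxdoc]. As a minimal sketch of that conversion step only, assuming big-endian byte order (the thread does not say which order was used): PayloadLongCodec and its methods are hypothetical names for illustration, not part of Lucene and not the poster's "byter" helper.

```java
// Hypothetical helper for illustration; not part of Lucene.
final class PayloadLongCodec {
    private PayloadLongCodec() {}

    /** Encodes a long as 8 bytes, most significant byte first (assumed order). */
    static byte[] encode(long value) {
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) value; // keep the lowest 8 bits
            value >>>= 8;          // shift the next byte into place
        }
        return out;
    }

    /** Decodes 8 bytes starting at offset back into a long. */
    static long decode(byte[] buf, int offset) {
        long value = 0L;
        for (int i = 0; i < 8; i++) {
            // Mask with 0xFFL so negative bytes are not sign-extended.
            value = (value << 8) | (buf[offset + i] & 0xFFL);
        }
        return value;
    }
}
```

With a packed byte[8*maxdoc] array, PayloadLongCodec.decode(bytes, 8 * docId) would read the value for a single doc on demand, so the per-doc conversion can be deferred until a value is actually needed.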
