lucene-dev mailing list archives

From "John Wang" <john.w...@gmail.com>
Subject Re: Per-document Payloads
Date Tue, 23 Oct 2007 02:35:02 GMT
Hi Michael:
    After removing the isDeleted() check, the index loads in 430 ms.

Thanks

-john

On 10/21/07, Michael Busch <buschmic@gmail.com> wrote:
>
> John Wang wrote:
>
> >
> > Since all three methods load docids into an int[], the lookup time is the
> > same for all three methods; what differs are the load times:
> >
> > 1) 16.5 seconds,       43 MB
> > 2) 590 milliseconds,   32.5 MB
> > 3) 186 milliseconds,   26 MB
>
> Good analysis! Thanks for sharing the results...
>
> >
> > I think the payload method is good enough, so we don't need to diverge from
> > the Lucene code base.
>
> Actually, I noticed that in my program in getCachedIDs() you can remove
> the check
>   if (!reader.isDeleted(tp.doc())) {
>
> This should improve the performance further (not sure how much though),
> because the synchronized isDeleted() call is quite expensive and not
> necessary.
>
> If you want to reduce the index size, you might want to try to encode
> the Integers more efficiently, e.g. as VInts (depending on the values
> of your UIDs).
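
To illustrate the VInt suggestion above, here is a minimal sketch, assuming the UIDs are non-negative ints. It uses the variable-length layout that Lucene's VInt format follows (7 data bits per byte, low-order bits first, high bit set when another byte follows). The class name VIntCodec and its encode/decode methods are hypothetical helpers for illustration only; they are not part of Lucene or of the program discussed in this thread.

```java
import java.io.ByteArrayOutputStream;

class VIntCodec {
    // Encodes each value with 7 data bits per byte, low-order bits first.
    // The high bit of a byte is set when another byte follows.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80); // more bytes follow
                v >>>= 7;
            }
            out.write(v); // final byte, high bit clear
        }
        return out.toByteArray();
    }

    // Decodes exactly `count` values from the start of `data`.
    static int[] decode(byte[] data, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            byte b = data[pos++];
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = data[pos++];
                v |= (b & 0x7F) << shift;
            }
            values[i] = v;
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical UIDs chosen to cover 1-, 2-, 3- and 4-byte encodings.
        int[] uids = {0, 127, 128, 16384, 2097152};
        byte[] encoded = encode(uids);
        System.out.println(encoded.length + " bytes, versus "
                + 4 * uids.length + " bytes at a fixed 4 bytes per int");
        System.out.println(java.util.Arrays.toString(decode(encoded, uids.length)));
    }
}
```

The saving depends entirely on the UID distribution: values below 128 take one byte, while values of 2^28 or more take five bytes, which is worse than a fixed 4-byte int.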
>
> > However, I feel that being able to customize the
> > indexing process and store our own file is still more efficient both in load
> > time and index size.
> >
>
> Yes, the current payload implementation is not optimized for this use
> case; it can be improved with a per-doc approach like the one I suggested.
>
> -Michael
>
>
> > Thanks
> >
> > -John
> >
>
>
