lucene-dev mailing list archives

From "Julien Nioche" <Julien.Nio...@lingway.com>
Subject Re: suggestion for a CustomDirectory
Date Fri, 05 Dec 2003 14:17:40 GMT
Thank you for your answer, Doug.

Profiling my application indicates that a lot of time is spent on the
creation of temporary Term objects.

This is at least true for PhraseQuery weighting, as shown in the profiling
figures below:

.41.2% - 473240 ms - 2802 inv. - org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer
..40.4% - 464202 ms - 7440 inv. - org.apache.lucene.index.IndexReader.termPositions
...40.1% - 460378 ms - 7440 inv. - org.apache.lucene.index.SegmentTermDocs.seek
....40.0% - 459297 ms - 7440 inv. - org.apache.lucene.index.TermInfosReader.get
.....39.1% - 448370 ms - 7440 inv. - org.apache.lucene.index.TermInfosReader.scanEnum
.......34.4% - 394578 ms - 484790 inv. - org.apache.lucene.index.SegmentTermEnum.next
.........25.8% - 296435 ms - 484790 inv. - org.apache.lucene.index.SegmentTermEnum.readTerm
.........3.5% - 40565 ms - 969580 inv. - org.apache.lucene.store.InputStream.readVLong
.........1.8% - 21147 ms - 484790 inv. - org.apache.lucene.store.InputStream.readVInt

These figures reflect method time only; they do not include the time needed
to garbage-collect all those temporary objects.
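
The allocation pattern is visible even from the public API. Below is a rough
illustration (the index path comes from the command line, and walking the
whole dictionary only approximates what scanEnum does while seeking): each
call to TermEnum.next() materialises a fresh Term together with its Strings.

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

// Walks the term dictionary through the public API; every next() allocates
// a new Term (and its field/text Strings), the same kind of temporary
// object scanEnum produces internally while seeking.
public class CountTermAllocations {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open(args[0]);  // path to an index
    TermEnum terms = reader.terms();
    long count = 0;
    while (terms.next()) {
      Term t = terms.term();                         // the Term allocated by next()
      count++;
    }
    terms.close();
    reader.close();
    System.out.println(count + " Term objects materialised");
  }
}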

I'll test my other applications to confirm this.
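
Along the lines of what Doug suggests below, the comparison could be as
simple as the following sketch (the field name and phrase list are just
placeholders, and the timings are plain wall-clock):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;

// Compares the cost of the term-info lookups (which allocate the temporary
// Terms while scanning the dictionary) with the cost of running the full
// phrase queries (which also read the .frq/.prx files).
public class DocFreqVsSearch {
  public static void main(String[] args) throws Exception {
    String path = args[0];                        // path to the index
    String[][] phrases = { { "some", "phrase" },  // placeholder phrases
                           { "another", "example" } };

    IndexReader reader = IndexReader.open(path);

    long start = System.currentTimeMillis();
    for (int i = 0; i < phrases.length; i++)
      for (int j = 0; j < phrases[i].length; j++)
        reader.docFreq(new Term("contents", phrases[i][j]));
    long docFreqTime = System.currentTimeMillis() - start;

    IndexSearcher searcher = new IndexSearcher(reader);
    start = System.currentTimeMillis();
    for (int i = 0; i < phrases.length; i++) {
      PhraseQuery query = new PhraseQuery();
      for (int j = 0; j < phrases[i].length; j++)
        query.add(new Term("contents", phrases[i][j]));
      Hits hits = searcher.search(query);
      hits.length();                              // force execution
    }
    long searchTime = System.currentTimeMillis() - start;

    System.out.println("docFreq() only : " + docFreqTime + " ms");
    System.out.println("full search    : " + searchTime + " ms");
    reader.close();
  }
}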

Scott,

I tried NIODirectory and posted some benchmarks for it on the list using my
applications. It improves overall performance a little, but it would be
interesting if we could choose which files to map into memory.
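
For instance (just a minimal sketch of the idea, assuming the current
Directory.openFile signature; "NIODirectory" stands for the class from the
patch, whose exact API I haven't checked), something that only maps the term
dictionary files and leaves .frq/.prx on a plain FSDirectory:

import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.InputStream;

// Routes reads of the term dictionary files (.tii/.tis) to a directory that
// maps them into memory, and everything else (.frq, .prx, ...) to a plain
// FSDirectory.  A complete version would extend Directory and forward the
// remaining methods (createFile, deleteFile, fileLength, ...) the same way.
public class SelectiveMappingSketch {
  private final Directory mapped;    // e.g. the NIODirectory from the patch
  private final Directory unmapped;  // e.g. FSDirectory.getDirectory(path, false)

  public SelectiveMappingSketch(Directory mapped, Directory unmapped) {
    this.mapped = mapped;
    this.unmapped = unmapped;
  }

  // Only the (comparatively small) term dictionary files get mapped.
  private boolean shouldMap(String name) {
    return name.endsWith(".tii") || name.endsWith(".tis");
  }

  public InputStream openFile(String name) throws IOException {
    return shouldMap(name) ? mapped.openFile(name) : unmapped.openFile(name);
  }
}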

----- Original Message -----
From: "Doug Cutting" <cutting@lucene.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Thursday, December 04, 2003 7:28 PM
Subject: Re: suggestion for a CustomDirectory


> Julien Nioche wrote:
> > However in most cases the
> > application would be faster because :
> > - tree access to the Term (this is only the case for the Terms in the
> > .tii)
> > - no need to create up to 127 temporary Term objects (with creation of
> > Strings and so on....)
> > - limit garbage collecting
>
> The .tii is already read into memory when the index is opened.  So the
> only savings would be the creation of (on average) 64 temporary Term
> objects per query.  Do you have any evidence that this is a substantial
> part of the computation?  I'd be surprised if it was.  To find out, you
> could write a program which compares the time it takes to call docFreq()
> on a set of terms (allocating the 64 temporary Terms) to what it takes
> to perform queries (doing the rest of the work).  I'll bet that the
> first is substantially faster: most of the work of executing a query is
> processing the .frq and .prx files.  These are bigger than the RAM on
> your machine, and so cannot be cached.  Thus you'll always be doing some
> disk i/o, which will likely dominate real performance.
>
> Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

