From "Julien Nioche" <Julien.Nio...@lingway.com>
Subject Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL
Date Thu, 01 Jul 2004 08:53:13 GMT
I went a little deeper in my experiments with INDEX_INTERVAL. In a
previous mail to the user list I reported a 10% improvement over the regular
setting (128) with one of my applications.
I refined the measurements by timing not the whole application, but only a
method that encapsulates the Lucene searches. Only the search time is
measured, not the access to the Documents.

Two sets of queries are generated using a log of user queries from our
application. These queries are in natural language and are expanded by our
product into a Lucene boolean query. Attached is the boolean query generated
for the query "Burgundy wine" - just to give you an idea of what I mean by a
large query (this one is particularly big).

These queries are run against a modified index (INDEX_INTERVAL=16) and a
regular index (INDEX_INTERVAL=128). The index used for this test is 720 MB,
in an FSDirectory on Fedora 1. The .tii file is 3398 KB in the modified
version against 488 KB in the original. Each set contains 783 queries. The
xls file contains the times for both indexes, sorted in decreasing order.
Note that each number covers not a single search but a group of up to 4
searches.

On average, changing INDEX_INTERVAL to 16 yields an improvement of about
40% compared to the regular setting.
I will try with a bigger sample of 40,000 queries and with smaller queries
as well.
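
As a rough illustration of the trade-off behind these numbers, here is a toy
model of a sampled term index. It is only a sketch and not Lucene's actual
code: the class IndexIntervalSketch and all of its methods are hypothetical
names made up for this example. It assumes that one term out of every
INDEX_INTERVAL is kept in memory, and that a lookup does a binary search over
those sampled terms followed by a linear scan inside one block. Under that
assumption, going from 128 to 16 keeps 8 times more terms in memory (roughly
in line with the .tii file growing from 488 KB to 3398 KB) and cuts the
average scan from about 64 terms to about 8.

```java
/** Toy model of a sampled term index; NOT Lucene's actual implementation. */
final class IndexIntervalSketch {

    /** Number of terms kept in memory when one term per interval is sampled. */
    static long indexEntries(long termCount, int interval) {
        return (termCount + interval - 1) / interval; // ceiling division
    }

    /** Estimate from the quoted mail: about half an interval is scanned. */
    static double averageScan(int interval) {
        return interval / 2.0;
    }

    /**
     * Binary search over the sampled terms, then a linear scan in one block.
     * Returns how many terms were skipped in the linear scan before the
     * target was found, or -1 if the target is absent.
     */
    static int scanLength(String[] sortedTerms, int interval, String target) {
        int lo = 0;
        int hi = (sortedTerms.length - 1) / interval; // last block number
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (sortedTerms[mid * interval].compareTo(target) <= 0) {
                lo = mid;      // block starts at or before the target
            } else {
                hi = mid - 1;  // block starts after the target
            }
        }
        int start = lo * interval;
        int end = Math.min(start + interval, sortedTerms.length);
        for (int i = start; i < end; i++) {
            if (sortedTerms[i].equals(target)) {
                return i - start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Synthetic, already sorted vocabulary (zero-padded so order is lexical).
        String[] terms = new String[1024];
        for (int i = 0; i < terms.length; i++) {
            terms[i] = String.format("term%05d", i);
        }
        for (int interval : new int[] {128, 16}) {
            long total = 0;
            for (String t : terms) {
                total += scanLength(terms, interval, t);
            }
            System.out.printf("interval=%d entries=%d avgScan=%.1f%n",
                    interval, indexEntries(terms.length, interval),
                    (double) total / terms.length);
        }
    }
}
```

The model ignores disk access and caching, so it says nothing about the 40%
figure itself; it only shows why a smaller interval means less scanning per
term at the cost of a larger in-memory index.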

The original motivation for this feature can be found at
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg04092.html

What is the best way to set this value in IndexWriter? Maybe we could
limit it to a few possible values, such as:
DEFAULT = 128
AVERAGE = 64
HIGH = 32
to avoid settings that are too low.
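
One way the restricted set of values could look in code is sketched below.
This is purely hypothetical: IndexIntervalPreset is an invented name, no such
type exists in Lucene, and how IndexWriter would accept it is left open. The
point is only that a closed set of presets makes it impossible to pass a very
low interval by accident.

```java
/** Hypothetical presets for the term index interval; not a Lucene API. */
enum IndexIntervalPreset {
    DEFAULT(128),
    AVERAGE(64),
    HIGH(32);

    private final int interval;

    IndexIntervalPreset(int interval) {
        this.interval = interval;
    }

    /** The interval value this preset stands for. */
    int interval() {
        return interval;
    }

    /** Rough average number of terms scanned per lookup (half an interval). */
    int averageScan() {
        return interval / 2;
    }

    public static void main(String[] args) {
        for (IndexIntervalPreset p : values()) {
            System.out.println(p + ": interval=" + p.interval()
                    + ", average scan about " + p.averageScan() + " terms");
        }
    }
}
```

A plain int setter with a lower bound check would be more flexible; the enum
trades that flexibility for safety.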

Any comments or suggestions? Can anyone give feedback on this?

Julien



----- Original Message ----- 
From: "Julien Nioche" <Julien.Nioche@lingway.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Tuesday, June 29, 2004 3:03 PM
Subject: Re: Optimizing for long queries?


> I ran some tests changing TermInfosWriter.INDEX_INTERVAL to 16.
> On my application (which does a lot on top of Lucene - including SQL
> transactions and so on) I gained 10% in time.
> I suppose the improvement could be bigger in other applications, because
> searching with Lucene is not 100% of my application.
>
> The index used for this test is 720 MB - FSDirectory on Fedora 1
> the .tii file is 3398 KB in the modified version against 488 KB in the
> original (INDEX_INTERVAL=128)
>
> Has anyone tried changing this value? Do you get similar results?
>
> Julien
>
> ----- Original Message ----- 
> From: "Julien Nioche" <Julien.Nioche@lingway.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Monday, June 28, 2004 10:04 AM
> Subject: Re: Optimizing for long queries?
>
>
> > Hello Drew,
> >
> > I don't think it's in the FAQ.
> >
> > 1 - What you could do is sort your query terms in ascending alphabetical
> > order. In my case it improved performance a little. It would be
> > interesting to know how it works in your case.
> >
> > 2 - Another solution is to play with TermInfosWriter.INDEX_INTERVAL at
> > indexing time. I quote Doug:
> >
> > "..., try reducing TermInfosWriter.INDEX_INTERVAL.  You'll
> > have to re-create your indexes each time you change this constant.  You
> > might try a value like 16.  This would keep the number of terms in
> > memory from being too huge (1 of 16 terms), but would reduce the average
> > number scanned from 64 to 8, which would be substantial.  Tell me how
> > this works.  If it makes a big difference, then perhaps we should make
> > this parameter more easily changeable."
> >
> > Have you used a profiler on your application? This could be useful to
> > spot possible improvements.
> >
> >
> > ----- Original Message ----- 
> > From: "Drew Farris" <drew.farris@gmail.com>
> > To: <lucene-user@jakarta.apache.org>
> > Sent: Friday, June 25, 2004 8:24 PM
> > Subject: Optimizing for long queries?
> >
> >
> > > Apologies if this is a FAQ, but I didn't have much luck searching the
> > > list archives for answers on this subject:
> > >
> > > I'm using Lucene in a context where we frequently have queries
> > > that search for as many as 30-50 terms in a single field. Does anyone
> > > have any thoughts concerning ways to optimize Lucene for queries of
> > > these lengths?
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
