lucene-dev mailing list archives

From "Julien Nioche" <Julien.Nio...@lingway.com>
Subject Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL
Date Thu, 01 Jul 2004 12:32:54 GMT
A similar experiment with 500 shorter queries shows a 20% speed improvement.
(see xls file for details)
By shorter query I mean something like this:
((titre:"burgundy wines"~3 titre:"burgundy wine"~3)) ((texte:"burgundy
wines"~3^3.0 texte:"burgundy wine"~3^3.0)) ((descr:"burgundy wines"~3^4.0
descr:"burgundy wine"~3^4.0)) ((kw:"burgundy wines"~3^4.0 kw:"burgundy
wine"~3^4.0))

----- Original Message ----- 

From: "Julien Nioche" <Julien.Nioche@lingway.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Cc: <drew.farris@gmail.com>
Sent: Thursday, July 01, 2004 10:53 AM
Subject: Re: Optimizing for long queries? >> 40% faster by changing
INDEX_INTERVAL


> I went a little deeper in my experiments with INDEX_INTERVAL. In a
> previous mail to the user list I reported a 10% improvement over the
> regular setting (128) with one of my applications.
> I refined the measurements by timing not the whole application, but a
> method that encapsulates the Lucene searches. Only the search time is
> measured, not the access to the Documents.
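A minimal sketch of that kind of measurement, assuming the searches are wrapped in a single call. `SearchTimer`, `Timed` and `time` are hypothetical names used for illustration only; the lambda stands in for the method that encapsulates the Lucene searches and is not Lucene API.

```java
import java.util.function.Supplier;

class SearchTimer {
    /** Holds the value returned by the timed call and the elapsed time in nanoseconds. */
    static final class Timed<T> {
        final T value;
        final long nanos;

        Timed(T value, long nanos) {
            this.value = value;
            this.nanos = nanos;
        }
    }

    /** Times only the supplied call, so work done before or after it is not counted. */
    static <T> Timed<T> time(Supplier<T> search) {
        long start = System.nanoTime();
        T value = search.get();
        long elapsed = System.nanoTime() - start;
        return new Timed<>(value, elapsed);
    }

    public static void main(String[] args) {
        // Placeholder workload; a real run would call the search method here.
        Timed<Integer> timed = time(() -> 21 * 2);
        System.out.println("result=" + timed.value + " nanos=" + timed.nanos);
    }
}
```

Keeping document retrieval outside the timed call matches the setup described above, where only the search itself is measured.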
>
> Two sets of queries are generated using a log of user queries from our
> application. These queries are in natural language and are expanded by
> our product into a Lucene boolean query. Attached is the boolean query
> generated for the query "Burgundy wine" - just to give you an idea of what
> I mean by a large query (this one is particularly big).
>
> These queries are run against an optimized index (INDEX_INTERVAL=16) and
> a regular index. The index used for this test is 720 MB (FSDirectory on
> Fedora 1). The .tii file is 3398 KB in the modified version against 488 KB
> in the original. Both sets of queries have the same size (783). The xls
> file contains the times for both indexes, sorted in decreasing order.
> Note that each number represents not a single search but a group of up to
> 4 searches.
>
> On average, changing the index interval to 16 yields an improvement of
> about 40% compared to the regular setting.
> I will try with a bigger sample of 40,000 queries and with smaller queries
> as well.
>
> The original motivation for this feature can be found at
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg04092.html
>
> What is the best way to set this value in IndexWriter? Maybe we could
> limit it to a few possible values like:
> DEFAULT = 128
> AVERAGE = 64
> HIGH = 32
> in order to avoid too low settings.
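One way the proposed restriction could look, as a sketch only. `TermIndexIntervalPreset` is a hypothetical type that does not exist in Lucene; it simply encodes the three values listed above so that callers cannot pass an arbitrarily low interval.

```java
enum TermIndexIntervalPreset {
    DEFAULT(128),
    AVERAGE(64),
    HIGH(32);

    private final int interval;

    TermIndexIntervalPreset(int interval) {
        this.interval = interval;
    }

    /** The index interval this preset stands for. */
    int interval() {
        return interval;
    }

    public static void main(String[] args) {
        // Print each preset with the interval it maps to.
        for (TermIndexIntervalPreset preset : values()) {
            System.out.println(preset + " = " + preset.interval());
        }
    }
}
```

A closed set of presets trades flexibility for safety: a value such as 16, the one used in the experiments above, would not be selectable under this scheme.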
>
> Any comments or suggestions? Can anyone give feedback on this?
>
> Julien
>
>
>
> ----- Original Message ----- 
> From: "Julien Nioche" <Julien.Nioche@lingway.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Tuesday, June 29, 2004 3:03 PM
> Subject: Re: Optimizing for long queries?
>
>
> > I ran some tests changing TermInfosWriter.INDEX_INTERVAL to 16.
> > On my application (which does a lot on top of Lucene, including SQL
> > transactions and so on) I gained 10% in time.
> > I suppose the improvement could be bigger in other applications,
> > because searching with Lucene is not 100% of my application.
> >
> > The index used for this test is 720 MB (FSDirectory on Fedora 1).
> > The .tii file is 3398 KB in the modified version against 488 KB in the
> > original (INDEX_INTERVAL=128).
> >
> > Has anyone tried changing this value? Do you get similar results?
> >
> > Julien
> >
> > ----- Original Message ----- 
> > From: "Julien Nioche" <Julien.Nioche@lingway.com>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Monday, June 28, 2004 10:04 AM
> > Subject: Re: Optimizing for long queries?
> >
> >
> > > Hello Drew,
> > >
> > > I don't think it's in the FAQ.
> > >
> > > 1 - What you could do is sort your query terms in ascending
> > > alphabetical order. In my case it improved performance a little. It
> > > would be interesting to know how it works in your case.
> > >
> > > 2 - Another solution is to play with TermInfosWriter.INDEX_INTERVAL at
> > > indexing time. I quote Doug:
> > >
> > > "..., try reducing TermInfosWriter.INDEX_INTERVAL.  You'll
> > > have to re-create your indexes each time you change this constant.
> > > You might try a value like 16.  This would keep the number of terms in
> > > memory from being too huge (1 of 16 terms), but would reduce the
> > > average number scanned from 64 to 8, which would be substantial.  Tell
> > > me how this works.  If it makes a big difference, then perhaps we
> > > should make this parameter more easily changeable."
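The arithmetic in the quote can be sketched as follows. This is an illustration under stated assumptions, not Lucene code: `IndexIntervalSketch` and its methods are hypothetical, the total term count is an invented figure, and the model assumes one in-memory entry per `interval` terms and an average sequential scan of half an interval.

```java
class IndexIntervalSketch {
    /** Terms kept in memory when every interval-th term is indexed (rounded up). */
    static long indexedTerms(long totalTerms, int interval) {
        return (totalTerms + interval - 1) / interval;
    }

    /** Average number of terms scanned after jumping to the nearest index entry. */
    static int averageScan(int interval) {
        return interval / 2;
    }

    public static void main(String[] args) {
        long totalTerms = 1_000_000L; // hypothetical dictionary size, for illustration
        for (int interval : new int[] {128, 16}) {
            System.out.println("interval=" + interval
                    + " inMemory=" + indexedTerms(totalTerms, interval)
                    + " avgScan=" + averageScan(interval));
        }
    }
}
```

Under this model, going from 128 to 16 cuts the average scan from 64 to 8 terms, as the quote says, and multiplies the number of in-memory entries by 8. The reported growth of the .tii file from 488 to 3398 (roughly sevenfold) is of the same order.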
> > >
> > > Have you used a profiler on your application? This could be useful to
> > > spot possible improvements.
> > >
> > >
> > > ----- Original Message ----- 
> > > From: "Drew Farris" <drew.farris@gmail.com>
> > > To: <lucene-user@jakarta.apache.org>
> > > Sent: Friday, June 25, 2004 8:24 PM
> > > Subject: Optimizing for long queries?
> > >
> > >
> > > > Apologies if this is a FAQ, but I didn't have much luck searching
> > > > the list archives for answers on this subject:
> > > >
> > > > I'm using Lucene in a context where we frequently have queries
> > > > that search for as many as 30-50 terms in a single field. Does
> > > > anyone have any thoughts concerning ways to optimize Lucene for
> > > > queries of these lengths?
> > > >
> > >
> ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org



