lucene-dev mailing list archives

From "Julien Nioche" <Julien.Nio...@lingway.com>
Subject Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL
Date Thu, 01 Jul 2004 12:38:02 GMT
The xls files did not go through. You can download them from the following URLs:
http://jnioche.freesurf.fr/shortQueries.xls
http://jnioche.freesurf.fr/longQueries.xls

----- Original Message ----- 
From: "Julien Nioche" <Julien.Nioche@lingway.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Thursday, July 01, 2004 2:32 PM
Subject: Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL


> A similar experiment with 500 shorter queries shows a 20% speed
> improvement (see the xls file for details).
> By a shorter query I mean something like this:
> ((titre:"burgundy wines"~3 titre:"burgundy wine"~3))
> ((texte:"burgundy wines"~3^3.0 texte:"burgundy wine"~3^3.0))
> ((descr:"burgundy wines"~3^4.0 descr:"burgundy wine"~3^4.0))
> ((kw:"burgundy wines"~3^4.0 kw:"burgundy wine"~3^4.0))
>
> ----- Original Message ----- 
>
> From: "Julien Nioche" <Julien.Nioche@lingway.com>
> To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
> Cc: <drew.farris@gmail.com>
> Sent: Thursday, July 01, 2004 10:53 AM
> Subject: Re: Optimizing for long queries? >> 40% faster by changing
> INDEX_INTERVAL
>
>
> > I dug a little deeper in my experiments with INDEX_INTERVAL. In a
> > previous mail to the user list I reported a 10% improvement over the
> > regular setting (128) with one of my applications.
> > I refined the measurements by timing not the whole application, but a
> > method that encapsulates the Lucene searches. Only the search time is
> > measured, not the access to the Documents.
> >
> > Two sets of queries are generated using a log of user queries from our
> > application. These queries are in natural language and are expanded by
> > our product into a Lucene boolean query. Attached is the boolean query
> > generated for the query "Burgundy wine" - just to give you an idea of
> > what I mean by a large query (this one is particularly big).
> >
> > These queries are run against an optimized index (INDEX_INTERVAL=16) and
> > a regular index. The index used for this test is 720 MB (FSDirectory on
> > Fedora 1); the .tii file is 3398 KB in the modified version against
> > 488 KB in the original. Both sets of queries have the same size (783).
> > The xls file contains the times for both indexes, sorted in decreasing
> > order. Note that each number represents not a single search but a group
> > of up to 4 searches.
> >
> > On average, changing the index interval to 16 yields an improvement of
> > about 40% compared to the regular setting.
> > I will try with a bigger sample of 40,000 queries and with smaller
> > queries as well.
> >
> > The original motivation for this feature can be found at
> > http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg04092.html
> >
> > What is the best way to set this value in IndexWriter? Maybe we could
> > limit it to a few possible values such as:
> > DEFAULT = 128
> > AVERAGE = 64
> > HIGH = 32
> > in order to avoid too low settings.
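
One way to picture the restricted set of values proposed above is a small Java enum. This is only a sketch: `IndexIntervalPreset` and `fromInterval` are hypothetical names invented for illustration and are not part of Lucene or of `IndexWriter`.

```java
/** Hypothetical presets mirroring the three values proposed above; not a Lucene API. */
enum IndexIntervalPreset {
    DEFAULT(128),
    AVERAGE(64),
    HIGH(32);

    private final int interval;

    IndexIntervalPreset(int interval) {
        this.interval = interval;
    }

    /** Number of terms between two consecutive entries of the in-memory term index. */
    public int interval() {
        return interval;
    }

    /** Maps a raw value to a preset, rejecting anything outside the allowed set. */
    public static IndexIntervalPreset fromInterval(int value) {
        for (IndexIntervalPreset preset : values()) {
            if (preset.interval == value) {
                return preset;
            }
        }
        throw new IllegalArgumentException("Unsupported index interval: " + value);
    }
}
```

With only these three presets, a value of 16 (the setting used in the measurements above) would be rejected, so the list of presets would need a fourth entry if that setting is meant to stay available.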
> >
> > Any comments or suggestions? Can anyone give feedback on this?
> >
> > Julien
> >
> >
> >
> > ----- Original Message ----- 
> > From: "Julien Nioche" <Julien.Nioche@lingway.com>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Tuesday, June 29, 2004 3:03 PM
> > Subject: Re: Optimizing for long queries?
> >
> >
> > > I ran some tests changing TermInfosWriter.INDEX_INTERVAL to 16.
> > > In my application (which does a lot on top of Lucene, including SQL
> > > transactions and so on) I gained 10% in overall time.
> > > I suppose the improvement could be bigger in other applications,
> > > because searching with Lucene does not account for 100% of the time
> > > spent in my application.
> > >
> > > The index used for this test is 720 MB - FSDirectory on Fedora 1
> > > the .tii file is 3398 Kb in the modified version against 488Kb in the
> > > original (INDEX_INTERVAL=128)
> > >
> > > Has anyone tried changing this value? Do you get similar results?
> > >
> > > Julien
> > >
> > > ----- Original Message ----- 
> > > From: "Julien Nioche" <Julien.Nioche@lingway.com>
> > > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > > Sent: Monday, June 28, 2004 10:04 AM
> > > Subject: Re: Optimizing for long queries?
> > >
> > >
> > > > Hello Drew,
> > > >
> > > > I don't think it's in the FAQ.
> > > >
> > > > 1 - What you could do is sort your query terms in ascending
> > > > alphabetical order. In my case it improved performance a little. It
> > > > would be interesting to know how it works in your case.
> > > >
> > > > 2 - Another solution is to play with TermInfosWriter.INDEX_INTERVAL
> > > > at indexing time. I quote Doug:
> > > >
> > > > "..., try reducing TermInfosWriter.INDEX_INTERVAL.  You'll
> > > > have to re-create your indexes each time you change this constant.
> > > > You might try a value like 16.  This would keep the number of terms in
> > > > memory from being too huge (1 of 16 terms), but would reduce the
> > > > average number scanned from 64 to 8, which would be substantial.  Tell
> > > > me how this works.  If it makes a big difference, then perhaps we
> > > > should make this parameter more easily changeable."
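
The trade-off described in the quote can be illustrated with a toy model in Java. This is not Lucene code: `SparseTermIndexSketch` and its methods are invented for illustration, and the model assumes that every `interval`-th term of a sorted term list is kept in memory, and that a lookup does a binary search over those in-memory terms followed by a sequential scan inside one block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy model of a sparse term index; an illustration only, not Lucene's implementation. */
final class SparseTermIndexSketch {
    private final String[] sortedTerms;  // stands in for the full on-disk term dictionary
    private final String[] indexedTerms; // every interval-th term, held in memory
    private final int interval;

    public SparseTermIndexSketch(String[] terms, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.interval = interval;
        this.sortedTerms = terms.clone();
        Arrays.sort(this.sortedTerms);
        List<String> sampled = new ArrayList<>();
        for (int i = 0; i < sortedTerms.length; i += interval) {
            sampled.add(sortedTerms[i]);
        }
        this.indexedTerms = sampled.toArray(new String[0]);
    }

    /** Number of terms kept in memory. */
    public int indexedTermCount() {
        return indexedTerms.length;
    }

    /** Terms skipped sequentially after the binary search before reaching term; -1 if absent. */
    public int scanCost(String term) {
        int pos = Arrays.binarySearch(indexedTerms, term);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the block to scan
        // starts at the last indexed term that sorts before the insertion point.
        int block = pos >= 0 ? pos : -pos - 2;
        if (block < 0) {
            return -1; // term sorts before every indexed term
        }
        int start = block * interval;
        int end = Math.min(start + interval, sortedTerms.length);
        for (int i = start; i < end; i++) {
            int cmp = sortedTerms[i].compareTo(term);
            if (cmp == 0) {
                return i - start;
            }
            if (cmp > 0) {
                break; // passed the position where the term would be
            }
        }
        return -1;
    }

    /** Average scan cost over all terms of a synthetic dictionary of termCount terms. */
    public static double averageScanCost(int termCount, int interval) {
        String[] terms = new String[termCount];
        for (int i = 0; i < termCount; i++) {
            terms[i] = String.format(Locale.ROOT, "term%06d", i);
        }
        SparseTermIndexSketch index = new SparseTermIndexSketch(terms, interval);
        long total = 0;
        for (String term : terms) {
            total += index.scanCost(term);
        }
        return (double) total / termCount;
    }

    public static void main(String[] args) {
        for (int interval : new int[] {128, 16}) {
            System.out.printf(Locale.ROOT, "interval=%d averageScan=%.1f%n",
                    interval, averageScanCost(1280, interval));
        }
    }
}
```

With 1280 synthetic terms, the sketch keeps 10 terms in memory for an interval of 128 and 80 for an interval of 16, and reports an average scan of 63.5 and 7.5 terms respectively, which matches the "64 to 8" estimate in the quote. Real search time depends on much more than this scan, so the model says nothing about the size of the overall speed-up.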
> > > >
> > > > Have you used a profiler on your application? This could be useful
> > > > to spot possible improvements.
> > > >
> > > >
> > > > ----- Original Message ----- 
> > > > From: "Drew Farris" <drew.farris@gmail.com>
> > > > To: <lucene-user@jakarta.apache.org>
> > > > Sent: Friday, June 25, 2004 8:24 PM
> > > > Subject: Optimizing for long queries?
> > > >
> > > >
> > > > > Apologies if this is a FAQ, but I didn't have much luck searching
> > > > > the list archives for answers on this subject:
> > > > >
> > > > > I'm using Lucene in a context where we frequently have queries
> > > > > that search for as many as 30-50 terms in a single field. Does
> > > > > anyone have any thoughts concerning ways to optimize Lucene for
> > > > > queries of these lengths?
> > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > > > >
> > > > >
> > > >
> > > >
>
>
>
>
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>

