lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: index file size threshold affecting search performance?
Date Wed, 28 Mar 2007 23:02:12 GMT
Well, if you're adding data, you must be doing something with
it later. Are you sure the problem is in the index growing and not
how you use the data afterwards?

The reason I ask is that I found almost a 10-fold increase in
my apps performance when I used FieldSelector (Lucene 2.1)
to only load the important fields in my documents. See the
thread labeled

Lucene 2.1, using FieldSelector speeds up my app by a factor of 10+,
numbers attached

A quick test you could do is to time the actual search as
opposed to the overall response time. That is, just time
the call to Searcher.search, and collect the time you spend,
say, assembling whatever you do to respond separately to
see whether the search is killing you or your manipulation
after the search.

The take-away is that we're all surprised by the difference you're
seeing and I'm betting that it's something other than merely
adding 5% to your index size. What is left as an exercise for the
reader <G>.

Best
Erick


On 3/28/07, Oshima, Scott <soshima@business.com> wrote:
>
> Yeah it might be an hardware issue, with a slightly smaller index with
> less stored data, the performance is what we want it to be.  Just adding
> 5% more stored data(unidexed of course) pushes us over some sort of
> threshold causing performance to tank.
>
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Wednesday, March 28, 2007 12:46 PM
> To: java-user@lucene.apache.org
> Subject: Re: index file size threshold affecting search performance?
>
> I've just built a 9.3G index (admittedly tons of stored data in there,
> 3.3M documents) and performance is amazing (through Solr).
>
>         Erik
>
>
>
> On Mar 28, 2007, at 3:11 PM, Erick Erickson wrote:
>
> > This surprises me, I'm currently working with a 4G index, and the
> > improvement from when it was an 8G index was only 10% or so.
> > And it's plenty speedy.
> >
> > Are you hitting hardware limitations and perhaps swapping like crazy?
> > In which case, unless you split things across several machines, I
> > doubt it would help to make two smaller indexes.
> >
> > In sum, I really suspect that you're NOT hitting a Lucene limitation,
> > but it's something else about your system....
> >
> > Best
> > Erick
> >
> > On 3/28/07, Scott Oshima <soshima@gmail.com> wrote:
> >>
> >> So I assumed a linear decay of performance as an index got bigger.
> >>
> >> For some reason when going from an index size of 1.89 to 1.95 gigs
> >> dramatically increased cpu across all of our servers.
> >>
> >> I was thinking of splitting the 1.95 index into 2 separate indexes
> >> and using a multisearcher on those parts?
> >>
> >> thanks.
> >>
> >> -scott
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message