lucene-java-user mailing list archives

From Vince Taluskie <vgtalus...@gmail.com>
Subject scalability recommendations for large performance-intensive indexes
Date Wed, 08 Feb 2006 21:10:17 GMT
Hello all,

I'm looking for advice on improving scalability. We have a fairly large
Lucene index of 35M documents with 14 fields; each document is at most 1 KB
(most are much smaller). We combine the descriptive text into a single
"contents" field and search on that, and we have been very pleased with the
results: almost 100 queries/sec at about 8-12 ms for the average search.
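As a rough illustration of the catch-all field approach, the sketch below concatenates several descriptive fields into one `contents` string at indexing time. It is plain Python, not Lucene code, and the field names `title`, `description` and `keywords` are hypothetical; the post does not say which fields are combined.

```python
# Hypothetical field names: the post does not list which descriptive
# fields are folded into "contents".
DESCRIPTIVE_FIELDS = ("title", "description", "keywords")


def build_contents(doc):
    """Join the non-empty descriptive fields of a document into one string."""
    return " ".join(doc[f] for f in DESCRIPTIVE_FIELDS if doc.get(f))


doc = {"id": "42", "title": "red widget", "description": "small metal part"}
print(build_contents(doc))  # red widget small metal part
```

Searching one combined field means a plain text query needs no per-field clauses.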

Before reaching that point, we had a common attribute for which about 50% of
the documents had one value and the rest had the other, and adding it as a
boolean clause slowed response times very significantly. We handled this by
splitting the index so that each index contains only documents with one
attribute value or the other, which eliminated the need for the boolean
clause; this gave a 7-8x improvement.
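A toy model of that partitioning approach: keep one inverted index per attribute value and send each query to the partition for the value it wants, so the attribute never appears as a query clause. This is a hypothetical Python sketch, not Lucene's API; `PartitionedIndex` and its methods are illustrative names only.

```python
from collections import defaultdict


class PartitionedIndex:
    """Toy model: a separate inverted index for each value of one attribute."""

    def __init__(self):
        # attribute value -> term -> set of doc ids
        self.partitions = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, attr_value, text):
        """Index the document only in the partition for its attribute value."""
        postings = self.partitions[attr_value]
        for term in text.lower().split():
            postings[term].add(doc_id)

    def search(self, attr_value, *terms):
        """AND the terms within one partition; no attribute clause is needed."""
        postings = self.partitions.get(attr_value)
        if postings is None or not terms:
            return set()
        result = set(postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= postings.get(term, set())
        return result


idx = PartitionedIndex()
idx.add(1, "A", "red widget")
idx.add(2, "B", "red widget")
idx.add(3, "A", "blue widget")
print(idx.search("A", "red", "widget"))  # {1}
```

The trade-off is that each further attribute partitioned this way multiplies the number of indexes (two two-valued attributes give four partitions).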

Now we want to add another attribute for which most documents will have one
value and far fewer will have the other. Although it sounds simple, my limited
testing with an 85/15 split shows another big performance hit from the boolean
query. A two-term boolean search without the attribute takes about 7-8 ms;
adding the attribute as a clause increases the elapsed time to about 4x the
original for the 85% value and about 2x for the 15% value.
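One simplified way to see why a high-frequency attribute term is costly in a conjunction: a naive two-pointer merge of sorted postings lists does work proportional to how far it must walk through the longer list. The Python sketch below assumes that naive merge; real Lucene scorers can skip ahead in postings lists, so actual costs differ, but it shows the direction of the effect.

```python
def intersect(a, b):
    """Naive merge of two sorted doc-id lists; returns (matches, loop steps)."""
    i = j = steps = 0
    matches = []
    while i < len(a) and j < len(b):
        steps += 1
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches, steps


rare = [5, 500, 900]                             # hits for the text terms
common = [d for d in range(1000) if d % 7 != 0]  # about 86% of 1,000 docs
other_rare = [5, 900]

print(intersect(rare, common))      # ([5, 500, 900], 772)
print(intersect(rare, other_rare))  # ([5, 900], 3)
```

The short list has only three entries, yet matching it against the common list takes 772 steps, versus 3 against another short list.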

I had hoped a QueryFilter would help, but it turns out to be much slower:
with the 85% value the query takes a whopping 336 ms, and with the 15% value
about 65 ms, roughly 40x and 8x slower than the original 8 ms query without
the additional attribute.
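Conceptually, a filter marks the documents that carry an attribute value in a bitset, and the search keeps only hits whose bit is set. Building that bitset touches every matching document, so it is cheap per query only if the result is cached and reused. The toy model below is hypothetical Python, not Lucene's QueryFilter implementation; `CachedAttributeFilter` and its `builds` counter are illustrative names.

```python
class CachedAttributeFilter:
    """Toy model of a cached filter: one bitset per attribute value."""

    def __init__(self, attr_by_doc):
        self.attr_by_doc = attr_by_doc  # list indexed by doc id -> attribute value
        self.cache = {}                 # attribute value -> bitset (Python int)
        self.builds = 0                 # how many bitsets had to be computed

    def bits(self, value):
        """Return the bitset for a value, scanning all docs only on a cache miss."""
        if value not in self.cache:
            self.builds += 1
            mask = 0
            for doc_id, v in enumerate(self.attr_by_doc):
                if v == value:
                    mask |= 1 << doc_id
            self.cache[value] = mask
        return self.cache[value]

    def apply(self, hits, value):
        """Keep only the query hits whose bit is set for this attribute value."""
        mask = self.bits(value)
        return [d for d in hits if (mask >> d) & 1]


f = CachedAttributeFilter(["x", "y", "x", "x", "y"])
print(f.apply([0, 1, 2, 4], "x"))  # [0, 2]
print(f.apply([1, 3, 4], "y"))     # [1, 4]
print(f.apply([3], "x"))           # [3] (reuses the cached "x" bitset)
print(f.builds)                    # 2
```

If the filter is recreated for every search, the full scan is paid each time; that is one possible factor when a filter turns out slower than a boolean clause, though whether it applies depends on how the filter is constructed and reused.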

Is there a better way to handle adding a common attribute with only a few
possible values across the index? Are there any other recommended approaches?

Thanks in advance,

Vince
