lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From markharw00d <markharw...@yahoo.co.uk>
Subject Re: scalability recommendations for large performance-intensive indexes
Date Wed, 08 Feb 2006 22:04:00 GMT
Hi Vince, sounds like the same issue I highlighted recently on the 
java-dev list.

See here:
http://www.nabble.com/Preventing-%22killer%22-queries-t1077895.html

The problem lies in the underlying cost of reading TermDocs for very 
common terms (a problem for  both queries and filters)

For your issue, given the  problem field only has 2 values you can 
comfortably cache 2 Bitsets and use them as filters.
This would only require roughly (35m/8)*2 bytes or about 9 meg of RAM.

Cheers
Mark



Vince Taluskie wrote:

>hello All,
>
>I'm looking for some advice on how to improve scalability - we have a fairly
>large lucene index of 35M documents, max 1k document size (most much
>smaller) and 14 fields.   We combine descriptive text together into a
>"contents" field and search on that and have been very pleased with handling
>almost 100 queries/sec at about 8-12ms for the average search.
>
>Prior to that we had a common attribute for which about 50% of the docs had
>one value and the rest had the other value and the boolean query slowed
>response times very significantly.  We handled this by breaking up our
>indexes so that the index only contained one attribute or the other and
>eliminated the need for the boolean - this was a 7-8x improvement.
>
>Now we're back to wanting to add another attribute to the documents for
>which most of the docs will have one value and much fewer will have the
>other and although it sounds so simple - my limited testing with an 85/15
>ratio is showing another big hit on performance with the boolean.    A two
>term boolean search without the attribute is about 7-8ms, adding the
>attribute to the boolean search increases the elapsed time to 4x and 2x of
>original for the 85% and 15% frequencies respective.
>
>I had some hope that a QueryFilter would really help out but it turns out to
>be much much slower:  the 85% term ends up taking a whopping 336ms and the
>15% term ends up around 65ms which is 40x and 8x slower than the original
>8ms query speed without the additional attribute.
>
>I have to ask if there's not a better way to handle the addition of an
>common attribute with a few possible values across the index.  Any other
>recommended approaches?
>
>Thanks in advance,
>
>Vince
>
>  
>



		
___________________________________________________________ 
To help you stay safe and secure online, we've developed the all new Yahoo! Security Centre.
http://uk.security.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message