lucene-java-user mailing list archives

From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Index size and performance degradation
Date Tue, 14 Jun 2011 09:02:20 GMT
Partitioning and replication are the keys to handling data and user volumes 
respectively. 
However, this approach introduces some other concerns over consistency and 
availability of content which I've tried to capture 
here: http://www.slideshare.net/MarkHarwood/patterns-for-large-scale-search
These consistency concerns may not be an issue for you but I know they are for 
some organisations.
Many organisations want everything (large data, many users, fast searches, quick 
updates and always-consistent views of the very latest content) and the above 
slide-deck tries to outline why this is hard/impossible and the necessary 
trade-offs in a system's qualities of service. I'd be interested in maintaining 
this with any other suggestions the community have to offer so that we can use 
it to explain the qualities of any particular engine/configuration and the 
justifications for that design choice.
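
To make the first sentence above concrete, here is a minimal sketch in plain Java of how partitioning and replication might fit together. It is not Lucene API: the names PartitionedSearch, Replica, Hit, partitionFor and pickReplica are hypothetical, and hash routing and round-robin replica selection are assumptions chosen for brevity. Each document lives in exactly one partition (data volume), each partition has several copies that share the query load (user volume), and a query visits every partition and merges the per-partition top hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: names and structure are illustrative, not a Lucene API.
final class PartitionedSearch {

    /** A scored hit returned by one partition. */
    static final class Hit {
        final String docId;
        final double score;

        Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Stand-in for one copy (replica) of one partition's index. */
    interface Replica {
        List<Hit> search(String query, int topN);
    }

    // partitions.get(p) holds the replicas of partition p.
    private final List<List<Replica>> partitions;
    private final AtomicInteger nextReplica = new AtomicInteger();

    PartitionedSearch(List<List<Replica>> partitions) {
        this.partitions = partitions;
    }

    /** Partitioning: a document is routed to one partition by hashing its id. */
    int partitionFor(String docId) {
        return Math.floorMod(docId.hashCode(), partitions.size());
    }

    /** Replication: spread queries over the copies of a partition (round-robin). */
    private Replica pickReplica(List<Replica> replicas) {
        int i = Math.floorMod(nextReplica.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }

    /** A query visits every partition; the per-partition top hits are merged by score. */
    List<Hit> search(String query, int topN) {
        List<Hit> merged = new ArrayList<>();
        for (List<Replica> replicas : partitions) {
            // Replicas may lag behind each other, so two identical queries can
            // see different content: this is the consistency concern above.
            merged.addAll(pickReplica(replicas).search(query, topN));
        }
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
    }
}
```

In this sketch, adding partitions shrinks each index but makes every query touch more of them, while adding replicas raises query capacity but multiplies the copies that must be kept up to date.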

Cheers
Mark


----- Original Message ----
From: Itamar Syn-Hershko <itamar@code972.com>
To: java-user@lucene.apache.org
Sent: Tue, 14 June, 2011 9:03:15
Subject: Re: Index size and performance degradation

Thanks. Our product is pretty generic and we can't assume much about the
hardware or about usage. Some users will want low latency, others
will prefer throughput. My job is to compromise as little as
possible...


As for SSDs, that's generally good advice, except they seem to
fail quite a lot. For example, see:
http://www.codinghorror.com/blog/2011/05/the-hot-crazy-solid-state-drive-scale.html



On 14/06/2011 10:28, Toke Eskildsen wrote:

> On Sun, 2011-06-12 at 10:10 +0200, Itamar Syn-Hershko wrote:
>> The whole point of my question was to find out if and how to do
>> balancing on the SAME machine. Apparently that's not going to help and at
>> a certain point we will just have to prompt the user to buy more hardware...
> It really depends on your scenario. If you have few concurrent requests
> and are looking to minimize latency, sharding might help; assuming you
> have fast IO and multiple cores. You basically want to saturate all
> available resources for all requests.
>
> On the other hand, if throughput is the issue, sharding on a single
> machine is counter-productive due to increased duplication and merging.
>
>> Out of curiosity, isn't there anything that we can do to avoid that? For
>> instance, using memory-mapped files for the indexes? Anything that would
>> help us overcome OS limitations of that sort...
> One standard piece of advice for speeding up searches is to use SSDs. Our
> (admittedly old) experiments put SSD performance near RAM. With the
> prices we have now, SSDs seem like an obvious choice for most setups.
>
> We tried a few performance tests at different index sizes and for us,
> index size vs. performance looked like a power law: heavy performance
> degradation in the beginning, less later. It makes sense when we look at
> caching and it means that if you do not require stellar performance, you
> can have very large indexes on few machines (cue Hathi Trust).
>
> - Toke Eskildsen
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

