lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Index size and performance degradation
Date Tue, 14 Jun 2011 12:06:09 GMT
Sorry, wrong email ;)

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 14, 2011 at 8:05 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> Hmm, this sounds hairy :)
>
> Are you sure NRTCachingDir won't work for you?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Jun 14, 2011 at 5:58 AM, Ganesh <emailgane@yahoo.co.in> wrote:
>> Is it a bad idea to keep multiple shards in a single system?
>>
>> Regards
>> Ganesh
>>
>> ----- Original Message -----
>> From: "Toke Eskildsen" <te@statsbiblioteket.dk>
>> To: <java-user@lucene.apache.org>
>> Sent: Tuesday, June 14, 2011 12:58 PM
>> Subject: Re: Index size and performance degradation
>>
>>
>>> On Sun, 2011-06-12 at 10:10 +0200, Itamar Syn-Hershko wrote:
>>>> The whole point of my question was to find out if and how to do load
>>>> balancing on the SAME machine. Apparently that's not going to help, and at
>>>> a certain point we will just have to prompt the user to buy more hardware...
>>>
>>> It really depends on your scenario. If you have few concurrent requests
>>> and are looking to minimize latency, sharding might help, assuming you
>>> have fast IO and multiple cores. You basically want to saturate all
>>> available resources for all requests.
>>>
>>> On the other hand, if throughput is the issue, sharding on a single
>>> machine is counter-productive due to increased duplication and merging.
>>>
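A minimal sketch of the low-latency case described above: one query is fanned out to several shards on the same machine in parallel, and the per-shard results are merged into a single top-n list. It uses only the JDK, not Lucene. `Shard`, `Hit`, and `ShardFanOut` are hypothetical names made up for this illustration, and the scores are assumed to be comparable across shards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShardFanOut {
    /** Hypothetical scored hit; stands in for whatever a real shard returns. */
    public static final class Hit {
        public final String docId;
        public final double score;
        public Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical shard abstraction: returns that shard's own best hits. */
    public interface Shard {
        List<Hit> search(String query, int n);
    }

    /** Queries every shard concurrently and returns the n best hits overall. */
    public static List<Hit> search(List<Shard> shards, String query, int n,
                                   ExecutorService pool) {
        List<Callable<List<Hit>>> tasks = new ArrayList<>();
        for (Shard shard : shards) {
            tasks.add(() -> shard.search(query, n));
        }
        List<Hit> merged = new ArrayList<>();
        try {
            // invokeAll blocks until every shard has answered.
            for (Future<List<Hit>> result : pool.invokeAll(tasks)) {
                merged.addAll(result.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("shard search failed", e.getCause());
        }
        // This merge step is the extra work that sharding adds to each query.
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Shard first = (q, n) -> List.of(new Hit("a1", 0.9), new Hit("a2", 0.2));
        Shard second = (q, n) -> List.of(new Hit("b1", 0.5));
        for (Hit hit : search(List.of(first, second), "lucene", 2, pool)) {
            System.out.println(hit.docId + " " + hit.score);
        }
        pool.shutdown();
    }
}
```

Each query occupies one thread per shard, so this only pays off while cores and IO are otherwise idle. Under many concurrent requests the same threads compete with each other, which matches the throughput caveat above.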
>>>> Out of curiosity, isn't there anything that we can do to avoid that? For
>>>> instance, using memory-mapped files for the indexes? Anything that would
>>>> help us overcome OS limitations of that sort...
>>>
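For readers unfamiliar with the memory-mapped files mentioned in the question, here is a minimal JDK-only sketch of reading a file through a read-only mapping. It does not use Lucene (which ships its own `MMapDirectory` for this purpose). `MappedRead`, `sumBytes`, and `writeTemp` are hypothetical names made up for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRead {
    /** Sums the unsigned byte values of a file via a read-only memory mapping. */
    public static long sumBytes(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The OS pages file contents in on demand; no explicit read() calls.
            // A single mapping made this way is limited to Integer.MAX_VALUE bytes.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (buffer.hasRemaining()) {
                sum += buffer.get() & 0xFF;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the example: writes the bytes to a temporary file. */
    public static Path writeTemp(byte[] data) {
        try {
            Path file = Files.createTempFile("mapped-read", ".bin");
            Files.write(file, data);
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTemp(new byte[] {1, 2, 3, (byte) 250});
        System.out.println(sumBytes(file));
    }
}
```

Mapping changes how the bytes reach the process, not how much RAM the machine has. Reads of pages that are not already cached by the OS still go to disk.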
>>> One standard piece of advice for speeding up searches is to use SSDs. Our
>>> (admittedly old) experiments put SSD performance near RAM. With the
>>> prices we have now, SSDs seem like an obvious choice for most setups.
>>>
>>> We tried a few performance tests at different index sizes and for us,
>>> index size vs. performance looked like a power law: heavy performance
>>> degradation in the beginning, less later. It makes sense when we look at
>>> caching, and it means that if you do not require stellar performance, you
>>> can have very large indexes on few machines (cue Hathi Trust).
>>>
>>> - Toke Eskildsen
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>
