lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eric Bowman <ebow...@boboco.ie>
Subject Re: Scaling out/up or a mix
Date Mon, 29 Jun 2009 06:53:27 GMT
There is no single answer -- this is always application specific.

Without knowing anything about what you are doing:

1. disk i/o is probably the most critical.  Go SSD or even RAM disk if
you can, if performance is absolutely critical
2. Sometimes CPU can become an issue, but 8 cores is probably enough
unless you are doing especially cpu-bound searches.

Unless you are doing something with hard performance requirements, or
really quite unusual, buying "good" kit is probably good enough, and you
won't really know for sure until you measure.  Lucene is a general
enough tool that there isn't a terribly universal answer to this.  We
were a bit surprised to end up cpu-bound instead of disk i/o-bound, for
instance, but we ended up taking an unusual path.  YMMV.

Marcus Herou wrote:
> Hi. I think I need to be more specific.
>
> What I am trying to find out is if I should aim for:
>
> CPU (2x4 cores, 2.0-3.0Ghz)? or perhaps just a 4 cores is enough.
> Fast disk IO: 8 disks, RAID1+0 ? or perhaps 2 disks is enough...
> RAM - if the index does not fit into RAM how much RAM should I then buy ?
>
> Please any hints would be appreciated since I am going to invest soon.
>
> //Marcus
>
> On Sat, Jun 27, 2009 at 12:00 AM, Marcus Herou
> <marcus.herou@tailsweep.com>wrote:
>
>   
>> Hi.
>>
>> I currently have an index which is 16GB per machine (8 machines = 128GB)
>> (data is stored externally, not in index) and is growing like crazy (we are
>> indexing blogs which is crazy by nature) and have only allocated 2GB per
>> machine to the Lucene app since we are running some other stuff there in
>> parallell.
>>
>> Each doc should be roughly the size of a blog post, no more than 20k.
>>
>> We currently have about 90M documents and it is increasing rapidly so
>> getting into the G+ document range is not going to be too far away.
>>
>> Now due to search performance I think I need to move these instances to
>> dedicated index/search machines (or index on some machines and search on
>> others). Anyway I would like to get some feedback about two things:
>>
>> 1. What is the most important hardware aspect when it comes to add document
>> to the index and optimize it.
>> 1.1 Is it disk I|O write throghput ? (sequential or random-io ?)
>> 1.2 Is it RAM ?
>> 1.3 Is is CPU ?
>>
>> My guess would be disk-io, right, wrong ?
>>
>> 2. What is the most important hardware aspect when it comes to searching
>> documents in my setup ? (result-set is limited to return only the top 10
>> matches with page handling)
>> 2.1 Is it disk read throughput ? (sequential or random-io ?)
>> 2.2 Is it RAM ?
>> 2.3 Is is CPU ?
>>
>> I have no clue since the data might not fit into memory. What is then the
>> most important factor ? read-performance while scanning the index ? CPU
>> while comparing fields and collecting results ?
>>
>> What I'm trying to find out is what I can do to get most bang for the buck
>> with a limited (aren't we all limited?) budget.
>>
>> Kindly
>>
>> //Marcus
>>
>>
>>
>>
>>
>> --
>> Marcus Herou CTO and co-founder Tailsweep AB
>> +46702561312
>> marcus.herou@tailsweep.com
>> http://www.tailsweep.com/
>>
>>
>>     
>
>
>   


-- 
Eric Bowman
Boboco Ltd
ebowman@boboco.ie
http://www.boboco.ie/ebowman/pubkey.pgp
+35318394189/+353872801532


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message