lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcus Herou <marcus.he...@tailsweep.com>
Subject Re: Scaling out/up or a mix
Date Sun, 28 Jun 2009 21:13:12 GMT
Hi. I think I need to be more specific.

What I am trying to find out is if I should aim for:

CPU (2x4 cores, 2.0-3.0Ghz)? or perhaps just a 4 cores is enough.
Fast disk IO: 8 disks, RAID1+0 ? or perhaps 2 disks is enough...
RAM - if the index does not fit into RAM how much RAM should I then buy ?

Please any hints would be appreciated since I am going to invest soon.

//Marcus

On Sat, Jun 27, 2009 at 12:00 AM, Marcus Herou
<marcus.herou@tailsweep.com>wrote:

> Hi.
>
> I currently have an index which is 16GB per machine (8 machines = 128GB)
> (data is stored externally, not in index) and is growing like crazy (we are
> indexing blogs which is crazy by nature) and have only allocated 2GB per
> machine to the Lucene app since we are running some other stuff there in
> parallell.
>
> Each doc should be roughly the size of a blog post, no more than 20k.
>
> We currently have about 90M documents and it is increasing rapidly so
> getting into the G+ document range is not going to be too far away.
>
> Now due to search performance I think I need to move these instances to
> dedicated index/search machines (or index on some machines and search on
> others). Anyway I would like to get some feedback about two things:
>
> 1. What is the most important hardware aspect when it comes to add document
> to the index and optimize it.
> 1.1 Is it disk I|O write throghput ? (sequential or random-io ?)
> 1.2 Is it RAM ?
> 1.3 Is is CPU ?
>
> My guess would be disk-io, right, wrong ?
>
> 2. What is the most important hardware aspect when it comes to searching
> documents in my setup ? (result-set is limited to return only the top 10
> matches with page handling)
> 2.1 Is it disk read throughput ? (sequential or random-io ?)
> 2.2 Is it RAM ?
> 2.3 Is is CPU ?
>
> I have no clue since the data might not fit into memory. What is then the
> most important factor ? read-performance while scanning the index ? CPU
> while comparing fields and collecting results ?
>
> What I'm trying to find out is what I can do to get most bang for the buck
> with a limited (aren't we all limited?) budget.
>
> Kindly
>
> //Marcus
>
>
>
>
>
> --
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> marcus.herou@tailsweep.com
> http://www.tailsweep.com/
>
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message