lucene-general mailing list archives

From "James" <ja...@ryley.com>
Subject RE: Infrastructure for large Lucene index
Date Fri, 06 Oct 2006 20:05:17 GMT
Hi Slava,

We currently do this across many machines for
http://www.FreePatentsOnline.com.  Our indexes are, in aggregate across our
various collections, even larger than what you need.  We use a remote
ParallelMultiSearcher, with some custom modifications (and we are in the
process of making more) to allow more robust handling of many processes at
once and integration of the responses from the various sub-indexes.  This works
fine on commodity hardware, and you will be IO bound, so get multiple drives
in each machine.
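
The fan-out-and-merge pattern described above can be sketched roughly as
follows. This is a hypothetical, dependency-free illustration, not Lucene's
API and not the setup used at FreePatentsOnline: the names
ShardedSearchSketch, Shard, and Hit are invented for the example, and each
Shard stands in for a searcher over one sub-index that could live on another
machine. Every sub-index is queried in parallel and the per-index results are
merged into one global top-N list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these types are Lucene classes.
class ShardedSearchSketch {

    // One scored hit, tagged with the sub-index it came from.
    static final class Hit {
        final String shard;
        final int docId;
        final float score;

        Hit(String shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    // Stand-in for a searcher over a single sub-index.
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    // Query every shard in parallel, then merge into one global top-N list.
    static List<Hit> search(List<Shard> shards, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Scatter: submit one task per sub-index.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                pending.add(pool.submit(() -> shard.search(query, topN)));
            }
            // Gather: wait for each sub-index and collect its hits.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> future : pending) {
                merged.addAll(future.get());
            }
            // Merge: highest score first, keep only the global top N.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("a sub-index search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Two in-memory shards with made-up hits, for demonstration only.
        Shard first = (q, n) -> Arrays.asList(new Hit("first", 1, 0.9f), new Hit("first", 2, 0.2f));
        Shard second = (q, n) -> Arrays.asList(new Hit("second", 7, 0.5f));
        for (Hit hit : search(Arrays.asList(first, second), "patent", 2)) {
            System.out.println(hit.shard + " doc=" + hit.docId + " score=" + hit.score);
        }
    }
}
```

One caveat with this kind of merge: raw scores from independently scored
sub-indexes are only comparable when the term statistics used for scoring are
consistent across them, so a real deployment has to account for that.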

Out of curiosity, what project are you working on?  That's a lot of hits!

Sincerely,
James Ryley, Ph.D.
www.FreePatentsOnline.com


> -----Original Message-----
> From: Slava Imeshev [mailto:imeshev@yahoo.com]
> Sent: Friday, October 06, 2006 2:28 PM
> To: general@lucene.apache.org
> Subject: Infrastructure for large Lucene index
> 
> 
> I am dealing with a pretty challenging task, so I thought it would be
> a good idea to ask the community before I re-invent any wheels of my own.
> 
> I have a Lucene index that is going to grow to 100GB soon. This
> index is going to be read very aggressively (tens of millions of requests
> per day) with some occasional updates (10 batches per day).
> 
> The idea is to split the load between multiple server nodes running Lucene
> on *nix while accessing the same index that is shared across the network.
> 
> I am wondering if it's a good idea and/or if there are any recommendations
> regarding selecting/tweaking network configuration (software+hardware)
> for an index of this size.
> 
> Thank you.
> 
> Slava Imeshev

