lucene-general mailing list archives

From "James" <ja...@ryley.com>
Subject RE: Infrastructure for large Lucene index
Date Fri, 06 Oct 2006 21:01:47 GMT
You don't separate the index by having "cat" in one and "dog" in another.
You separate it by document, so that both indexes have cat and dog, but the
indexes are smaller, meaning that response time is greatly reduced.
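
To make the distinction concrete, here is a minimal sketch of partitioning
by document. It is a toy model in plain Python, not Lucene code: the
functions build_index, search_shard and search_all, the sample documents,
and the modulo routing rule are all assumptions made up for illustration.
Each shard holds a complete inverted index over its own subset of
documents, so a query for "cat dog" runs unchanged on every shard and the
per-shard hits are simply merged.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index for one shard: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_shard(index, terms):
    """Return ids of documents in this shard that contain ALL query terms."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search_all(shards, query):
    """Run the same query on every shard and merge the per-shard hits."""
    terms = query.lower().split()
    hits = set()
    for index in shards:
        hits |= search_shard(index, terms)
    return sorted(hits)


# Hypothetical sample corpus.
docs = {
    1: "the cat chased the dog",
    2: "a dog barked",
    3: "cat and dog are friends",
    4: "a cat slept",
}

# Partition by document (here: doc id modulo shard count), not by term.
num_shards = 2
parts = [dict() for _ in range(num_shards)]
for doc_id, text in docs.items():
    parts[doc_id % num_shards][doc_id] = text

shards = [build_index(part) for part in parts]
result = search_all(shards, "cat dog")
print(result)
```

Both shards end up with postings for "cat" and for "dog", but each
document lives in exactly one shard, so a multi-term match never has to be
assembled across machines. In a real deployment each shard would sit on
its own node and be queried in parallel, with scores merged rather than
bare ids.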

> -----Original Message-----
> From: Slava Imeshev [mailto:imeshev@yahoo.com]
> Sent: Friday, October 06, 2006 4:59 PM
> To: general@lucene.apache.org
> Subject: RE: Infrastructure for large Lucene index
> 
> --- James <james@ryley.com> wrote:
> > I may have misinterpreted your email in my initial response.  Are you saying
> > you want nodes (presumably for more CPUs) that all access the same shared
> > index (on Network Attached Storage, presumably)?
> 
> Yes, that's right.
> 
> > If so, I think you are going to have read and write performance issues
> > unless you are using some SERIOUS storage system.
> 
> Yes, that's what I am trying to figure out, how serious it should be.
> 
> > If you aren't already
> > committed to the hardware configuration you seem to be describing,
> 
> I am not.
> 
> > I would go with commodity hardware and split the indexes across each
> > machine -- data locality is going to be important.
> 
> This is understood, but that is not going to work for searching for
> "cat dog" when the "cat" is in one index and the "dog" in another.
> 
> Slava
> 
> >
> > Sincerely,
> > James Ryley, Ph.D.
> > www.FreePatentsOnline.com
> >
> >
> > > -----Original Message-----
> > > From: Slava Imeshev [mailto:imeshev@yahoo.com]
> > > Sent: Friday, October 06, 2006 2:28 PM
> > > To: general@lucene.apache.org
> > > Subject: Infrastructure for large Lucene index
> > >
> > >
> > > I am dealing with a pretty challenging task, so I thought it would be
> > > a good idea to ask the community before I re-invent any wheels of my own.
> > >
> > > I have a Lucene index that is going to grow to 100GB soon. This index
> > > is going to be read very aggressively (tens of millions of requests
> > > per day) with some occasional updates (10 batches per day).
> > >
> > > The idea is to split load between multiple server nodes running Lucene
> > > on *nix while accessing the same index that is shared across the
> > > network.
> > >
> > > I am wondering if it's a good idea and/or if there are any
> > > recommendations regarding selecting/tweaking network configuration
> > > (software+hardware) for an index of this size.
> > >
> > > Thank you.
> > >
> > > Slava Imeshev
> >
> >
> >

