lucene-general mailing list archives

From Slava Imeshev <imes...@yahoo.com>
Subject RE: Infrastructure for large Lucene index
Date Fri, 06 Oct 2006 21:08:01 GMT
--- James <james@ryley.com> wrote:
> You don't separate the index by having "cat" in one and "dog" in another.
> You separate it by document, so that both indexes have cat and dog, but the
> indexes are smaller, meaning that response time is greatly reduced.
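
A minimal sketch of the document-based split described above, in plain Python rather than Lucene. The toy documents and the helper names (`build_index`, `partition`, `search_shard`, `search_all`) are all made up for illustration. Each shard indexes a subset of the documents, so both shards know the terms "cat" and "dog"; a "cat dog" query runs against every shard and the per-shard hits are merged.

```python
from collections import defaultdict


def build_index(docs):
    """Build a tiny inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def partition(docs, n):
    """Split documents (not terms) across n shards by document id."""
    parts = [{} for _ in range(n)]
    for doc_id, text in docs.items():
        parts[doc_id % n][doc_id] = text
    return parts


def search_shard(index, terms):
    """Return ids of documents in one shard that contain every query term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search_all(shards, query):
    """Run the same query on every shard and merge the hits."""
    terms = query.lower().split()
    hits = set()
    for index in shards:
        hits |= search_shard(index, terms)
    return sorted(hits)


# Hypothetical toy corpus, for illustration only.
docs = {
    1: "the cat chased the dog",
    2: "a dog barked",
    3: "cat and dog asleep",
    4: "a cat purred",
}

shards = [build_index(part) for part in partition(docs, 2)]
results = search_all(shards, "cat dog")
print(results)
```

Because every document lives whole in exactly one shard, a multi-term query never needs to join postings across shards; the merged result matches what a single index over all documents would return. Relevance ranking is left out of this sketch: scores computed per shard depend on per-shard term statistics, so merging ranked results needs extra care.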

I think I oversimplified the problem with this example.

Slava

> 
> > -----Original Message-----
> > From: Slava Imeshev [mailto:imeshev@yahoo.com]
> > Sent: Friday, October 06, 2006 4:59 PM
> > To: general@lucene.apache.org
> > Subject: RE: Infrastructure for large Lucene index
> > 
> > --- James <james@ryley.com> wrote:
> > > I may have misinterpreted your email in my initial response.  Are you saying
> > > you want nodes (presumably for more CPUs) that all access the same shared
> > > index (on Network Attached Storage, presumably)?
> > 
> > Yes, that's right.
> > 
> > > If so, I think you are going to have read and write performance issues
> > > unless you are using some SERIOUS storage system.
> > 
> > Yes, that's what I am trying to figure out: how serious it should be.
> > 
> > > If you aren't already
> > > committed to the hardware configuration you seem to be describing,
> > 
> > I am not.
> > 
> > > I would go with commodity hardware and split the indexes across each
> > > machine -- data locality is going to be important.
> > 
> > This is understood, but that is not going to work for searching for "cat dog"
> > when "cat" is in one index and "dog" is in another.
> > 
> > Slava
> > 
> > >
> > > Sincerely,
> > > James Ryley, Ph.D.
> > > www.FreePatentsOnline.com
> > >
> > >
> > > > -----Original Message-----
> > > > From: Slava Imeshev [mailto:imeshev@yahoo.com]
> > > > Sent: Friday, October 06, 2006 2:28 PM
> > > > To: general@lucene.apache.org
> > > > Subject: Infrastructure for large Lucene index
> > > >
> > > >
> > > > I am dealing with a pretty challenging task, so I thought it would be
> > > > a good idea to ask the community before I reinvent any wheels of my own.
> > > >
> > > > I have a Lucene index that is going to grow to 100GB soon. This index
> > > > is going to be read very aggressively (tens of millions of requests
> > > > per day) with some occasional updates (10 batches per day).
> > > >
> > > > The idea is to split the load between multiple server nodes running Lucene
> > > > on *nix while accessing the same index, shared across the network.
> > > >
> > > > I am wondering if it's a good idea and/or if there are any recommendations
> > > > regarding selecting/tweaking network configuration (software+hardware)
> > > > for an index of this size.
> > > >
> > > > Thank you.
> > > >
> > > > Slava Imeshev
> > >
> > >
> > >
> 
> 
> 

