lucene-general mailing list archives

From "James" <ja...@ryley.com>
Subject RE: Infrastructure for large Lucene index
Date Fri, 06 Oct 2006 21:31:46 GMT
Agreed.  For example, with patents we have to be concerned about
technology-related terms that are more prominent in certain time periods.  I
think a good random assignment scheme addresses most such problems, but
worst case you can always redo the indexes entirely if they get too
non-random.
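
As a rough illustration of the random assignment idea (a minimal sketch, not our actual implementation): the function names, the document IDs, the toy corpus, and the choice of MD5 as the hash are all assumptions made for the example. Hashing a stable document ID spreads documents across shards independently of their date, and comparing a term's document share per shard gives a simple check for whether the indexes have drifted.

```python
import hashlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard with a stable hash.

    The result is effectively random with respect to document content
    and date, but repeatable, so a document always lands on the same shard.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def term_share_per_shard(docs, term, num_shards):
    """Return, for each shard, the fraction of its documents containing `term`."""
    totals = Counter()
    hits = Counter()
    for doc_id, text in docs:
        shard = shard_for(doc_id, num_shards)
        totals[shard] += 1
        if term in text.lower().split():
            hits[shard] += 1
    return {shard: hits[shard] / totals[shard] for shard in sorted(totals)}


# Hypothetical corpus: the first half of the documents (older) use one
# technology term, the second half (newer) use another.
docs = [
    (f"patent-{i}", "transistor circuit" if i < 5000 else "nanotube circuit")
    for i in range(10000)
]

shares = term_share_per_shard(docs, "nanotube", num_shards=4)
for shard, share in shares.items():
    print(f"shard {shard}: {share:.3f} of documents mention 'nanotube'")
```

If documents were instead assigned to shards by date, one shard would hold nearly all of the newer term and another almost none of it, which is the kind of skew that would call for redoing the indexes.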

Sincerely,
James

> -----Original Message-----
> From: Slava Imeshev [mailto:imeshev@yahoo.com]
> Sent: Friday, October 06, 2006 5:27 PM
> To: general@lucene.apache.org
> Subject: RE: Infrastructure for large Lucene index
> 
> -- James <james@ryley.com> wrote:
> > > If the index is broken into multiple "shards" then we need multiple
> > > copies of each shard, and some way of loadbalancing and failing over
> > > amongst copies of shards.
> >
> > Yep.  Unfortunately it's not simple, but those are all pieces of what we
> > are currently in the process of implementing.
> 
> The problem is that over time indexes develop "personality" and the term
> frequency can vary significantly from index to index....
> 
> Slava
> 
> 
> 
> >
> > Sincerely,
> >
> > James Ryley, Ph.D.
> > www.FreePatentsOnline.com <http://www.freepatentsonline.com/>
> >
> > > -----Original Message-----
> > > From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
> > > Sent: Friday, October 06, 2006 4:37 PM
> > > To: general@lucene.apache.org
> > > Subject: Re: Infrastructure for large Lucene index
> > >
> > > On 10/6/06, James <james@ryley.com> wrote:
> > > > Our indexes are, in aggregate across our
> > > > various collections, even larger than you need.  We use Remote
> > > > ParallelMultiSearcher, with some custom modifications (and we are in
> > > > the process of making more)
> > >
> > > I'm looking into adding some form of distributed search to Solr.
> > > The main problem I see with directly using ParallelMultiSearcher is a
> > > lack of high availability features.
> > >
> > > If the index is broken into multiple "shards" then we need multiple
> > > copies of each shard, and some way of loadbalancing and failing over
> > > amongst copies of shards.
> > >
> > > -Yonik
> > > http://incubator.apache.org/solr Solr, the open-source Lucene search
> > > server

