lucene-java-user mailing list archives

From "David Fertig" <dfer...@cymfony.com>
Subject RE: best practice: 1.4 billions documents
Date Mon, 22 Nov 2010 02:22:58 GMT
Actually, I've been bitten by a still-unresolved issue with the parallel searchers and recommend
a MultiReader instead.
We have a couple billion docs in our archives as well.  Breaking them up by day worked well
for us, but you'll need to partition them one way or another.
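The per-day split David describes needs a routing step before any document reaches Lucene: pick the sub-index from the document's timestamp. A minimal sketch of that step, outside Lucene itself; the `index-yyyyMMdd` naming convention and the `ShardRouter` helper are hypothetical, not anything from this thread:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical routing helper: maps a document timestamp to a per-day
// index directory name, so writes and deletes only touch one small index.
public class ShardRouter {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // e.g. a timestamp during 2010-11-22 UTC maps to "index-20101122"
    static String shardFor(long epochMillis) {
        return "index-" + DAY.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(shardFor(1290384178000L)); // 2010-11-22 00:02:58 UTC
    }
}
```

Because each day is a separate physical index, "hot" updates stay cheap: only the current day's writer is open for appends, and old shards are effectively immutable.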

-----Original Message-----
From: Luca Rondanini [mailto:luca.rondanini@gmail.com] 
Sent: Sunday, November 21, 2010 8:13 PM
To: java-user@lucene.apache.org; yonik@lucidimagination.com
Subject: Re: best practice: 1.4 billions documents

thank you both!

Johannes, katta seems interesting, but I would still need to solve the problem of
"hot" updates to the index

Yonik, I see your point - so your suggestion would be to build an
architecture based on ParallelMultiSearcher?
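For reference, the searcher-side wiring being discussed looks roughly like this in the Lucene 3.x API current at the time of this thread. This is a sketch, not a tested implementation: the shard paths are made up, and it follows David's suggestion of a single MultiReader over the sub-indexes rather than a ParallelMultiSearcher:

```java
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

// Sketch against the Lucene 3.x API (current when this thread was written).
// One sub-index per day, presented to queries as a single logical index.
public class ShardedSearch {
    public static IndexSearcher open(String... shardPaths) throws Exception {
        IndexReader[] readers = new IndexReader[shardPaths.length];
        for (int i = 0; i < shardPaths.length; i++) {
            // Open read-only; shards for past days never change.
            readers[i] = IndexReader.open(
                    FSDirectory.open(new File(shardPaths[i])), true);
        }
        // MultiReader merges all shards into one view, so a plain
        // IndexSearcher works and no ParallelMultiSearcher is needed.
        return new IndexSearcher(new MultiReader(readers));
    }
}
```

Requires the Lucene 3.x jars; with later Lucene versions the reader-opening calls differ.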


On Sun, Nov 21, 2010 at 3:48 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:

> On Sun, Nov 21, 2010 at 6:33 PM, Luca Rondanini
> <luca.rondanini@gmail.com> wrote:
> > Hi everybody,
> >
> > I really need some good advice! I need to index in Lucene something like
> > 1.4 billion documents. I have experience with Lucene, but I've never worked
> > with such a big number of documents. Also, this is just the number of docs
> > at "start-up": they are going to grow, and fast.
> >
> > I don't have to tell you that I need the system to be fast and to support
> > real-time updates to the documents.
> >
> > The first solution that came to my mind was to use ParallelMultiSearcher,
> > splitting the index into many "sub-indexes" (how many docs per index?
> > 100,000?), but I don't have experience with it and I don't know how well
> > it will scale as the number of documents grows!
> >
> > A more solid solution seems to be building some kind of integration with
> > Hadoop, but I didn't find much about Lucene and Hadoop integration.
> >
> > Any idea? Which direction should I go (pure Lucene or Hadoop)?
>
> There seems to be a common misconception about hadoop regarding search.
> Map-reduce as implemented in hadoop is really for batch-oriented jobs
> only (or for jobs where you don't need a quick response time).  It's
> definitely not for serving normal queries (unless you have unusual
> requirements).
>
> -Yonik
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>