lucene-java-user mailing list archives

From: Yonik Seeley
Subject: Re: best practice: 1.4 billion documents
Date: Sun, 21 Nov 2010 23:48:03 GMT
On Sun, Nov 21, 2010 at 6:33 PM, Luca Rondanini wrote:
> Hi everybody,
> I really need some good advice! I need to index something like 1.4
> billion documents in Lucene. I have experience with Lucene, but I've
> never worked with such a large number of documents. Also, this is just
> the number of docs at "start-up": they are going to grow, and fast.
> I don't have to tell you that I need the system to be fast and to
> support real-time updates to the documents.
> The first solution that came to mind was to use ParallelMultiSearcher,
> splitting the index into many "sub-indexes" (how many docs per index?
> 100,000?), but I don't have experience with it and I don't know how
> well it will scale as the number of documents grows!
> A more solid solution seems to be building some kind of integration
> with Hadoop, but I didn't find much about Lucene and Hadoop integration.
> Any idea? Which direction should I go (pure Lucene or Hadoop)?

There seems to be a common misconception about Hadoop with regard to
search. Map-reduce as implemented in Hadoop is really for batch-oriented
jobs only (or jobs where you don't need a quick response time). It's
definitely not for serving normal queries (unless you have unusual
requirements).
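
For reference, the ParallelMultiSearcher approach described above would
look roughly like this against the Lucene 3.0 API (a minimal sketch only;
the shard paths, field name, and analyzer are made up for illustration):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class ShardedSearch {
      public static void main(String[] args) throws Exception {
        // Hypothetical shard layout: one sub-index per directory.
        String[] shardPaths = { "/data/shard0", "/data/shard1", "/data/shard2" };

        // Open each shard read-only; each IndexSearcher is one Searchable.
        Searchable[] shards = new Searchable[shardPaths.length];
        for (int i = 0; i < shardPaths.length; i++) {
          shards[i] = new IndexSearcher(
              IndexReader.open(FSDirectory.open(new File(shardPaths[i])), true));
        }

        // Searches all shards concurrently and merges the per-shard top hits.
        ParallelMultiSearcher searcher = new ParallelMultiSearcher(shards);
        QueryParser parser = new QueryParser(Version.LUCENE_30, "body",
            new StandardAnalyzer(Version.LUCENE_30));
        Query q = parser.parse("hello world");
        TopDocs hits = searcher.search(q, 10);
        System.out.println("total hits: " + hits.totalHits);
        searcher.close();
      }
    }

Note that the set of shards is fixed when the searcher is constructed, so
growing beyond the initial layout means reopening over a new shard list.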

