lucene-java-user mailing list archives

From Luca Rondanini <>
Subject best practice: 1.4 billion documents
Date Sun, 21 Nov 2010 23:33:06 GMT
Hi everybody,

I really need some good advice! I need to index something like 1.4 billion
documents in Lucene. I have experience with Lucene, but I've never worked with
such a large number of documents. Also, this is just the number of docs at
"start-up": they are going to grow, and fast.

I don't have to tell you that I need the system to be fast and to support
real-time updates to the documents.

The first solution that came to my mind was to use ParallelMultiSearcher,
splitting the index into many sub-indexes (how many docs per index?
100,000?), but I don't have experience with it and I don't know how well it
will scale as the number of documents grows!

A more solid solution seems to be building some kind of integration with
Hadoop, but I didn't find much about Lucene and Hadoop integration.

Any idea? Which direction should I go (pure lucene or hadoop)?

