lucene-java-user mailing list archives

From "m.harig" <>
Subject indexing 100GB of data
Date Wed, 22 Jul 2009 06:07:45 GMT

Hello all,

We have 100GB of data in doc, txt, pdf, ppt and other formats. We have a
separate parser for each file format, and we are going to index this data
with Lucene. (The Nutch setup scared us off, which is why we didn't use it.)
My question is: will this scale when I index these documents? We plan to
build a separate index for each file format and to use a multi-index reader
for searching. Any suggestions would be appreciated.
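A toy model may make the proposed layout concrete. The sketch below is a plain-Java, in-memory stand-in, not Lucene code: files are routed by extension to one small inverted index per format, and a combined search queries every per-format index as if it were one logical index. That combined view is roughly the role a Lucene `MultiReader` over the per-format indexes would play. The class and method names (`PerFormatIndexes`, `addFile`, `searchAll`) are hypothetical, and the text is assumed to have already been extracted by the per-format parsers.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.HashMap;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Toy, in-memory stand-in for "one index per file format, searched together".
 * NOT Lucene code; all names here are hypothetical illustrations.
 */
public class PerFormatIndexes {

    /** Minimal inverted index: lower-cased term -> sorted set of document ids. */
    static final class ToyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String docId, String text) {
            // Split on non-word characters; skip the empty token a leading separator produces.
            for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
                }
            }
        }

        Set<String> search(String term) {
            return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
        }
    }

    // One toy index per format, keyed by lower-cased file extension.
    private final Map<String, ToyIndex> indexes = new TreeMap<>();

    /** Routes already-extracted text to the index for the file's format. */
    void addFile(String fileName, String extractedText) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot >= 0 ? fileName.substring(dot + 1).toLowerCase(Locale.ROOT) : "unknown";
        indexes.computeIfAbsent(ext, e -> new ToyIndex()).add(fileName, extractedText);
    }

    /** Queries every per-format index and merges the hits into one result set. */
    Set<String> searchAll(String term) {
        Set<String> hits = new TreeSet<>();
        for (ToyIndex index : indexes.values()) {
            hits.addAll(index.search(term));
        }
        return hits;
    }

    /** The formats that currently have an index. */
    Set<String> formats() {
        return indexes.keySet();
    }

    public static void main(String[] args) {
        PerFormatIndexes idx = new PerFormatIndexes();
        // Text here stands in for the output of the doc/pdf/ppt/txt parsers.
        idx.addFile("report.pdf", "Quarterly sales report");
        idx.addFile("notes.txt", "sales meeting notes");
        idx.addFile("deck.ppt", "Product roadmap");
        System.out.println(idx.formats());
        System.out.println(idx.searchAll("sales"));
    }
}
```

One consequence the model makes visible: every query still visits every per-format index, so splitting by format mainly buys independent rebuilds and format-restricted searches. A single index with a field recording each document's format is a common alternative worth weighing.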

  1. Are we going about this the right way?
  2. What mergeFactor and segment settings would you suggest?
  3. How large an index can Lucene handle?
  4. Could this cause a Java OutOfMemoryError (OOM)?
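On question 2, a simplified model may help show what mergeFactor trades off. In Lucene 2.x it is set with `IndexWriter.setMergeFactor` (default 10), and merging is log-structured: roughly, once mergeFactor segments of similar size accumulate, they are merged into one larger segment. The sketch below is an idealized simulation, not Lucene's `LogMergePolicy` code. It assumes every flush writes one equal-sized segment and that merges cascade immediately; the class name `MergeFactorModel` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Idealized model of log-structured segment merging controlled by mergeFactor.
 * NOT Lucene's LogMergePolicy: assumes equal-sized flushed segments and
 * immediate cascading merges.
 */
public class MergeFactorModel {

    /** Returns the number of segments at each level after the given number of flushes. */
    static List<Integer> simulate(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Integer> levels = new ArrayList<>(); // levels.get(i) = segment count at level i
        for (int f = 0; f < flushes; f++) {
            // Each flush adds one new segment at level 0.
            if (levels.isEmpty()) {
                levels.add(0);
            }
            levels.set(0, levels.get(0) + 1);
            // Cascade: mergeFactor segments at one level become one segment a level up.
            int level = 0;
            while (levels.get(level) == mergeFactor) {
                levels.set(level, 0);
                if (levels.size() == level + 1) {
                    levels.add(0);
                }
                levels.set(level + 1, levels.get(level + 1) + 1);
                level++;
            }
        }
        return levels;
    }

    /** Total number of segments across all levels. */
    static int totalSegments(List<Integer> levels) {
        int sum = 0;
        for (int n : levels) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        int flushes = 1000;
        for (int mf : new int[] {2, 10, 50}) {
            List<Integer> levels = simulate(flushes, mf);
            System.out.println("mergeFactor=" + mf + " -> "
                    + totalSegments(levels) + " segments, per level " + levels);
        }
    }
}
```

In this model the segment count after N flushes is the sum of the digits of N written in base mergeFactor; for example, 1234 flushes at mergeFactor 10 leave 1 + 2 + 3 + 4 = 10 segments. A larger mergeFactor means fewer merges while indexing but more segments for searches to open; a smaller one means more merging work and fewer segments.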
