Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Message-ID: <24600563.post@talk.nabble.com>
Date: Tue, 21 Jul 2009 23:07:45 -0700 (PDT)
From: "m.harig" <m.harig@gmail.com>
To: java-user@lucene.apache.org
Subject: indexing 100GB of data


Hello all,

We've got 100GB of data in doc, txt, pdf, ppt, etc. files. We have a
separate parser for each file format, and we're going to index the data with
Lucene. (We were scared off by the Nutch setup, which is why we didn't use
it.) My question is: will this scale when I index those documents? We plan
to build a separate index for each file format and to use a multi-index
reader for searching. Could anyone advise me on the following?
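The "one index per file format" plan can be sketched as a routing step that decides which index directory each file goes to. This is a minimal, stdlib-only sketch; the directory naming scheme (`index-pdf`, `index-txt`, ...) and the `other` bucket for files without an extension are hypothetical choices, not anything from the post. In Lucene itself, each resulting directory would be indexed by its own IndexWriter, and the directories could then be opened together for search with a `MultiReader`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class PerFormatRouting {

    /** Lower-cased file extension, or "other" when there is none (hypothetical convention). */
    public static String formatOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "other";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Groups files by the (hypothetical) per-format index directory they belong to. */
    public static Map<String, List<String>> route(List<String> files) {
        Map<String, List<String>> byIndex = new TreeMap<>(); // sorted keys for stable output
        for (String f : files) {
            String indexDir = "index-" + formatOf(f);
            byIndex.computeIfAbsent(indexDir, k -> new ArrayList<>()).add(f);
        }
        return byIndex;
    }

    public static void main(String[] args) {
        List<String> files = List.of("report.PDF", "notes.txt", "slides.ppt",
                "memo.doc", "summary.pdf", "README");
        route(files).forEach((dir, list) -> System.out.println(dir + " <- " + list));
    }
}
```

One thing the sketch makes visible: the per-format indexes will usually be unbalanced in size (for example, far more PDFs than PPTs), so splitting by format is a organisational choice rather than a way to spread 100GB evenly.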

  1. Are we going about this the right way?
  2. Please advise me on mergeFactor and segments.
  3. How large an index can Lucene handle?
  4. Will it cause a Java OutOfMemoryError (OOM)?
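To get a feel for how mergeFactor (default 10 on IndexWriter) affects the number of segments, here is an idealized, stdlib-only model of a log-structured merge policy. It assumes every flushed segment holds exactly `bufferedDocs` documents (as when flushing by document count) and that as soon as `mergeFactor` segments exist on one level they are merged into a single segment on the next level. Lucene's default merge policy sizes levels by bytes and allows some slack, so treat the output as a rough estimate, not a prediction of real segment counts. The workload numbers in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeFactorModel {

    /** Simulates flushes and cascading merges; returns segment count per level (level 0 first). */
    public static List<Integer> simulate(long numDocs, int bufferedDocs, int mergeFactor) {
        if (bufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need bufferedDocs >= 1 and mergeFactor >= 2");
        }
        List<Integer> levels = new ArrayList<>();
        long flushes = numDocs / bufferedDocs; // docs still in the RAM buffer are ignored
        for (long f = 0; f < flushes; f++) {
            addSegment(levels, 0); // each flush creates one new level-0 segment
            int level = 0;
            // merge a full level into one segment on the next level, cascading upwards
            while (levels.get(level) == mergeFactor) {
                levels.set(level, 0);
                addSegment(levels, level + 1);
                level++;
            }
        }
        return levels;
    }

    private static void addSegment(List<Integer> levels, int level) {
        while (levels.size() <= level) {
            levels.add(0);
        }
        levels.set(level, levels.get(level) + 1);
    }

    /** Total number of live segments across all levels. */
    public static int totalSegments(List<Integer> levels) {
        int sum = 0;
        for (int n : levels) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        long numDocs = 25_000;   // hypothetical workload
        int bufferedDocs = 1_000; // hypothetical flush size
        for (int mf : new int[] {2, 10, 50}) {
            List<Integer> levels = simulate(numDocs, bufferedDocs, mf);
            System.out.println("mergeFactor=" + mf + " segmentsPerLevel=" + levels
                    + " total=" + totalSegments(levels));
        }
    }
}
```

For these inputs the model ends with 3, 7 and 25 segments for mergeFactor 2, 10 and 50: the per-level counts are simply the digits of the number of flushes (25) written in base mergeFactor. The usual trade-off follows from that: a small mergeFactor keeps fewer segments (cheaper searches) at the cost of more merging while indexing, and a large one speeds up bulk indexing but leaves more segments and open files behind.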