From: "m.harig"
To: java-user@lucene.apache.org
Date: Tue, 21 Jul 2009 23:07:45 -0700 (PDT)
Subject: indexing 100GB of data
Message-ID: <24600563.post@talk.nabble.com>

Hello all,

We have 100GB of data in doc, txt, pdf, ppt, and other formats. We have a separate parser for each file format, so we are going to index the data with Lucene.
(We were put off by the Nutch setup, which is why we didn't use it.)

My question is: will this scale when I index these documents? We plan to build a separate index for each file format and to use a multi-index reader for searching. Could anyone advise on the following?

1. Are we going the right way?
2. What do you suggest for mergeFactor and segments?
3. How large an index can Lucene handle?
4. Will this cause a Java OutOfMemoryError (OOM)?
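
To make the planned layout concrete, here is a minimal sketch of "one index per file format, searched together". It assumes the parsers have already produced plain text, and it uses plain Java collections as a stand-in rather than Lucene itself; the class name `PerFormatIndexSketch` and its `add` and `search` methods are hypothetical and only illustrate the shape of the design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy stand-in for the plan: one index per file format, one combined search. Not Lucene. */
public class PerFormatIndexSketch {
    // format name -> (term -> ids of documents containing that term)
    private final Map<String, Map<String, Set<String>>> indexes = new TreeMap<>();

    /** Adds already-extracted text to the index that belongs to the given format. */
    public void add(String format, String docId, String text) {
        Map<String, Set<String>> index =
                indexes.computeIfAbsent(format, f -> new HashMap<>());
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Looks up one term in every per-format index and concatenates the hits. */
    public List<String> search(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Set<String>>> entry : indexes.entrySet()) {
            Set<String> docIds =
                    entry.getValue().getOrDefault(key, Collections.<String>emptySet());
            for (String docId : docIds) {
                hits.add(entry.getKey() + "/" + docId); // prefix each hit with its format
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PerFormatIndexSketch sketch = new PerFormatIndexSketch();
        sketch.add("pdf", "a.pdf", "Lucene scaling notes");
        sketch.add("txt", "b.txt", "notes on lucene merge factor");
        System.out.println(sketch.search("lucene"));
    }
}
```

The point of the sketch is that a combined search visits every per-format index for each query, so splitting by format changes how the data is organized but not how much of it a query has to consult.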