From: Ian Lea
Date: Tue, 2 Mar 2010 13:59:55 +0000
Subject: Re: Lucene Indexing out of memory
To: java-user@lucene.apache.org

Where exactly are you hitting the OOM exception? Have you got a stack
trace? How much memory are you allocating to the JVM? Have you run a
profiler to find out what is using the memory?

If it runs OK for 70K docs and then fails, two possibilities come to
mind: either doc 70K + 1 is particularly large, or you or Lucene
(unlikely) are holding on to something that you shouldn't be.
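In case it helps, below is a rough, untested sketch of the search-then-update
loop you describe, written against the 2.9/3.0-era API. The index path, the
64 MB buffer, the analyzer and the store/analyze flags are placeholder choices
of mine - only the "word"/"context" fields come from your description. The
point to notice is that setRAMBufferSizeMB only bounds Lucene's internal
indexing buffer; it does nothing about the context strings your own code
keeps growing.

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ContextIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder index location; use wherever your index actually lives.
        Directory dir = FSDirectory.open(new File("/tmp/context-index"));
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Caps Lucene's internal indexing buffer only - it is not a JVM heap limit.
        writer.setRAMBufferSizeMB(64.0);

        // Read-only searcher over whatever has been committed so far.
        // To see your own updates you must commit and reopen it; holding
        // lots of stale readers open is an easy way to run out of memory.
        IndexSearcher searcher = new IndexSearcher(dir, true);

        String word = "example";                  // placeholder
        String newContext = "surrounding words";  // placeholder

        // Fetch any context already stored for this word.
        TopDocs hits = searcher.search(new TermQuery(new Term("word", word)), 1);
        String context = newContext;
        if (hits.totalHits > 0) {
            Document existing = searcher.doc(hits.scoreDocs[0].doc);
            // This string grows every time the word is seen again - after
            // tens of thousands of docs it can get very large.
            context = existing.get("context") + " " + newContext;
        }

        Document doc = new Document();
        doc.add(new Field("word", word, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("context", context, Field.Store.YES, Field.Index.ANALYZED));

        // Replaces the previous document for this word, if any.
        writer.updateDocument(new Term("word", word), doc);

        searcher.close();
        writer.close();
    }
}

If a profiler shows most of the heap going into the strings behind that
ever-growing "context" field, the fix is in how you accumulate context
rather than in any Lucene setting.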
--
Ian.


On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta wrote:
>
> Hi Erick,
> I tried setting setRAMBufferSizeMB to 200-500 MB as well but it still
> goes OOM.
> I thought that since it's file-based indexing memory shouldn't be an
> issue, but you might be right that searching is using a lot of memory.
> Is there a way to load documents in chunks, or some other way to make
> it scalable?
>
> Thanks in advance
> Ajay
>
>
> Erick Erickson wrote:
>>
>> I'm not following this entirely, but these docs may be huge by the
>> time you add context for every word in them. You say that you
>> "search the existing indices then I get the content and append....".
>> So is it possible that after 70K documents your additions become
>> so huge that you're blowing up? Have you taken any measurements
>> to determine how big the docs get as you index more and more
>> of them?
>>
>> If the above is off base, have you tried setting
>> IndexWriter.setRAMBufferSizeMB?
>>
>> HTH
>> Erick
>>
>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta wrote:
>>
>>>
>>> Hi,
>>> It might be a general question but I couldn't find the answer yet. I
>>> have around 90K documents totalling around 350 MB. Each document
>>> contains a record which has some text content. For each word in this
>>> text I want to store the context for that word and index it, so I am
>>> reading each document and, for each word in that document, appending
>>> a fixed number of surrounding words. To do that I first search the
>>> existing index to see whether the word already exists; if it does, I
>>> get the content, append the new context and update the document. If
>>> no context exists I create a document with fields "word" and
>>> "context" and set those two fields to the word value and the context
>>> value.
>>>
>>> I tried this in RAM but after a certain number of docs it gave an out
>>> of memory error, so I switched to the FSDirectory method, but
>>> surprisingly after 70K documents it also gave an OOM error. I have
>>> enough disk space but I am still getting this error, and I am not
>>> sure why even disk-based indexing gives it. I thought disk-based
>>> indexing would be slow but at least scalable.
>>> Could someone suggest what the issue could be?
>>>
>>> Thanks
>>> Ajay
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27755872.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>
> --
> View this message in context:
> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27756082.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org