Date: Wed, 3 Mar 2010 08:37:40 -0500
Subject: Re: Lucene Indexing out of memory
From: Erick Erickson <erickerickson@gmail.com>
To: java-user@lucene.apache.org

Interpolating from your data (and, by the way, some code examples would
help a lot): if you're reopening the index reader to pick up recent
additions, but not closing the old reader when reopen() returns a
different instance, you'll leak resources. From the JavaDocs:

IndexReader newReader = reader.reopen();
if (newReader != reader) {
    // reader was reopened; close the old instance
    reader.close();
}
reader = newReader;
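A minimal sketch of how that pattern might be wrapped in a long-running
application (the class and method names here are illustrative, not part
of Lucene's API):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Keeps one long-lived reader, refreshing it before each search and
// closing superseded instances so their file handles and heap are
// released.
class SearcherProvider {
    private IndexReader reader;

    SearcherProvider(IndexReader initial) {
        this.reader = initial;
    }

    synchronized IndexSearcher getSearcher() throws IOException {
        // reopen() returns the same instance if the index is unchanged
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();      // release the stale reader
            reader = newReader;
        }
        return new IndexSearcher(reader);
    }
}

If the superseded readers are never closed, every reopen leaves another
reader holding file handles and heap, which looks exactly like a slow
leak that eventually OOMs.
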
On Wed, Mar 3, 2010 at 7:20 AM, Ian Lea wrote:

> Lucene doesn't load everything into memory and can carry on running
> consecutive searches or loading documents forever without hitting OOM
> exceptions. So if it isn't failing on a specific document, the most
> likely cause is that your program is hanging on to something it
> shouldn't. Previous docs? File handles? Lucene readers/searchers?
>
>
> --
> Ian.
>
>
> On Wed, Mar 3, 2010 at 12:12 PM, ajay_gupta wrote:
> >
> > Ian,
> > The point at which the OOM exception occurs varies; it is not fixed.
> > It can happen anywhere once memory use passes a certain level.
> > I have allocated 1 GB of memory to the JVM. I haven't used a
> > profiler.
> > When I said it fails after 70K docs, I meant approximately 70K
> > documents; if I reduce the memory it goes OOM before 70K, so it is
> > not specific to any particular document.
> > To add each document I first search and then update, so I am not
> > sure whether Lucene loads all the indices for the search and that
> > is why it is going OOM. I am not sure how the search operation
> > works in Lucene.
> >
> >
> > Thanks
> > Ajay
> >
> >
> > Ian Lea wrote:
> >>
> >> Where exactly are you hitting the OOM exception? Have you got a
> >> stack trace? How much memory are you allocating to the JVM? Have
> >> you run a profiler to find out what is using the memory?
> >>
> >> If it runs OK for 70K docs and then fails, two possibilities come
> >> to mind: either the 70K + 1st doc is particularly large, or you or
> >> Lucene (unlikely) are holding on to something that you shouldn't
> >> be.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >> On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta wrote:
> >>>
> >>> Hi Erick,
> >>> I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it
> >>> still hits the OOM error.
> >>> I thought that since the indexing is file based, memory shouldn't
> >>> be an issue, but you might be right that searching uses a lot of
> >>> memory. Is there a way to load documents in chunks, or some other
> >>> way to make this scalable?
> >>>
> >>> Thanks in advance
> >>> Ajay
> >>>
> >>>
> >>> Erick Erickson wrote:
> >>>>
> >>>> I'm not following this entirely, but these docs may be huge by
> >>>> the time you add context for every word in them. You say that
> >>>> you "search the existing indices then I get the content and
> >>>> append....". So is it possible that after 70K documents your
> >>>> additions become so huge that you're blowing up? Have you taken
> >>>> any measurements to determine how big the docs get as you index
> >>>> more and more of them?
> >>>>
> >>>> If the above is off base, have you tried setting
> >>>> IndexWriter.setRAMBufferSizeMB?
> >>>>
> >>>> HTH
> >>>> Erick
> >>>>
> >>>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta wrote:
> >>>>
> >>>>>
> >>>>> Hi,
> >>>>> This might be a general question, but I couldn't find the
> >>>>> answer yet. I have around 90K documents, totalling around
> >>>>> 350 MB. Each document contains a record with some text content.
> >>>>> For each word in this text I want to store and index that
> >>>>> word's context, so I read each document and, for each word in
> >>>>> it, append a fixed number of surrounding words. To do that, I
> >>>>> first search the existing indices for the word; if it already
> >>>>> exists, I get the stored content, append the new context, and
> >>>>> update the document. If no context exists yet, I create a
> >>>>> document with the fields "word" and "context" and add those two
> >>>>> fields with the word and its context as values.
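> >>>>>
> >>>>> Roughly, the per-word step looks like this (a simplified
> >>>>> sketch, not my exact code: "searcher" and "writer" are an
> >>>>> already open IndexSearcher and IndexWriter, NOT_ANALYZED for
> >>>>> the word field is an assumption, and error handling is
> >>>>> omitted):
> >>>>>
> >>>>> void addContext(IndexSearcher searcher, IndexWriter writer,
> >>>>>                 String word, String newContext)
> >>>>>         throws IOException {
> >>>>>     Term key = new Term("word", word);
> >>>>>     TopDocs hits = searcher.search(new TermQuery(key), 1);
> >>>>>     String context = newContext;
> >>>>>     if (hits.totalHits > 0) {
> >>>>>         // word already indexed: append the new context to the
> >>>>>         // stored one
> >>>>>         Document old = searcher.doc(hits.scoreDocs[0].doc);
> >>>>>         context = old.get("context") + " " + newContext;
> >>>>>     }
> >>>>>     Document doc = new Document();
> >>>>>     doc.add(new Field("word", word,
> >>>>>             Field.Store.YES, Field.Index.NOT_ANALYZED));
> >>>>>     doc.add(new Field("context", context,
> >>>>>             Field.Store.YES, Field.Index.ANALYZED));
> >>>>>     // deletes any existing doc for this word, then adds the
> >>>>>     // new one; the searcher won't see the change until the
> >>>>>     // writer commits and the reader is reopened
> >>>>>     writer.updateDocument(key, doc);
> >>>>> }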
> >>>>>
> >>>>> I tried this in RAM, but after a certain number of docs it gave
> >>>>> an out-of-memory error, so I switched to the FSDirectory
> >>>>> approach; surprisingly, after 70K documents that also gave an
> >>>>> OOM error. I have enough disk space, but I still get this
> >>>>> error, and I am not sure why even disk-based indexing runs out
> >>>>> of memory. I thought disk-based indexing would be slow but at
> >>>>> least scalable. Could someone suggest what the issue could be?
> >>>>>
> >>>>> Thanks
> >>>>> Ajay
> >>>>> --
> >>>>> View this message in context:
> >>>>> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27755872.html
> >>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>>
> >>> --
> >>> View this message in context:
> >>> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27756082.html
> >
> > --
> > View this message in context:
> > http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27767405.html