Date: Mon, 15 Mar 2010 04:28:22 -0500
Subject: Re: Lucene Indexing out of memory
From: Michael McCandless
To: java-user@lucene.apache.org

Try the ideas here?

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

Mike

On Mon, Mar 15, 2010 at 1:51 AM, ajay_gupta wrote:
>
> Erick,
> I did get some hints about my problem. There was a bug in the code that was
> eating up the memory, which I figured out after a lot of effort.
> Thanks, all of you, for your suggestions.
> But I still feel it takes a lot of time to index documents. It takes around
> an hour or more to index a 330 MB file (90K documents). I am not sure how
> much time it should take, but I feel it is slow. I am using FSDirectory to
> store the indices.
>
> Regards
> Ajay
>
>
> Erick Erickson wrote:
>>
>> Interpolating from your data (and, by the way, some code
>> examples would help a lot): if you're reopening the index
>> reader to pick up recent additions, but not closing the old one when a
>> different reader is returned from reopen(), you'll leak
>> resources. From the JavaDocs:
>>
>>   IndexReader newReader = reader.reopen();
>>   if (newReader != reader) {
>>     ...        // reader was reopened
>>     reader.close();
>>   }
>>   reader = newReader;
>>
>>
>> On Wed, Mar 3, 2010 at 7:20 AM, Ian Lea wrote:
>>
>>> Lucene doesn't load everything into memory and can carry on running
>>> consecutive searches or loading documents forever without hitting OOM
>>> exceptions. So if it isn't failing on a specific document, the most
>>> likely cause is that your program is hanging on to something it
>>> shouldn't. Previous docs? File handles? Lucene readers/searchers?
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Wed, Mar 3, 2010 at 12:12 PM, ajay_gupta wrote:
>>> >
>>> > Ian,
>>> > The point of the OOM exception varies; it is not fixed. It can come
>>> > anywhere once memory usage exceeds a certain point.
>>> > I have allocated 1 GB of memory to the JVM. I haven't used a profiler.
>>> > When I said it fails after 70K docs, I meant approximately 70K documents,
>>> > but if I reduce the memory it will OOM before 70K, so it is not specific
>>> > to any particular document.
>>> > To add each document I first search and then update, so I am not sure
>>> > whether Lucene loads all the indices for the search and that is why it
>>> > is going OOM. I am not sure how the search operation works in Lucene.
>>> >
>>> >
>>> > Thanks
>>> > Ajay
>>> >
>>> >
>>> > Ian Lea wrote:
>>> >>
>>> >> Where exactly are you hitting the OOM exception? Have you got a stack
>>> >> trace? How much memory are you allocating to the JVM? Have you run a
>>> >> profiler to find out what is using the memory?
>>> >>
>>> >> If it runs OK for 70K docs and then fails, two possibilities come to mind:
>>> >> either the 70K + 1 doc is particularly large, or you or Lucene
>>> >> (unlikely) are holding on to something that you shouldn't be.
>>> >>
>>> >>
>>> >> --
>>> >> Ian.
>>> >>
>>> >>
>>> >> On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta wrote:
>>> >>>
>>> >>> Hi Erick,
>>> >>> I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it still
>>> >>> hits the OOM error.
>>> >>> I thought it is file-based indexing, so memory shouldn't be an issue,
>>> >>> but you might be right that searching may be using a lot of memory. Is
>>> >>> there a way to load documents in chunks, or some other way to make it
>>> >>> scalable?
>>> >>>
>>> >>> Thanks in advance
>>> >>> Ajay
>>> >>>
>>> >>>
>>> >>> Erick Erickson wrote:
>>> >>>>
>>> >>>> I'm not following this entirely, but these docs may be huge by the
>>> >>>> time you add context for every word in them. You say that you
>>> >>>> "search the existing indices then I get the content and append...".
>>> >>>> So is it possible that after 70K documents your additions become
>>> >>>> so huge that you're blowing up?
>>> >>>> Have you taken any measurements
>>> >>>> to determine how big the docs get as you index more and more
>>> >>>> of them?
>>> >>>>
>>> >>>> If the above is off base, have you tried setting
>>> >>>> IndexWriter.setRAMBufferSizeMB?
>>> >>>>
>>> >>>> HTH
>>> >>>> Erick
>>> >>>>
>>> >>>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta wrote:
>>> >>>>
>>> >>>>>
>>> >>>>> Hi,
>>> >>>>> This might be a general question, but I couldn't find the answer yet.
>>> >>>>> I have around 90K documents totalling around 350 MB. Each document
>>> >>>>> contains a record that has some text content. For each word in this
>>> >>>>> text I want to store the context of that word and index it, so I am
>>> >>>>> reading each document, and for each word in that document I am
>>> >>>>> appending a fixed number of surrounding words. To do that, I first
>>> >>>>> search the existing indices for whether this word already exists; if
>>> >>>>> it does, I get the content, append the new context, and update the
>>> >>>>> document. If no context exists yet, I create a document with the
>>> >>>>> fields "word" and "context" and add those two fields with the word
>>> >>>>> value and context value.
>>> >>>>>
>>> >>>>> I tried this in RAM, but after a certain number of docs it gave an
>>> >>>>> out-of-memory error, so I thought I would use the FSDirectory method,
>>> >>>>> but surprisingly after 70K documents it also gave an OOM error. I
>>> >>>>> have enough disk space, but I still get this error. I am not sure why
>>> >>>>> it gives this error even for disk-based indexing.
>>> >>>>> I thought disk-based indexing would be slow, but at least it would be
>>> >>>>> scalable.
>>> >>>>> Could someone suggest what the issue could be?
>>> >>>>>
>>> >>>>> Thanks
>>> >>>>> Ajay
>>> >>>>> --
>>> >>>>> View this message in context:
>>> >>>>> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27755872.html
>>> >>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>> >>>>>
>>> >>>>> ---------------------------------------------------------------------
>>> >>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >>>>> For additional commands, e-mail: java-user-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
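The reopen-and-close pattern Erick quotes is easy to get wrong, and getting it wrong leaks one reader per refresh. Below is a minimal, self-contained sketch of the discipline. `MockReader` is my stand-in for Lucene's `IndexReader` so the sketch runs without the Lucene jar; what it mimics is the contract of `IndexReader.reopen()`: the same instance comes back when the index is unchanged, a fresh one comes back when it has changed, and only in the latter case must the old instance be closed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReopenSketch {
    // Counts currently-open readers; if the pattern is right this ends at 0.
    static final AtomicInteger openReaders = new AtomicInteger(0);

    // Stand-in for org.apache.lucene.index.IndexReader (an assumption for
    // this sketch, so it runs without the Lucene jar).
    static class MockReader {
        final boolean indexChanged;
        MockReader(boolean indexChanged) {
            this.indexChanged = indexChanged;
            openReaders.incrementAndGet();
        }
        MockReader reopen() {
            // Like IndexReader.reopen(): return 'this' when nothing changed,
            // otherwise a brand-new reader instance.
            return indexChanged ? new MockReader(false) : this;
        }
        void close() { openReaders.decrementAndGet(); }
    }

    // The pattern from the JavaDocs: close the OLD reader only when
    // reopen() handed back a different instance.
    static MockReader refresh(MockReader reader) {
        MockReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();   // skipping this line leaks a reader per refresh
        }
        return newReader;
    }

    public static void main(String[] args) {
        MockReader reader = new MockReader(true);
        for (int i = 0; i < 1000; i++) {
            reader = refresh(reader);
        }
        reader.close();
        System.out.println("open readers: " + openReaders.get());  // prints 0
    }
}
```

The identity check `newReader != reader` matters: closing unconditionally would close a reader that `reopen()` is still handing back, while never closing is the resource leak Erick warns about.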
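One way to avoid the per-word search-then-update churn Ajay describes (each Lucene update is a delete plus re-add) is to aggregate every word's contexts in memory first and write one document per word in a single pass at the end. This is a hypothetical sketch, not code from the thread: the window size, whitespace tokenization, and the `aggregate` helper are all my assumptions, and for data that does not fit in the heap the map would need to be flushed and merged in batches.

```java
import java.util.*;

public class ContextAggregator {
    static final int WINDOW = 2;  // assumed: surrounding words kept per side

    // word -> accumulated context snippets across all input records.
    // Each map entry would later become ONE Lucene document with the
    // fields "word" and "context", with no mid-indexing search needed.
    static Map<String, List<String>> aggregate(List<String> records) {
        Map<String, List<String>> contexts = new HashMap<>();
        for (String record : records) {
            String[] tokens = record.split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                int from = Math.max(0, i - WINDOW);
                int to = Math.min(tokens.length, i + WINDOW + 1);
                String ctx = String.join(" ",
                        Arrays.asList(tokens).subList(from, to));
                contexts.computeIfAbsent(tokens[i], k -> new ArrayList<>())
                        .add(ctx);
            }
        }
        return contexts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> c = aggregate(Arrays.asList(
                "lucene index out of memory",
                "lucene search is fast"));
        // "lucene" occurs in both records, so it accumulates two contexts.
        System.out.println(c.get("lucene"));
    }
}
```

The point of the restructuring is that the index is written append-only, so `setRAMBufferSizeMB` and the wiki's indexing-speed advice apply cleanly, and no reader has to stay open during the build.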