Date: Wed, 3 Mar 2010 08:37:40 -0500
Subject: Re: Lucene Indexing out of memory
From: Erick Erickson <erickerickson@gmail.com>
To: java-user@lucene.apache.org

Interpolating from your data (and, by the way, some code examples would
help a lot): if you're reopening the index reader to pick up recent
additions, but not closing the old reader when reopen() returns a
different instance, you'll leak resources. From the JavaDocs:

IndexReader newReader = reader.reopen();
if (newReader != reader) {
    // reader was reopened; close the old instance
    reader.close();
}
reader = newReader;
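A minimal sketch of how that pattern might be wrapped in a long-running
application (the class and method names here are illustrative, not part
of Lucene's API):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Keeps one long-lived reader, refreshing it before each search and
// closing superseded instances so their file handles and heap are
// released.
class SearcherProvider {
    private IndexReader reader;

    SearcherProvider(IndexReader initial) {
        this.reader = initial;
    }

    synchronized IndexSearcher getSearcher() throws IOException {
        // reopen() returns the same instance if the index is unchanged
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();      // release the stale reader
            reader = newReader;
        }
        return new IndexSearcher(reader);
    }
}

If the superseded readers are never closed, every reopen leaves another
reader holding file handles and heap, which looks exactly like a slow
leak that eventually OOMs.
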
On Wed, Mar 3, 2010 at 7:20 AM, Ian Lea wrote:

> Lucene doesn't load everything into memory and can carry on running
> consecutive searches or loading documents forever without hitting OOM
> exceptions. So if it isn't failing on a specific document, the most
> likely cause is that your program is hanging on to something it
> shouldn't. Previous docs? File handles? Lucene readers/searchers?
>
>
> --
> Ian.
>
>
> On Wed, Mar 3, 2010 at 12:12 PM, ajay_gupta wrote:
> >
> > Ian,
> > The point at which the OOM exception occurs varies; it is not fixed.
> > It can happen anywhere once memory use passes a certain level.
> > I have allocated 1 GB of memory to the JVM. I haven't used a
> > profiler.
> > When I said it fails after 70K docs, I meant approximately 70K
> > documents; if I reduce the memory it goes OOM before 70K, so it is
> > not specific to any particular document.
> > To add each document I first search and then update, so I am not
> > sure whether Lucene loads all the indices for the search and that
> > is why it is going OOM. I am not sure how the search operation
> > works in Lucene.
> >
> >
> > Thanks
> > Ajay
> >
> >
> > Ian Lea wrote:
> >>
> >> Where exactly are you hitting the OOM exception? Have you got a
> >> stack trace? How much memory are you allocating to the JVM? Have
> >> you run a profiler to find out what is using the memory?
> >>
> >> If it runs OK for 70K docs and then fails, two possibilities come
> >> to mind: either the 70K + 1st doc is particularly large, or you or
> >> Lucene (unlikely) are holding on to something that you shouldn't
> >> be.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >> On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta wrote:
> >>>
> >>> Hi Erick,
> >>> I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it
> >>> still hits the OOM error.
> >>> I thought that since the indexing is file based, memory shouldn't
> >>> be an issue, but you might be right that searching uses a lot of
> >>> memory. Is there a way to load documents in chunks, or some other
> >>> way to make this scalable?
> >>>
> >>> Thanks in advance
> >>> Ajay
> >>>
> >>>
> >>> Erick Erickson wrote:
> >>>>
> >>>> I'm not following this entirely, but these docs may be huge by
> >>>> the time you add context for every word in them. You say that
> >>>> you "search the existing indices then I get the content and
> >>>> append....". So is it possible that after 70K documents your
> >>>> additions become so huge that you're blowing up? Have you taken
> >>>> any measurements to determine how big the docs get as you index
> >>>> more and more of them?
> >>>>
> >>>> If the above is off base, have you tried setting
> >>>> IndexWriter.setRAMBufferSizeMB?
> >>>>
> >>>> HTH
> >>>> Erick
> >>>>
> >>>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta wrote:
> >>>>
> >>>>>
> >>>>> Hi,
> >>>>> This might be a general question, but I couldn't find the
> >>>>> answer yet. I have around 90K documents, totalling around
> >>>>> 350 MB. Each document contains a record with some text content.
> >>>>> For each word in this text I want to store and index that
> >>>>> word's context, so I read each document and, for each word in
> >>>>> it, append a fixed number of surrounding words. To do that, I
> >>>>> first search the existing indices for the word; if it already
> >>>>> exists, I get the stored content, append the new context, and
> >>>>> update the document. If no context exists yet, I create a
> >>>>> document with the fields "word" and "context" and add those two
> >>>>> fields with the word and its context as values.
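> >>>>>
> >>>>> Roughly, the per-word step looks like this (a simplified
> >>>>> sketch, not my exact code: "searcher" and "writer" are an
> >>>>> already open IndexSearcher and IndexWriter, NOT_ANALYZED for
> >>>>> the word field is an assumption, and error handling is
> >>>>> omitted):
> >>>>>
> >>>>> void addContext(IndexSearcher searcher, IndexWriter writer,
> >>>>>                 String word, String newContext)
> >>>>>         throws IOException {
> >>>>>     Term key = new Term("word", word);
> >>>>>     TopDocs hits = searcher.search(new TermQuery(key), 1);
> >>>>>     String context = newContext;
> >>>>>     if (hits.totalHits > 0) {
> >>>>>         // word already indexed: append the new context to the
> >>>>>         // stored one
> >>>>>         Document old = searcher.doc(hits.scoreDocs[0].doc);
> >>>>>         context = old.get("context") + " " + newContext;
> >>>>>     }
> >>>>>     Document doc = new Document();
> >>>>>     doc.add(new Field("word", word,
> >>>>>             Field.Store.YES, Field.Index.NOT_ANALYZED));
> >>>>>     doc.add(new Field("context", context,
> >>>>>             Field.Store.YES, Field.Index.ANALYZED));
> >>>>>     // deletes any existing doc for this word, then adds the
> >>>>>     // new one; the searcher won't see the change until the
> >>>>>     // writer commits and the reader is reopened
> >>>>>     writer.updateDocument(key, doc);
> >>>>> }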
> >>>>>
> >>>>> I tried this in RAM, but after a certain number of docs it gave
> >>>>> an out-of-memory error, so I switched to the FSDirectory
> >>>>> approach; surprisingly, after 70K documents that also gave an
> >>>>> OOM error. I have enough disk space, but I still get this
> >>>>> error, and I am not sure why even disk-based indexing runs out
> >>>>> of memory. I thought disk-based indexing would be slow but at
> >>>>> least scalable. Could someone suggest what the issue could be?
> >>>>>
> >>>>> Thanks
> >>>>> Ajay
> >>>>> --
> >>>>> View this message in context:
> >>>>> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27755872.html
> >>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>>
> >>> --
> >>> View this message in context:
> >>> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27756082.html
> >
> > --
> > View this message in context:
> > http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27767405.html