Date: Mon, 15 Mar 2010 04:28:22 -0500
Subject: Re: Lucene Indexing out of memory
From: Michael McCandless
To: java-user@lucene.apache.org

Try the ideas here?

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

Mike

On Mon, Mar 15, 2010 at 1:51 AM, ajay_gupta wrote:
>
> Erick,
> I did get some hints about my problem. There was a bug in the code that was
> eating up the memory, which I figured out after a lot of effort.
> Thanks, all of you, for your suggestions.
> But I still feel it takes a lot of time to index documents. It takes around
> an hour or more to index a 330 MB file (90K documents). I am not sure how
> much time it should take, but I feel it is slow. I am using FSDirectory to
> store the indices.
>
> Regards
> Ajay
>
>
> Erick Erickson wrote:
>>
>> Interpolating from your data (and, by the way, some code
>> examples would help a lot): if you're reopening the index
>> reader to pick up recent additions, but not closing the old one when a
>> different reader is returned from reopen(), you'll leak
>> resources. From the JavaDocs:
>>
>>   IndexReader newReader = reader.reopen();
>>   if (newReader != reader) {
>>     ...        // reader was reopened
>>     reader.close();
>>   }
>>   reader = newReader;
>>
>>
>> On Wed, Mar 3, 2010 at 7:20 AM, Ian Lea wrote:
>>
>>> Lucene doesn't load everything into memory and can carry on running
>>> consecutive searches or loading documents forever without hitting OOM
>>> exceptions. So if it isn't failing on a specific document, the most
>>> likely cause is that your program is hanging on to something it
>>> shouldn't. Previous docs? File handles? Lucene readers/searchers?
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Wed, Mar 3, 2010 at 12:12 PM, ajay_gupta wrote:
>>> >
>>> > Ian,
>>> > The point of the OOM exception varies; it is not fixed. It can come
>>> > anywhere once memory usage exceeds a certain point.
>>> > I have allocated 1 GB of memory to the JVM. I haven't used a profiler.
>>> > When I said it fails after 70K docs, I meant approximately 70K documents,
>>> > but if I reduce the memory it will OOM before 70K, so it is not specific
>>> > to any particular document.
>>> > To add each document I first search and then update, so I am not sure
>>> > whether Lucene loads all the indices for the search and that is why it
>>> > is going OOM. I am not sure how the search operation works in Lucene.
>>> >
>>> >
>>> > Thanks
>>> > Ajay
>>> >
>>> >
>>> > Ian Lea wrote:
>>> >>
>>> >> Where exactly are you hitting the OOM exception? Have you got a stack
>>> >> trace? How much memory are you allocating to the JVM? Have you run a
>>> >> profiler to find out what is using the memory?
>>> >>
>>> >> If it runs OK for 70K docs and then fails, two possibilities come to mind:
>>> >> either the 70K + 1 doc is particularly large, or you or Lucene
>>> >> (unlikely) are holding on to something that you shouldn't be.
>>> >>
>>> >>
>>> >> --
>>> >> Ian.
>>> >>
>>> >>
>>> >> On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta wrote:
>>> >>>
>>> >>> Hi Erick,
>>> >>> I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it still
>>> >>> hits the OOM error.
>>> >>> I thought it is file-based indexing, so memory shouldn't be an issue,
>>> >>> but you might be right that searching may be using a lot of memory. Is
>>> >>> there a way to load documents in chunks, or some other way to make it
>>> >>> scalable?
>>> >>>
>>> >>> Thanks in advance
>>> >>> Ajay
>>> >>>
>>> >>>
>>> >>> Erick Erickson wrote:
>>> >>>>
>>> >>>> I'm not following this entirely, but these docs may be huge by the
>>> >>>> time you add context for every word in them. You say that you
>>> >>>> "search the existing indices then I get the content and append...".
>>> >>>> So is it possible that after 70K documents your additions become
>>> >>>> so huge that you're blowing up?
>>> >>>> Have you taken any measurements
>>> >>>> to determine how big the docs get as you index more and more
>>> >>>> of them?
>>> >>>>
>>> >>>> If the above is off base, have you tried setting
>>> >>>> IndexWriter.setRAMBufferSizeMB?
>>> >>>>
>>> >>>> HTH
>>> >>>> Erick
>>> >>>>
>>> >>>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta wrote:
>>> >>>>
>>> >>>>>
>>> >>>>> Hi,
>>> >>>>> This might be a general question, but I couldn't find the answer yet.
>>> >>>>> I have around 90K documents totalling around 350 MB. Each document
>>> >>>>> contains a record that has some text content. For each word in this
>>> >>>>> text I want to store the context of that word and index it, so I am
>>> >>>>> reading each document, and for each word in that document I am
>>> >>>>> appending a fixed number of surrounding words. To do that, I first
>>> >>>>> search the existing indices for whether this word already exists; if
>>> >>>>> it does, I get the content, append the new context, and update the
>>> >>>>> document. If no context exists yet, I create a document with the
>>> >>>>> fields "word" and "context" and add those two fields with the word
>>> >>>>> value and context value.
>>> >>>>>
>>> >>>>> I tried this in RAM, but after a certain number of docs it gave an
>>> >>>>> out-of-memory error, so I thought I would use the FSDirectory method,
>>> >>>>> but surprisingly after 70K documents it also gave an OOM error. I
>>> >>>>> have enough disk space, but I still get this error. I am not sure why
>>> >>>>> it gives this error even for disk-based indexing.
>>> >>>>> I thought disk-based indexing would be slow, but at least it would be
>>> >>>>> scalable.
>>> >>>>> Could someone suggest what the issue could be?
>>> >>>>>
>>> >>>>> Thanks
>>> >>>>> Ajay
>>> >>>>> --
>>> >>>>> View this message in context:
>>> >>>>> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27755872.html
>>> >>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>> >>>>>
>>> >>>>> ---------------------------------------------------------------------
>>> >>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >>>>> For additional commands, e-mail: java-user-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
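The reopen-and-close pattern Erick quotes is easy to get wrong, and getting it wrong leaks one reader per refresh. Below is a minimal, self-contained sketch of the discipline. `MockReader` is my stand-in for Lucene's `IndexReader` so the sketch runs without the Lucene jar; what it mimics is the contract of `IndexReader.reopen()`: the same instance comes back when the index is unchanged, a fresh one comes back when it has changed, and only in the latter case must the old instance be closed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReopenSketch {
    // Counts currently-open readers; if the pattern is right this ends at 0.
    static final AtomicInteger openReaders = new AtomicInteger(0);

    // Stand-in for org.apache.lucene.index.IndexReader (an assumption for
    // this sketch, so it runs without the Lucene jar).
    static class MockReader {
        final boolean indexChanged;
        MockReader(boolean indexChanged) {
            this.indexChanged = indexChanged;
            openReaders.incrementAndGet();
        }
        MockReader reopen() {
            // Like IndexReader.reopen(): return 'this' when nothing changed,
            // otherwise a brand-new reader instance.
            return indexChanged ? new MockReader(false) : this;
        }
        void close() { openReaders.decrementAndGet(); }
    }

    // The pattern from the JavaDocs: close the OLD reader only when
    // reopen() handed back a different instance.
    static MockReader refresh(MockReader reader) {
        MockReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();   // skipping this line leaks a reader per refresh
        }
        return newReader;
    }

    public static void main(String[] args) {
        MockReader reader = new MockReader(true);
        for (int i = 0; i < 1000; i++) {
            reader = refresh(reader);
        }
        reader.close();
        System.out.println("open readers: " + openReaders.get());  // prints 0
    }
}
```

The identity check `newReader != reader` matters: closing unconditionally would close a reader that `reopen()` is still handing back, while never closing is the resource leak Erick warns about.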
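One way to avoid the per-word search-then-update churn Ajay describes (each Lucene update is a delete plus re-add) is to aggregate every word's contexts in memory first and write one document per word in a single pass at the end. This is a hypothetical sketch, not code from the thread: the window size, whitespace tokenization, and the `aggregate` helper are all my assumptions, and for data that does not fit in the heap the map would need to be flushed and merged in batches.

```java
import java.util.*;

public class ContextAggregator {
    static final int WINDOW = 2;  // assumed: surrounding words kept per side

    // word -> accumulated context snippets across all input records.
    // Each map entry would later become ONE Lucene document with the
    // fields "word" and "context", with no mid-indexing search needed.
    static Map<String, List<String>> aggregate(List<String> records) {
        Map<String, List<String>> contexts = new HashMap<>();
        for (String record : records) {
            String[] tokens = record.split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                int from = Math.max(0, i - WINDOW);
                int to = Math.min(tokens.length, i + WINDOW + 1);
                String ctx = String.join(" ",
                        Arrays.asList(tokens).subList(from, to));
                contexts.computeIfAbsent(tokens[i], k -> new ArrayList<>())
                        .add(ctx);
            }
        }
        return contexts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> c = aggregate(Arrays.asList(
                "lucene index out of memory",
                "lucene search is fast"));
        // "lucene" occurs in both records, so it accumulates two contexts.
        System.out.println(c.get("lucene"));
    }
}
```

The point of the restructuring is that the index is written append-only, so `setRAMBufferSizeMB` and the wiki's indexing-speed advice apply cleanly, and no reader has to stay open during the build.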