From: Ian Lea
Date: Tue, 2 Mar 2010 13:59:55 +0000
Subject: Re: Lucene Indexing out of memory
To: java-user@lucene.apache.org

Where exactly are you hitting the OOM exception? Have you got a stack
trace? How much memory are you allocating to the JVM? Have you run a
profiler to find out what is using the memory?

If it runs OK for 70K docs and then fails, two possibilities come to
mind: either doc 70K + 1 is particularly large, or you or Lucene
(unlikely) are holding on to something that you shouldn't be.
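In case it helps, below is a rough, untested sketch of the search-then-update
loop you describe, written against the 2.9/3.0-era API. The index path, the
64 MB buffer, the analyzer and the store/analyze flags are placeholder choices
of mine - only the "word"/"context" fields come from your description. The
point to notice is that setRAMBufferSizeMB only bounds Lucene's internal
indexing buffer; it does nothing about the context strings your own code
keeps growing.

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ContextIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder index location; use wherever your index actually lives.
        Directory dir = FSDirectory.open(new File("/tmp/context-index"));
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Caps Lucene's internal indexing buffer only - it is not a JVM heap limit.
        writer.setRAMBufferSizeMB(64.0);

        // Read-only searcher over whatever has been committed so far.
        // To see your own updates you must commit and reopen it; holding
        // lots of stale readers open is an easy way to run out of memory.
        IndexSearcher searcher = new IndexSearcher(dir, true);

        String word = "example";                  // placeholder
        String newContext = "surrounding words";  // placeholder

        // Fetch any context already stored for this word.
        TopDocs hits = searcher.search(new TermQuery(new Term("word", word)), 1);
        String context = newContext;
        if (hits.totalHits > 0) {
            Document existing = searcher.doc(hits.scoreDocs[0].doc);
            // This string grows every time the word is seen again - after
            // tens of thousands of docs it can get very large.
            context = existing.get("context") + " " + newContext;
        }

        Document doc = new Document();
        doc.add(new Field("word", word, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("context", context, Field.Store.YES, Field.Index.ANALYZED));

        // Replaces the previous document for this word, if any.
        writer.updateDocument(new Term("word", word), doc);

        searcher.close();
        writer.close();
    }
}

If a profiler shows most of the heap going into the strings behind that
ever-growing "context" field, the fix is in how you accumulate context
rather than in any Lucene setting.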
--
Ian.


On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta wrote:
>
> Hi Erick,
> I tried setting setRAMBufferSizeMB to 200-500 MB as well but it still
> goes OOM.
> I thought that since it's file-based indexing memory shouldn't be an
> issue, but you might be right that searching is using a lot of memory.
> Is there a way to load documents in chunks, or some other way to make
> it scalable?
>
> Thanks in advance
> Ajay
>
>
> Erick Erickson wrote:
>>
>> I'm not following this entirely, but these docs may be huge by the
>> time you add context for every word in them. You say that you
>> "search the existing indices then I get the content and append....".
>> So is it possible that after 70K documents your additions become
>> so huge that you're blowing up? Have you taken any measurements
>> to determine how big the docs get as you index more and more
>> of them?
>>
>> If the above is off base, have you tried setting
>> IndexWriter.setRAMBufferSizeMB?
>>
>> HTH
>> Erick
>>
>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta wrote:
>>
>>>
>>> Hi,
>>> It might be a general question but I couldn't find the answer yet. I
>>> have around 90K documents totalling around 350 MB. Each document
>>> contains a record which has some text content. For each word in this
>>> text I want to store the context for that word and index it, so I am
>>> reading each document and, for each word in that document, appending
>>> a fixed number of surrounding words. To do that I first search the
>>> existing index to see whether the word already exists; if it does, I
>>> get the content, append the new context and update the document. If
>>> no context exists I create a document with fields "word" and
>>> "context" and set those two fields to the word value and the context
>>> value.
>>>
>>> I tried this in RAM but after a certain number of docs it gave an out
>>> of memory error, so I switched to the FSDirectory method, but
>>> surprisingly after 70K documents it also gave an OOM error. I have
>>> enough disk space but I am still getting this error, and I am not
>>> sure why even disk-based indexing gives it. I thought disk-based
>>> indexing would be slow but at least scalable.
>>> Could someone suggest what the issue could be?
>>>
>>> Thanks
>>> Ajay
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27755872.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>
> --
> View this message in context:
> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27756082.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org