lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Lucene Indexing out of memory
Date Tue, 02 Mar 2010 13:39:19 GMT
I'm not following this entirely, but these docs may be huge by the
time you add context for every word in them. You say that you
"search the existing indices then I get the content and append....".
So is it possible that after 70K documents your additions become
so huge that you're blowing up? Have you taken any measurements
to determine how big the docs get as you index more and more
of them?
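
One rough way to take that measurement, without touching Lucene at all, is to simulate the append step in plain Java and watch the largest per-word context grow. This is only a sketch: the class name, the window size of 2, and the whitespace tokenization are assumptions for illustration, not details from the original post.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextGrowth {
    // Hypothetical number of words kept on each side of the target word.
    static final int WINDOW = 2;

    /** Appends the surrounding words of every word in text to that word's accumulated context. */
    static void addContexts(Map<String, StringBuilder> contexts, String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return; // nothing to index
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            int from = Math.max(0, i - WINDOW);
            int to = Math.min(words.length - 1, i + WINDOW);
            StringBuilder ctx = contexts.computeIfAbsent(words[i], k -> new StringBuilder());
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    ctx.append(words[j]).append(' ');
                }
            }
        }
    }

    /** Returns the length in characters of the largest accumulated context. */
    static int largestContext(Map<String, StringBuilder> contexts) {
        int max = 0;
        for (StringBuilder sb : contexts.values()) {
            max = Math.max(max, sb.length());
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, StringBuilder> contexts = new HashMap<>();
        String[] docs = { "the quick brown fox", "the lazy dog", "the end" };
        for (int n = 0; n < docs.length; n++) {
            addContexts(contexts, docs[n]);
            System.out.println("after " + (n + 1) + " docs: " + contexts.size()
                    + " distinct words, largest context " + largestContext(contexts) + " chars");
        }
    }
}
```

Common words such as "the" pick up more context with every document, so logging the largest context every few thousand documents should show whether a handful of "context" fields are growing without bound.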

If the above is off base, have you tried calling
IndexWriter.setRAMBufferSizeMB to limit how much RAM the writer
uses to buffer documents before flushing them to disk?

HTH
Erick

On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta <ajay978@gmail.com> wrote:

>
> Hi,
> This may be a general question, but I couldn't find an answer. I have
> around 90k documents totaling around 350 MB. Each document contains a
> record with some text content. For each word in this text I want to
> store the context for that word and index it, so I read each document
> and, for each word in it, append a fixed number of surrounding words.
> To do that, I first search the existing index to see whether the word
> already exists; if it does, I get its content, append the new context,
> and update the document. If no context exists, I create a document with
> the fields "word" and "context", set to the word and its context.
>
> I tried this in RAM, but after a certain number of docs it gave an
> out-of-memory error, so I switched to FSDirectory. Surprisingly, after
> 70k documents it also gave an OOM error. I have enough disk space, but I
> still get this error, and I am not sure why disk-based indexing would
> fail this way. I thought disk-based indexing would be slow, but at least
> it would be scalable. Could someone suggest what the issue could be?
>
> Thanks
> Ajay
