lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Lucene Indexing out of memory
Date Tue, 02 Mar 2010 14:00:54 GMT
It's not searching that I'm wondering about. The memory size, as
far as I understand, really only has document resolution. That is, you
can't index a part of a document, flush to disk, then index the rest of
the document. The entire document is parsed into memory, and only
then flushed to disk if RAMBuffer size is exceeded.

Which is why I'm wondering if your documents get absurdly huge as
you index more and more of them.....

Erick

On Tue, Mar 2, 2010 at 8:48 AM, ajay_gupta <ajay978@gmail.com> wrote:

>
> Hi Erick,
> I tried setting setRAMBufferSizeMB  as 200-500MB as well but still it goes
> OOM error.
> I thought its filebased indexing so memory shouldn't be an issue but you
> might be right that when searching it might be using lot of memory ? Is
> there way to load documents in chunks or someothere way to make it scalable
> ?
>
> Thanks in advance
> Ajay
>
>
> Erick Erickson wrote:
> >
> > I'm not following this entirely, but these docs may be huge by the
> > time you add context for every word in them. You say that you
> > "search the existing indices then I get the content and append....".
> > So is it possible that after 70K documents your additions become
> > so huge that you're blowing up? Have you taken any measurements
> > to determine how big the docs get as you index more and more
> > of them?
> >
> > If the above is off base, have you tried setting
> > IndexWriter.setRAMBufferSizeMB?
> >
> > HTH
> > Erick
> >
> > On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta <ajay978@gmail.com> wrote:
> >
> >>
> >> Hi,
> >> It might be general question though but I couldn't find the answer yet.
> I
> >> have around 90k documents sizing around 350 MB. Each document contains a
> >> record which has some text content. For each word in this text I want to
> >> store context for that word and index it so I am reading each document
> >> and
> >> for each word in that document I am appending fixed number of
> surrounding
> >> words. To do that first I search in existing indices if this word
> already
> >> exist and if it is then I get the content and append the new context and
> >> update the document. In case no context exist I create a document with
> >> fields "word" and "context" and add these two fields with values as word
> >> value and context value.
> >>
> >> I tried this in RAM but after certain no of docs it gave out of memory
> >> error
> >> so I thought to use FSDirectory method but surprisingly after 70k
> >> documents
> >> it also gave OOM error. I have enough disk space but still I am getting
> >> this
> >> error.I am not sure even for disk based indexing why its giving this
> >> error.
> >> I thought disk based indexing will be slow but atleast it will be
> >> scalable.
> >> Could someone suggest what could be the issue ?
> >>
> >> Thanks
> >> Ajay
> >> --
> >> View this message in context:
> >>
> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27755872.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27756082.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message