lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene Indexing out of memory
Date Wed, 03 Mar 2010 16:00:32 GMT
The worst case RAM usage for Lucene is a single doc with many unique
terms.  Lucene allocates ~60 bytes per unique term (plus space to hold
that term's characters = 2 bytes per char).  And, Lucene cannot flush
within one document -- it must flush after the doc has been fully
indexed.

This past thread (also from Paul) delves into some of the details:

  http://lucene.markmail.org/thread/pbeidtepentm6mdn

But it's not clear whether that is the issue affecting Ajay -- I think
more details about the docs, or, some code fragments, could help shed
light.

Mike

On Tue, Mar 2, 2010 at 8:47 AM, Murdoch, Paul <PAUL.B.MURDOCH@saic.com> wrote:
> Ajay,
>
> Here is another thread I started on the same issue.
>
> http://stackoverflow.com/questions/1362460/why-does-lucene-cause-oom-whe
> n-indexing-large-files
>
> Paul
>
>
> -----Original Message-----
> From: java-user-return-45254-PAUL.B.MURDOCH=saic.com@lucene.apache.org
> [mailto:java-user-return-45254-PAUL.B.MURDOCH=saic.com@lucene.apache.org
> ] On Behalf Of ajay_gupta
> Sent: Tuesday, March 02, 2010 8:28 AM
> To: java-user@lucene.apache.org
> Subject: Lucene Indexing out of memory
>
>
> Hi,
> It might be general question though but I couldn't find the answer yet.
> I
> have around 90k documents sizing around 350 MB. Each document contains a
> record which has some text content. For each word in this text I want to
> store context for that word and index it so I am reading each document
> and
> for each word in that document I am appending fixed number of
> surrounding
> words. To do that first I search in existing indices if this word
> already
> exist and if it is then I get the content and append the new context and
> update the document. In case no context exist I create a document with
> fields "word" and "context" and add these two fields with values as word
> value and context value.
>
> I tried this in RAM but after certain no of docs it gave out of memory
> error
> so I thought to use FSDirectory method but surprisingly after 70k
> documents
> it also gave OOM error. I have enough disk space but still I am getting
> this
> error.I am not sure even for disk based indexing why its giving this
> error.
> I thought disk based indexing will be slow but atleast it will be
> scalable.
> Could someone suggest what could be the issue ?
>
> Thanks
> Ajay
> --
> View this message in context:
> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27755872.
> html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message