lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: out of memory error
Date Tue, 05 Feb 2008 15:54:43 GMT
See below:

On Feb 5, 2008 9:41 AM, SK R <rsk.sen@gmail.com> wrote:

> Hi,
>   Thanks for your help Erick.
>
>   I changed my code to flush the writer before adding documents, which
> helps to reduce memory usage.
>   Also, reducing mergeFactor and maxBufferedDocs to some level helped me
> to avoid this OOM error (even though the index size is ~1GB).
>
> But please clarify the doubts below:
>
> Make sure you flush your IndexWriter before attempting to index this
> document.
>
>  - Is it good to call writer.flush() before adding every document to the
> writer? Doesn't it affect the performance of indexing or search? Is it
> similar to setting maxBufferedDocs=1?
>

No, this is not a good idea. I'd expect this to slow down indexing
significantly. What I was assuming is that you'd have something like:

if (incoming document is huge) flush index writer

just to free up all the memory you can.
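
In 2.3 terms, a minimal sketch of that guard might look like the following
(the threshold, field name, and class layout are my own illustrative
assumptions, not anything from your post):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // Sketch only: flush buffered docs before a huge one arrives, so it
    // gets indexed with as much free heap as possible.
    public class GuardedIndexer {
        private static final int HUGE_DOC_CHARS = 1 << 20; // ~1M chars; assumed cutoff

        public static void addWithGuard(IndexWriter writer, String text)
                throws Exception {
            if (text.length() > HUGE_DOC_CHARS) {
                writer.flush(); // frees the RAM buffered for pending docs
            }
            Document doc = new Document();
            doc.add(new Field("body", text, Field.Store.NO,
                    Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }
    }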


>
>    Also, guide me on which of these is relatively better (takes less time
> & memory):
>        (i) create 4 indexes of 250MB each and merge them into a single
> index file by using writer.addIndexes(..), or
>        (ii) create a 1GB index & optimize it?
>

Don't know. You have to measure your particular situation. There's some
discussion (search the archives) about using several threads to speed up
indexing. Also, there's the wiki page, see

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

The first bullet point is important here. Do you really need to improve
indexing speed? How long does it take, and how often do you build it?
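
For what it's worth, option (i) from your question would look roughly like
this against the 2.3 API (the directory paths are made up for
illustration):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Sketch only: merge four separately built part-indexes into one.
    public class MergeParts {
        public static void main(String[] args) throws Exception {
            Directory[] parts = {
                FSDirectory.getDirectory("/idx/part1"),
                FSDirectory.getDirectory("/idx/part2"),
                FSDirectory.getDirectory("/idx/part3"),
                FSDirectory.getDirectory("/idx/part4")
            };
            IndexWriter merged = new IndexWriter(
                FSDirectory.getDirectory("/idx/merged"),
                new StandardAnalyzer(), true); // true = create a fresh index
            merged.addIndexes(parts);          // merges and optimizes
            merged.close();
        }
    }

But again, measure it against option (ii) before committing to either.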

But perhaps I misread your original post. I *thought* you were talking about
indexing a 1G *document*. The size of the index shouldn't matter as far as
an OOM error is concerned. But now that I re-read your original post, I
should have also suggested that you optimize in a different process than
the one you index in, since the implication is that they are separate
indexes anyway.

Best
Erick


>
> Thanks & Regards
> RSK
>
>
>
> On Feb 4, 2008 9:23 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
> > ummmm index smaller documents? <G>
> >
> > You cannot expect to index a 1G doc with 512M of memory in the JVM.
> > The first thing I'd try is upping your JVM memory to the max your
> > machine will accept.
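> >
> > For example (the 2048m figure and the class name are placeholders; use
> > whatever your machine allows):
> >
> >     java -Xmx2048m -cp lucene-core-2.3.0.jar:. MyIndexer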
> >
> > Make sure you flush your IndexWriter before attempting to index this
> > document.
> >
> > But I would not be surprised if this failed to solve the problem. What's
> > in this massive document? Would it be possible to break it up into
> > smaller segments and index many sub-documents for this massive doc?
> > I also wonder what problem you're trying to solve by indexing this doc.
> > Is it a log file? I can't imagine a text document that big. That's like
> > a 100-volume encyclopedia, and I can't help but wonder whether your
> > users would be better served by indexing it in pieces.
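> >
> > A rough sketch of that splitting idea (the chunk size and field names
> > are illustrative assumptions, not a prescription):
> >
> >     import org.apache.lucene.document.Document;
> >     import org.apache.lucene.document.Field;
> >     import org.apache.lucene.index.IndexWriter;
> >
> >     // Sketch only: index one huge source document as many smaller
> >     // Lucene documents, tied together by a shared docId field.
> >     public class PieceIndexer {
> >         private static final int CHUNK = 100000; // ~100K chars; assumed
> >
> >         public static void indexInPieces(IndexWriter writer, String docId,
> >                 String text) throws Exception {
> >             for (int i = 0; i < text.length(); i += CHUNK) {
> >                 String piece = text.substring(i,
> >                         Math.min(i + CHUNK, text.length()));
> >                 Document doc = new Document();
> >                 doc.add(new Field("docId", docId, Field.Store.YES,
> >                         Field.Index.UN_TOKENIZED));
> >                 doc.add(new Field("body", piece, Field.Store.NO,
> >                         Field.Index.TOKENIZED));
> >                 writer.addDocument(doc);
> >             }
> >         }
> >     }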
> >
> > Best
> > Erick
> >
> > On Feb 4, 2008 10:25 AM, SK R <rsk.sen@gmail.com> wrote:
> >
> > > Hi,
> > >   I got an out of memory exception while indexing huge documents
> > > (~1GB) in one thread and optimizing some other (2 to 3) indexes in
> > > different threads. Max JVM heap size is 512MB. I'm using Lucene 2.3.0.
> > >
> > >   Please suggest a way to avoid this exception.
> > >
> > > Regards
> > >  RSK
> > >
> >
>
