lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject RE: OutOfMemoryError on addIndexes()
Date Fri, 12 Aug 2005 18:26:43 GMT

Okay, just for the record: I'm currently on vacation and don't have
access to any of my indexes at work to make a comparison, but the
number of unique terms in your index (which I'm 99% sure is what
indexEnum.size represents in the code you cited) seems HUGE!!!

You haven't given us a lot of details about what your index contains (i.e.
the nature of the documents). In fact, for the number of terms you cite
(811806819) the only info we have is that the index containing that number
of terms is 29MB in size -- no idea how many documents are in that index.
But if we look at your previous email, you mentioned another index
that causes the same problem which is 120MB, which you built from 11359
files.  If we assume that index has no more than the same number of unique
terms indexed (which seems unlikely, but let's give it the benefit of the
doubt, and assume the added size is all stored fields), and assume that you
made one document per file, and that those files are 100% unique from each
other and contain no terms in common -- that means that each file
contains roughly 71,500 unique terms.

That seems like a lot.

A quick Google search tells me that the English language contains
somewhere between 500,000 and 1,000,000 words -- your index has roughly 800
times that many terms.  Even assuming you index a lot of numerical or
date-based data -- that seems like a lot.
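That arithmetic can be checked directly. A minimal sketch (the class name is
mine; the term and file counts are the figures quoted in the messages below):

```java
public class TermCountSanityCheck {
    public static void main(String[] args) {
        long uniqueTerms = 811806819L; // indexEnum.size reported for the index
        long files = 11359;            // files behind the 120MB test index

        // Best case: every file contributes completely disjoint terms
        long termsPerFile = uniqueTerms / files;
        System.out.println("unique terms per file: " + termsPerFile); // 71468

        // Compare against a generous 1,000,000-word English vocabulary
        long vocabularyRatio = uniqueTerms / 1000000L;
        System.out.println("x English vocabulary: " + vocabularyRatio); // 811
    }
}
```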

I have to wonder if maybe you are indexing a lot of junk information by
mistake -- perhaps some binary data is mistakenly getting treated as
strings?

Can you tell us more about the nature of your indexes?
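For scale, here's a rough estimate of what the three parallel arrays Michael
describes below (Term, TermInfo and long) would cost just in array slots for
that many entries. The 8-bytes-per-slot figure assumes a 64-bit JVM, and the
per-object cost of each Term and TermInfo would come on top of this:

```java
public class TermIndexMemoryEstimate {
    public static void main(String[] args) {
        long entries = 811806819L;      // indexEnum.size from the message below
        long bytesPerEntry = 8 + 8 + 8; // Term ref + TermInfo ref + long pointer

        long totalBytes = entries * bytesPerEntry;
        long totalMB = totalBytes / (1024 * 1024);
        // ~18580 MB before counting the Term/TermInfo objects themselves --
        // far beyond anything a 2GB machine could provide
        System.out.println("array slots alone: " + totalMB + " MB");
    }
}
```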


: Date: Fri, 12 Aug 2005 09:45:40 +0200
: From: Trezzi Michael <MTrezzi@CSAS.CZ>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: RE: OutOfMemoryError on addIndexes()
:
: I did some more research and these are the results.
:  The OutOfMemoryError occurs on line 82 of class TermInfosReader.java.
: That and two other lines try to create an array of the size obtained
: from
:
: int indexSize = (int)indexEnum.size;
:
: For my 29MB index this indexSize integer is 811806819. Creating 3
: arrays of this size (lines 82-84) requires an enormous amount of
: memory. As a model situation, take a char array: a char has 2 bytes,
: so 2 * 811806819 => roughly 1548MB. That seems a little much, and the
: elements stored in those arrays (Term, TermInfo and long) are
: definitely not simple chars. This way I would need several gigabytes
: of memory to merge even several small (30MB) indexes. Is this the
: standard way it works, or is there a problem on my side?
:
: Thanks,
:
: Michael
:
: ________________________________
:
: From: Ian Lea [mailto:ian.lea@gmail.com]
: Sent: Wed 10.8.2005 12:34
: To: java-user@lucene.apache.org
: Subject: Re: OutOfMemoryError on addIndexes()
:
:
:
: How much memory are you giving your programs?
:
:  java    -Xmx<size>        set maximum Java heap size
:
: --
: Ian.
:
: On 10/08/05, Trezzi Michael <MTrezzi@csas.cz> wrote:
: > Hello,
: > I have a problem and i tried everything i could think of to solve it. TO understand
my situation, i create indexes on several computers on our network and they are copied to
one server. There, once a day, they are merged into one masterIndex, which is then searched.
The problem is in merging. I use the following code:
: >
: > Directory[] ar = new Directory[fileList.length];
: > for (int i = 0; i < fileList.length; i++) {
: >     ar[i] = FSDirectory.getDirectory(fileList[i], false);
: > }
: > writer.addIndexes(ar);
: > for (int i = 0; i < fileList.length; i++) {
: >     ar[i].close();
: > }
: > writer.optimize();
: > writer.close();
: >
: > I also tried a longer way: opening every index separately and adding
: > it document by document. The problem is I am getting OutOfMemory
: > errors either way. With the per-document approach it happens on the
: > IndexReader.open call, and only on indexes of approx 100MB+ (the
: > largest index I have is only about 150MB). When I run it on a Windows
: > machine with JDK 1.5 I get the following:
: >     Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
: > On Linux I am running 1.4 and I get the message without the array
: > size information.
: >
: > I did try it also on a test index made from 11359 files (1.59GB) that
: > came to 120MB, and I got this error too. In my opinion a 120MB index
: > is not that big. The machine it runs on is a Xeon 3.2GHz with 2GB of
: > RAM, so that should be enough. Can you please help me?
: >
: > Thank you in advance,
: >
: > Michael Trezzi
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org



-Hoss



