lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: ArrayIndexOutOfBoundsException: -65536
Date Tue, 14 Oct 2014 05:29:32 GMT
Bit of thread necromancy here, but I figured it was relevant because
we get exactly the same error.

On Thu, Jan 19, 2012 at 12:47 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> Hmm, are you certain your RAM buffer is 3 MB?
>
> Is it possible you are indexing an absurdly enormous document...?

We're seeing a case here where the document certainly could qualify as
"absurdly enormous". The doc itself is 2GB in size and the
tokenisation is per-character, not per-word, so the number of
generated terms must be enormous. Probably enough to fill 2GB...

So I'm wondering whether there is more info somewhere on why this limit
exists (or existed? We're still using 3.6.x) and whether it can be
detected up-front. A large amount of indexing time (~30 minutes) could
be saved if we could detect ahead of time that indexing would fail.
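For what it's worth, since the tokenisation here is per-character, the token
count for a field is roughly the character count of the document, so a crude
up-front check is possible before spending ~30 minutes indexing. A minimal
sketch of that idea, assuming a per-field token budget that we pick ourselves
(the threshold below is an assumption, not a documented Lucene limit):

```java
// Hypothetical pre-check: with per-character tokenisation, tokens-per-field
// is approximately the document's character count, so we can reject
// documents that would blow past a chosen budget before indexing starts.
public class TokenBudgetCheck {

    // Assumed budget: stay comfortably under Integer.MAX_VALUE tokens
    // per field. This value is a guess, not a limit documented by Lucene.
    static final long MAX_TOKENS_PER_FIELD = 1L << 30;

    // Estimated token count for per-character tokenisation: one token
    // per character of the document.
    static long estimatedTokens(long charCount) {
        return charCount;
    }

    static boolean wouldExceedBudget(long charCount) {
        return estimatedTokens(charCount) > MAX_TOKENS_PER_FIELD;
    }

    public static void main(String[] args) {
        long twoGigaChars = 2L * 1024 * 1024 * 1024; // the ~2GB doc case
        System.out.println(wouldExceedBudget(twoGigaChars)); // prints "true"
        System.out.println(wouldExceedBudget(1000));         // prints "false"
    }
}
```

Alternatively, rather than failing or skipping the document, Lucene 3.x's
LimitTokenCountAnalyzer (which wraps another analyzer and stops emitting
tokens after a configured count) could cap the field instead of letting it
grow without bound, at the cost of truncating the indexed content.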

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

