lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: IndexWriter memory leak?
Date Thu, 08 Apr 2010 09:32:00 GMT
If we could change the JFlex specification so that yyreset(Reader) checked the
size of zzBuffer, we could trim the buffer when it gets too big. But I don't
think we have that kind of control when writing the flex syntax: yyreset is
generated by JFlex, and it's the only place I can think of where the buffer
could be trimmed back down once it exceeds a predefined threshold.

Maybe what we can do is create our own method, something like
trimBufferIfTooBig(int threshold), which StandardTokenizer would call right
after yyreset and which would reallocate zzBuffer if it has grown past the
threshold. We could settle on a reasonable threshold such as 64K, or simply
always cut back to the default 16K characters. As far as I understand, that
buffer should never grow that much: zzRefill, the only place where the buffer
gets resized, first reclaims the space held by characters that were already
consumed (by shifting the unconsumed ones to the front), and only then
allocates a bigger buffer. That means the buffer is expanded only when a
single token is larger than 16K characters (!?).
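To make the refill behavior concrete, here is a simplified, hypothetical model of the compact-then-grow strategy. RefillModel, scanToken and the field names are made up; the model only mimics the logic of the generated zzRefill, reads just enough characters per token, and simply doubles the buffer instead of using JFlex's exact growth rule. Under those assumptions, a long stream of short tokens never grows the 16K-char buffer, while a single 40,000-char token does:

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of the compact-then-grow strategy in JFlex's
 * generated zzRefill(). Not the generated code: it reads exactly as many chars
 * as the current token needs and grows by simple doubling.
 */
final class RefillModel {
    static final int INITIAL_SIZE = 16384; // JFlex's default ZZ_BUFFERSIZE, in chars

    char[] buffer = new char[INITIAL_SIZE];
    int startRead = 0; // start of the token currently being scanned
    int endRead = 0;   // end of the valid data in the buffer

    /** Called when the buffer is full but the current token needs more input. */
    void refill() {
        // 1. Reclaim the space of already-consumed chars by shifting the
        //    partial token to the front of the buffer.
        if (startRead > 0) {
            System.arraycopy(buffer, startRead, buffer, 0, endRead - startRead);
            endRead -= startRead;
            startRead = 0;
        }
        // 2. Grow only if the partial token alone still fills the whole buffer.
        if (endRead >= buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
    }

    /** Scans one token of the given length; returns the buffer size afterwards. */
    int scanToken(int tokenLength) {
        int have = 0; // chars of this token already in the buffer
        while (have < tokenLength) {
            if (endRead == buffer.length) {
                refill();
            }
            int n = Math.min(buffer.length - endRead, tokenLength - have);
            endRead += n; // pretend n chars were read from the Reader
            have += n;
        }
        startRead = endRead; // token consumed; the next one starts here
        return buffer.length;
    }

    public static void main(String[] args) {
        RefillModel m = new RefillModel();
        for (int i = 0; i < 100_000; i++) {
            m.scanToken(10); // one million chars of short tokens
        }
        System.out.println(m.buffer.length); // 16384: compaction alone was enough
        m.scanToken(40_000); // a single 40,000-char "token"
        System.out.println(m.buffer.length); // 65536: the buffer had to grow
    }
}
```

Nothing in this model ever shrinks the buffer again once it has grown.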

A trimBuffer method might not be that bad as a protective measure. What do
you think? Of course, JFlex could fix this on their side, but until that
happens...
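A minimal sketch of the protective trim, assuming hypothetical ScannerModel and TokenizerModel classes that stand in for the generated scanner and for StandardTokenizer. In a real grammar the method would presumably live in the spec's %{ ... %} user-code section, since code placed there is copied into the generated class and can see its private zzBuffer field. The 64K threshold and the cut-back to the 16K default follow the numbers above; replacing the buffer right after yyreset should be safe because the reset discards any buffered input:

```java
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical stand-in for the JFlex-generated scanner; only buffer state is modeled. */
final class ScannerModel {
    static final int ZZ_BUFFERSIZE = 16384; // JFlex's default initial buffer size (chars)

    char[] zzBuffer = new char[ZZ_BUFFERSIZE];
    private Reader zzReader;
    private int zzStartRead, zzEndRead;

    /** Mimics the generated yyreset: resets positions but keeps the (possibly huge) buffer. */
    void yyreset(Reader reader) {
        zzReader = reader;
        zzStartRead = zzEndRead = 0;
    }

    /** Hypothetical helper: drop an oversized buffer back to the default size. */
    void trimBufferIfTooBig(int threshold) {
        if (zzBuffer.length > threshold) {
            zzBuffer = new char[ZZ_BUFFERSIZE];
        }
    }
}

/** Hypothetical tokenizer wrapper that trims right after every reset. */
final class TokenizerModel {
    static final int TRIM_THRESHOLD = 64 * 1024; // assumed threshold, in chars

    final ScannerModel scanner = new ScannerModel();

    void reset(Reader input) {
        scanner.yyreset(input);
        scanner.trimBufferIfTooBig(TRIM_THRESHOLD);
    }

    public static void main(String[] args) {
        TokenizerModel t = new TokenizerModel();
        t.scanner.zzBuffer = new char[256 * 1024]; // pretend one huge token blew it up
        t.reset(new StringReader("next document"));
        System.out.println(t.scanner.zzBuffer.length); // 16384: trimmed back
        t.scanner.zzBuffer = new char[32 * 1024]; // grown, but under the threshold
        t.reset(new StringReader("another document"));
        System.out.println(t.scanner.zzBuffer.length); // 32768: kept
    }
}
```

Using a threshold rather than always cutting back avoids reallocating on every reset when documents routinely carry moderately long tokens.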

Shai

On Thu, Apr 8, 2010 at 10:35 AM, Uwe Schindler <uwe@thetaphi.de> wrote:

> > I would also like to identify the problematic document. I have 10000 of
> > them, so what would be the best way to find the one that is making zzBuffer
> > grow without control?
>
> Don't index your documents; instead, pass them directly to the analyzer
> and consume the TokenStream manually. Then check TermAttribute.termLength()
> for each token.
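A sketch of the approach Uwe describes, written against a stand-in tokenizer so it runs without Lucene on the classpath. LongTokenFinder and findLongTokens are hypothetical names, and the whitespace split is only a placeholder for the real analyzer; with Lucene's TokenStream the inner loop would instead call incrementToken() and read TermAttribute.termLength(), as noted in the comment. It reports each document whose longest token exceeds a limit, here the 16K-char initial buffer size:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class LongTokenFinder {
    /**
     * Returns docId -> longest token length for documents whose longest token
     * exceeds maxLen. 'tokenize' stands in for running the real analyzer; with
     * Lucene's TokenStream the inner loop would roughly become:
     *   TermAttribute termAtt = ts.addAttribute(TermAttribute.class);
     *   while (ts.incrementToken()) { longest = Math.max(longest, termAtt.termLength()); }
     */
    static Map<String, Integer> findLongTokens(Map<String, String> docs,
                                               Function<String, List<String>> tokenize,
                                               int maxLen) {
        Map<String, Integer> offenders = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            int longest = 0;
            for (String token : tokenize.apply(doc.getValue())) {
                longest = Math.max(longest, token.length());
            }
            if (longest > maxLen) {
                offenders.put(doc.getKey(), longest);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        char[] blob = new char[20_000];
        Arrays.fill(blob, 'x'); // e.g. an embedded binary/base64 blob with no spaces
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc-1", "ordinary text with short words");
        docs.put("doc-2", "a " + new String(blob) + " b");
        // Hypothetical whitespace tokenizer standing in for the analyzer.
        Function<String, List<String>> whitespace = s -> Arrays.asList(s.split("\\s+"));
        System.out.println(findLongTokens(docs, whitespace, 16384)); // {doc-2=20000}
    }
}
```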
