Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
From: "Uwe Schindler" <uwe@thetaphi.de>
To: <java-user@lucene.apache.org>
References: <t2mf8d56a4d1004071335m90081304iafea6f5ed1cf8ce@mail.gmail.com>
	 <i2tf8d56a4d1004071523sb3f4d042j830bf9300b33e867@mail.gmail.com>
	 <j2o786fde51004072341zb527be14l1ba0e3678bb3198f@mail.gmail.com>
	 <t2wf8d56a4d1004080023sbb0cbdl2f141a7fe950d835@mail.gmail.com>
	 <007e01cad6ee$10c8dc40$325a94c0$@de>
	 <l2s786fde51004080232qf4dc3564ifdcecd9d739f0136@mail.gmail.com>
	 <x2zf8d56a4d1004080251rf223c873gf99f5804b2f02a88@mail.gmail.com>
 <t2q786fde51004080259qbc5501bbn5c0faf6a3bb5dd7d@mail.gmail.com>
In-Reply-To: <t2q786fde51004080259qbc5501bbn5c0faf6a3bb5dd7d@mail.gmail.com>
Subject: RE: IndexWriter memory leak?
Date: Thu, 8 Apr 2010 12:04:02 +0200
Message-ID: <00c301cad702$d10a4ca0$731ee5e0$@de>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Thread-Index: AcrXAlF0Bum3sfJ0Q/ymD5gZdZj4MwAAHOaA
Content-Language: de

I already answered, that I will take care of this!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Shai Erera [mailto:serera@gmail.com]
> Sent: Thursday, April 08, 2010 12:00 PM
> To: java-user@lucene.apache.org
> Subject: Re: IndexWriter memory leak?
>=20
> Yes, that's the trimBuffer version I was thinking about, only this guy
> created a reset(Reader, int) and does both ops (resetting + trim) in
> one
> method call. More convenient. Can you please open an issue to track
> that?
> People will have a chance to comment on whether we (Lucene) should
> handle
> that, or it should be a JFlex fix. Based on the number of replies this
> guy
> received (0 !), I doubt JFlex would consider it a problem. But we can
> do
> some small service to our users base by protecting against such
> problems.
>=20
> And while you're opening the issue, if you want to take a stab at
> fixing it
> and post a patch, it'd be great :).
>=20
> Shai
>=20
> On Thu, Apr 8, 2010 at 12:51 PM, Ruben Laguna
> <ruben.laguna@gmail.com>wrote:
>=20
> > I was investigating this a little further and in the JFlex mailing
> list I
> > found [1]
> >
> > I don't know much about flex / JFlex but it seems that this guy
> resets the
> > zzBuffer to 16384 or less when setting the input for the lexer
> >
> >
> > Quoted from  shef <shef31@ya...>
> >
> >
> > I set
> >
> > %buffer 0
> >
> > in the options section, and then added this method to the lexer:
> >
> >    /**
> >     * Set the input for the lexer. The size parameter really speeds
> things
> > up,
> >     * because by default, the lexer allocates an internal buffer of
> 16k.
> > For
> >     * most strings, this is unnecessarily large. If the size param =
is
> > 0 or greater
> >     * than 16k, then the buffer is set to 16k. If the size param is
> > smaller, then
> >     * the buf will be set to the exact size.
> >     * @param r the reader that provides the data
> >     * @param the size of the data in the reader.
> >     */
> >    public void reset(Reader r, int size) {
> >        if (size =3D=3D 0 || size > 16384)
> >            size =3D 16384;
> >        zzBuffer =3D new char[size];
> >        yyreset(r);
> >    }
> >
> >
> > So maybe there is a way to trim the zzBuffer this way (?).
> >
> > BTW, I will try to find out which is the "big token" in my dataset
> this
> > afternoon. Thanks for the help.
> >
> > I actually workaround this memory problem for the time being by
> wrapping
> > the
> > IndexWriter in a class that periodically closes the IndexWriter and
> creates
> > a new one, allowing the old to be GCed, but I would be really good =
if
> > either
> > JFlex or Lucene can take care of this zzBuffer going berserk.
> >
> >
> > Again thanks for the quick response. /Rub=C3=A9n
> >
> >
> > [1]
> >
> >
> =
https://sourceforge.net/mailarchive/message.php?msg_id=3D444070.38422.qm@=

> web38901.mail.mud.yahoo.com
> >
> > On Thu, Apr 8, 2010 at 11:32 AM, Shai Erera <serera@gmail.com> =
wrote:
> >
> > > If we could change the Flex file so that yyreset(Reader) would
> check the
> > > size of zzBuffer, we could trim it when it gets too big. But I
> don't
> > think
> > > we have such control when writing the flex syntax ... yyreset is
> > generated
> > > by JFlex and that's the only place I can think of to trim the
> buffer down
> > > when it exceeds a predefined threshold ....
> > >
> > > Maybe what we can do is create our own method which will be called
> by
> > > StandardTokenizer after yyreset is called, something like
> > > trimBufferIfTooBig(int threshold) which will reallocate zzBuffer =
if
> it
> > > exceeded the threshold. We can decide on a reasonable 64K =
threshold
> or
> > > something, or simply always cut back to 16 KB. As far as I
> understand,
> > that
> > > buffer should never grow that much. I.e. in zzRefill, which is the
> only
> > > place where the buffer gets resized, there is an attempt to first
> move
> > back
> > > characters that were already consumed and only then allocate a
> bigger
> > > buffer. Which means only if there is a token whose size is larger
> than
> > 16KB
> > > (!?), will this buffer get expanded.
> > >
> > > A trimBuffer method might not be that bad .. as a protective
> measure.
> > What
> > > do you think? Of course, JFlex can fix it on their own ... but
> until that
> > > happens ...
> > >
> > > Shai
> > >
> > > On Thu, Apr 8, 2010 at 10:35 AM, Uwe Schindler <uwe@thetaphi.de>
> wrote:
> > >
> > > > > I would like to identify also the problematic document I have
> 10000
> > so,
> > > > > what
> > > > > would be the best way of identifying the one that it making
> zzBuffer
> > to
> > > > > grow
> > > > > without control?
> > > >
> > > > Dont index your documents, but instead pass them directly to the
> > analyzer
> > > > and consume the tokenstream manually. Then visit
> > > TermAttribute.termLength()
> > > > for each Token.
> > > >
> > > >
> > > > =
-----------------------------------------------------------------
> ----
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: =
java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > /Rub=C3=A9n
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org