Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <AANLkTikqwnT+Vyc+L+CeV4a6-iTuK8PTK7P5VgNzq9d6@mail.gmail.com>
References: <AANLkTi=bRy9Bnz5ZxNuZPWUigFd9Y+bdkt7Dfe8eg-gy@mail.gmail.com>
	<AANLkTi=cEGnDAE5C7MqjB2Pt7UFVdAfKHx3mTCJXGkbO@mail.gmail.com>
	<AANLkTimXYPRLgD9uK6k0nL-sPic_=jEAcMENe=zaMk1W@mail.gmail.com>
	<AANLkTi=iQL7GT0vHsMGU4RSWA5jkiH8j4O+36DAxLz3F@mail.gmail.com>
	<AANLkTikqwnT+Vyc+L+CeV4a6-iTuK8PTK7P5VgNzq9d6@mail.gmail.com>
Date: Tue, 19 Oct 2010 18:11:11 +0200
Message-ID: <AANLkTi=awXGioAKAhi=UHOXiyFm=hF6Ap=CVdDxrWzTs@mail.gmail.com>
Subject: Re: Analyzer forcing tokenStream and reusableTokenStream to be final
From: Shai Erera <serera@gmail.com>
To: dev@lucene.apache.org
Content-Type: multipart/alternative; boundary=00163646bbb89bac7e0492fa8aec

--00163646bbb89bac7e0492fa8aec
Content-Type: text/plain; charset=ISO-8859-1

My only problem is that, without disabling asserts, I cannot bypass these
checks. That is why I hoped we could limit the check itself to o.a.l code.
As it stands, in order to protect those who don't know what they're doing,
we stop someone who does know what he's doing from inheriting from
analyzers and overriding these methods.

It's frustrating :).

I'm all for removing one of them, declaring the other one reusable, and
being done with it, but I have a feeling this is a matter for a larger
discussion. What I'm asking for here is something much simpler: we don't
jeopardize Lucene code, and we document the risks of not overriding
reusableTokenStream.

Can't we change the assertion so that it does not fail when the class
declares reusableTokenStream, even if nothing is final? Wouldn't that avoid
the issues you've mentioned?
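
To make the proposal concrete, here is a minimal sketch of the two checks
side by side. It is not Lucene's actual assertion: BaseAnalyzer, the demo
subclasses, strictCheck and relaxedCheck are all hypothetical names, and the
method signatures are simplified to a single String argument so the example
stands alone.

```java
import java.lang.reflect.Modifier;

public class FinalCheckSketch {

    /** Hypothetical stand-in for an analyzer base class; NOT Lucene's Analyzer. */
    public static abstract class BaseAnalyzer {
        public abstract String tokenStream(String field);

        // Assumed default: no reuse, just delegate to tokenStream.
        public String reusableTokenStream(String field) {
            return tokenStream(field);
        }
    }

    /** Overrides only tokenStream: the case the check is meant to catch. */
    public static class OnlyTokenStream extends BaseAnalyzer {
        @Override public String tokenStream(String field) { return "fresh:" + field; }
    }

    /** Declares both methods but marks nothing final. */
    public static class DeclaresBoth extends BaseAnalyzer {
        @Override public String tokenStream(String field) { return "fresh:" + field; }
        @Override public String reusableTokenStream(String field) { return "reused:" + field; }
    }

    /** Final class: passes the strict check. */
    public static final class FinalAnalyzer extends BaseAnalyzer {
        @Override public String tokenStream(String field) { return "fresh:" + field; }
    }

    private static boolean isFinalMethod(Class<?> clazz, String name) {
        try {
            return Modifier.isFinal(clazz.getMethod(name, String.class).getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    private static boolean declares(Class<?> clazz, String name) {
        try {
            // getDeclaredMethod only sees methods declared by clazz itself.
            clazz.getDeclaredMethod(name, String.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Strict rule: the class is final, or both methods are final. */
    public static boolean strictCheck(Class<?> clazz) {
        return Modifier.isFinal(clazz.getModifiers())
            || (isFinalMethod(clazz, "tokenStream")
                && isFinalMethod(clazz, "reusableTokenStream"));
    }

    /** Relaxed rule: also accept a class that itself declares reusableTokenStream. */
    public static boolean relaxedCheck(Class<?> clazz) {
        return strictCheck(clazz) || declares(clazz, "reusableTokenStream");
    }

    public static void main(String[] args) {
        Class<?>[] candidates = { OnlyTokenStream.class, DeclaresBoth.class, FinalAnalyzer.class };
        for (Class<?> c : candidates) {
            System.out.println(c.getSimpleName()
                + " strict=" + strictCheck(c)
                + " relaxed=" + relaxedCheck(c));
        }
    }
}
```

In this sketch OnlyTokenStream fails both checks, DeclaresBoth fails the
strict check but passes the relaxed one, and FinalAnalyzer passes both. Note
that the relaxed rule does nothing for a further subclass of DeclaresBoth
that overrides only tokenStream.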

Shai

On Tue, Oct 19, 2010 at 5:59 PM, Robert Muir <rcmuir@gmail.com> wrote:

> On Tue, Oct 19, 2010 at 11:52 AM, Shai Erera <serera@gmail.com> wrote:
> > I still don't understand how not declaring my tokenStream and
> > reusableTokenStream final can break anything. The methods are there (in my
> > analyzers), and if I risk overriding them somewhere else it's my problem.
> >
>
> Well it is your problem, but we created it with our confusing APIs :)
>
> So if you subclass your analyzer but only implement tokenStream and
> not also reusableTokenStream, you get terrible performance, as in
> https://issues.apache.org/jira/browse/LUCENE-2279
>
> By enforcing these to be final we prevent the trap where you subclass
> and don't implement reusableTokenStream and get bad performance, but
> it's still not completely solved.
> There is still the trap (especially with the attributes-based API,
> which has even more overhead) where you just implement an Analyzer with
> only tokenStream and get bad performance.
>
> If we only had one of these methods, let's say called "tokenStream",
> and it was reusable, we could remove these final checks completely.
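
The performance trap described in the quoted text can be illustrated with a
small allocation counter. This is only a sketch under stated assumptions:
Stream, BaseAnalyzer, NaiveAnalyzer, ReusingAnalyzer and allocationsFor are
hypothetical stand-ins rather than Lucene classes, the default
reusableTokenStream is assumed to simply delegate to tokenStream, and the
reuse shown is single-threaded.

```java
public class ReuseTrapSketch {

    /** Hypothetical token stream stand-in; it only remembers its field. */
    public static final class Stream {
        private String field;
        Stream(String field) { this.field = field; }
        void reset(String field) { this.field = field; }
        public String field() { return field; }
    }

    /** Hypothetical base class; assumed default is "no reuse". */
    public static abstract class BaseAnalyzer {
        protected int allocations = 0;

        public abstract Stream tokenStream(String field);

        public Stream reusableTokenStream(String field) {
            return tokenStream(field);
        }

        public int allocations() { return allocations; }
    }

    /** Implements only tokenStream, so every call builds a new stream. */
    public static class NaiveAnalyzer extends BaseAnalyzer {
        @Override public Stream tokenStream(String field) {
            allocations++;
            return new Stream(field);
        }
    }

    /** Also implements reusableTokenStream by caching a single stream. */
    public static class ReusingAnalyzer extends NaiveAnalyzer {
        private Stream cached;

        @Override public Stream reusableTokenStream(String field) {
            if (cached == null) {
                cached = tokenStream(field);   // first call allocates
            } else {
                cached.reset(field);           // later calls reuse it
            }
            return cached;
        }
    }

    /** Simulates an indexer asking for a stream once per document. */
    public static int allocationsFor(BaseAnalyzer analyzer, int docs) {
        for (int i = 0; i < docs; i++) {
            analyzer.reusableTokenStream("body");
        }
        return analyzer.allocations();
    }

    public static void main(String[] args) {
        System.out.println("naive:   " + allocationsFor(new NaiveAnalyzer(), 1000));
        System.out.println("reusing: " + allocationsFor(new ReusingAnalyzer(), 1000));
    }
}
```

In this sketch the naive analyzer allocates one stream per document, while
the reusing one allocates once and resets. That difference is the cost the
final checks are trying to guard against.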

--00163646bbb89bac7e0492fa8aec--