From: Shai Erera
To: dev@lucene.apache.org
Date: Tue, 19 Oct 2010 18:11:11 +0200
Subject: Re: Analyzer forcing tokenStream and reusableTokenStream to be final

My only problem is that without disabling asserts, I cannot bypass these
checks. That is why I hoped we could limit the check itself to o.a.l code.
Someone who knows what he's doing is not allowed to inherit from analyzers
and override these methods, all so that we can protect those who don't know
what they're doing.

It's frustrating :).

I'm all for removing one of them, declaring the other one reusable, and
being done with it, but I have a feeling this is a matter for a larger
discussion. What I'm asking for here is something much simpler: we don't
jeopardize Lucene code, and we document the risks of not overriding
reusableTokenStream.

Can't we change the assertion so that it does not fail if the class
declares reusableTokenStream, even when nothing is final? Wouldn't that
avoid the issues you've mentioned?

Shai

On Tue, Oct 19, 2010 at 5:59 PM, Robert Muir wrote:

> On Tue, Oct 19, 2010 at 11:52 AM, Shai Erera wrote:
> > I still don't understand how not declaring my tokenStream and
> > reusableTokenStream final can break anything. The methods are there
> > (in my analyzers), and if I risk overriding them somewhere else it's
> > my problem.
>
> Well it is your problem, but we created it with our confusing APIs :)
>
> So if you subclass your analyzer but only implement tokenStream and
> not also reusableTokenStream, you get very terrible performance like
> https://issues.apache.org/jira/browse/LUCENE-2279
>
> By enforcing these to be final we prevent the trap where you subclass
> and don't implement reusableTokenStream and get bad performance, but
> it's still not completely solved.
> There is still the trap (especially with the attributes-based API,
> even more overhead), that you just implement an Analyzer with only
> tokenStream and get bad performance.
>
> If we only had one of these methods, let's say called "tokenStream",
> and it was reusable, we could remove these final checks completely.
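To make the proposal above concrete, here is a minimal, self-contained
sketch of the two rules being compared. It does not use Lucene and is not
the actual assertion in Lucene's Analyzer. SketchAnalyzer, FinalityChecks,
strictCheck, relaxedCheck and the example analyzers are all hypothetical
names introduced only for illustration. The strict rule is assumed to be
"the class is final, or both methods are final"; the relaxed rule
additionally accepts any class that itself declares reusableTokenStream.

```java
import java.io.Reader;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for an analyzer base class; this is NOT Lucene's Analyzer.
abstract class SketchAnalyzer {
    public abstract Object tokenStream(String field, Reader reader);

    // Default reuse path simply delegates, so a new stream is built on every call.
    public Object reusableTokenStream(String field, Reader reader) {
        return tokenStream(field, reader);
    }
}

// Hypothetical checks; the real assertion may differ in its details.
final class FinalityChecks {
    private static final String[] METHODS = {"tokenStream", "reusableTokenStream"};

    private FinalityChecks() {}

    // Assumed strict rule: the class is final, or both methods are final.
    static boolean strictCheck(Class<? extends SketchAnalyzer> clazz) {
        if (Modifier.isFinal(clazz.getModifiers())) {
            return true;
        }
        for (String name : METHODS) {
            if (!Modifier.isFinal(find(clazz, name).getModifiers())) {
                return false;
            }
        }
        return true;
    }

    // Relaxed rule sketched from the proposal: also pass when the class
    // itself (not a superclass) declares reusableTokenStream.
    static boolean relaxedCheck(Class<? extends SketchAnalyzer> clazz) {
        return strictCheck(clazz) || declares(clazz, "reusableTokenStream");
    }

    // Looks up the public method, including inherited ones.
    private static Method find(Class<?> clazz, String name) {
        try {
            return clazz.getMethod(name, String.class, Reader.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only if the method is declared directly in clazz.
    private static boolean declares(Class<?> clazz, String name) {
        try {
            clazz.getDeclaredMethod(name, String.class, Reader.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("CarefulAnalyzer strict:  " + strictCheck(CarefulAnalyzer.class));
        System.out.println("CarefulAnalyzer relaxed: " + relaxedCheck(CarefulAnalyzer.class));
        System.out.println("TrapSubclass relaxed:    " + relaxedCheck(TrapSubclass.class));
        System.out.println("SealedAnalyzer strict:   " + strictCheck(SealedAnalyzer.class));
    }
}

// Nothing is final, but both methods are implemented in this class.
class CarefulAnalyzer extends SketchAnalyzer {
    private Object cached;

    @Override
    public Object tokenStream(String field, Reader reader) {
        return new Object();
    }

    @Override
    public Object reusableTokenStream(String field, Reader reader) {
        if (cached == null) {
            cached = tokenStream(field, reader);
        }
        return cached;
    }
}

// The trap from the quoted reply: only tokenStream is overridden in the subclass.
class TrapSubclass extends CarefulAnalyzer {
    @Override
    public Object tokenStream(String field, Reader reader) {
        return new Object();
    }
}

// A final class passes the strict rule regardless of its methods.
final class SealedAnalyzer extends SketchAnalyzer {
    @Override
    public Object tokenStream(String field, Reader reader) {
        return new Object();
    }
}
```

Under these assumptions, CarefulAnalyzer fails the strict rule but passes
the relaxed one, while TrapSubclass, which overrides only tokenStream,
still fails both. Whether that is enough protection in practice is the
open question in this thread.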