From: "Uwe Schindler"
Reply-To: dev@lucene.apache.org
Date: Tue, 19 Oct 2010 18:18:20 +0200
Subject: RE: Analyzer forcing tokenStream and reusableTokenStream to be final
Message-ID: <006b01cb6fa9$3f45b440$bdd11cc0$@thetaphi.de>

By the way, the same tests are done for TokenStream subclasses, whose implementations must be final in all cases: TokenStream is defined as a decorator pattern, so we enforce it. And: you don't need to make the class itself final; it's enough to make both methods final.
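To make the decorator point concrete, here is a minimal Java sketch. `SimpleTokenStream`, `ListTokenizer` and `LowerCasingFilter` are hypothetical stand-ins, not Lucene's API (the real TokenStream works through `incrementToken()` and attributes). A filter wraps another stream and adds behavior by composition; making every concrete class final means behavior can only be changed by wrapping, never by subclassing.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Locale;

public class DecoratorSketch {
    /** Hypothetical minimal token source; not Lucene's TokenStream. */
    public abstract static class SimpleTokenStream {
        /** Returns the next token, or null when the stream is exhausted. */
        public abstract String next();
    }

    /** Source at the bottom of the chain: yields tokens from a fixed array. */
    public static final class ListTokenizer extends SimpleTokenStream {
        private final Iterator<String> tokens;

        public ListTokenizer(String... tokens) {
            this.tokens = Arrays.asList(tokens).iterator();
        }

        @Override
        public String next() {
            return tokens.hasNext() ? tokens.next() : null;
        }
    }

    /** Decorator: wraps another stream and lower-cases its tokens. Final, so it can be composed but not subclassed. */
    public static final class LowerCasingFilter extends SimpleTokenStream {
        private final SimpleTokenStream input;

        public LowerCasingFilter(SimpleTokenStream input) {
            this.input = input;
        }

        @Override
        public String next() {
            String token = input.next();
            return token == null ? null : token.toLowerCase(Locale.ROOT);
        }
    }

    /** Drains a stream into a space-separated string. */
    public static String collect(SimpleTokenStream stream) {
        StringBuilder sb = new StringBuilder();
        for (String t = stream.next(); t != null; t = stream.next()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SimpleTokenStream chain = new LowerCasingFilter(new ListTokenizer("Hello", "Lucene"));
        System.out.println(collect(chain)); // prints "hello lucene"
    }
}
```

Adding another step (stemming, stop words) means adding another final wrapper around the chain, not subclassing an existing filter.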

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: Shai Erera [mailto:serera@gmail.com]
Sent: Tuesday, October 19, 2010 6:06 PM
To: dev@lucene.apache.org
Subject: Re: Analyzer forcing tokenStream and reusableTokenStream to be final

 

I guess you didn't read my email all the way through: I cannot disable assertions for Lucene stuff, because I use Lucene's assertions to assert that my indexing code works :).

Shai

On Tue, Oct 19, 2010 at 5:59 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

We simply added that to *test* the bundled analyzers for conformance. If you don't like that, you can simply disable assertions for the org.apache.lucene package.
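A minimal sketch of how such a conformance test can be wired in as an assertion. It assumes the check fires when an analyzer is constructed; the thread only says it is an assertion inside org.apache.lucene, not where it runs. `CheckedAnalyzer` and `isConformant` are hypothetical, not Lucene code. The rule checked is the one from Uwe's first reply: the class is final, or both `tokenStream` and `reusableTokenStream` are final.

```java
import java.io.Reader;
import java.lang.reflect.Modifier;

public class AssertedConformance {
    /** Hypothetical base class; not Lucene's Analyzer. */
    public abstract static class CheckedAnalyzer {
        protected CheckedAnalyzer() {
            // Evaluated only when assertions are enabled (java -ea); otherwise skipped entirely.
            assert isConformant(getClass())
                : getClass().getName() + " must be final or declare both methods final";
        }

        public abstract Object tokenStream(String fieldName, Reader reader);

        public Object reusableTokenStream(String fieldName, Reader reader) {
            return tokenStream(fieldName, reader);
        }
    }

    /** Passes if the class is final, or if both methods are declared final. */
    public static boolean isConformant(Class<?> clazz) {
        if (Modifier.isFinal(clazz.getModifiers())) {
            return true;
        }
        try {
            return Modifier.isFinal(clazz.getMethod("tokenStream", String.class, Reader.class).getModifiers())
                && Modifier.isFinal(clazz.getMethod("reusableTokenStream", String.class, Reader.class).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Non-final class, but both methods are final: passes. */
    public static class Conforming extends CheckedAnalyzer {
        @Override
        public final Object tokenStream(String fieldName, Reader reader) { return "fresh"; }

        @Override
        public final Object reusableTokenStream(String fieldName, Reader reader) { return "reused"; }
    }

    /** Overrides only tokenStream and leaves it non-final: fails. */
    public static class NonConforming extends CheckedAnalyzer {
        @Override
        public Object tokenStream(String fieldName, Reader reader) { return "fresh"; }
    }

    /** True if constructing a NonConforming analyzer tripped the assertion. */
    public static boolean assertionTripped() {
        try {
            new NonConforming();
            return false;
        } catch (AssertionError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        new Conforming(); // never trips the check
        System.out.println("assertions enabled: " + AssertedConformance.class.desiredAssertionStatus()
            + ", non-conforming analyzer rejected: " + assertionTripped());
    }
}
```

With the standard JVM switches, something like `java -ea -da:org.apache.lucene... MyIndexer` (`MyIndexer` being a placeholder for your own main class) keeps assertions on for your code and turns off every assertion under org.apache.lucene and its subpackages. That is exactly why Shai rejects the workaround: it also silences the Lucene assertions he relies on.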

 


 

From: Shai Erera [mailto:serera@gmail.com]
Sent: Tuesday, October 19, 2010 5:53 PM
To: dev@lucene.apache.org
Subject: Re: Analyzer forcing tokenStream and reusableTokenStream to be final


I still don't understand how not declaring my tokenStream and reusableTokenStream methods final can break anything. The methods are there (in my analyzers), and if I risk overriding them somewhere else, it's my problem.

What am I missing?

To add to your email: I, too, haven't encountered an analyzer that cannot be reused yet.

Shai
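For context on what "reused" means here: the point of `reusableTokenStream` is to avoid allocating a new stream for every field of every document. A minimal sketch of the usual idea, with hypothetical classes (not Lucene's API): cache one stream per thread and point it at the new `Reader` instead of building a fresh one.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {
    /** Hypothetical whitespace tokenizer whose input can be swapped via reset(Reader). */
    public static final class WhitespaceSource {
        private Reader input;

        public WhitespaceSource(Reader input) { this.input = input; }

        /** Points this instance at a new Reader so it can be reused. */
        public void reset(Reader newInput) { this.input = newInput; }

        /** Reads the current input fully and splits it on whitespace. */
        public List<String> tokens() throws IOException {
            StringBuilder text = new StringBuilder();
            char[] buf = new char[256];
            for (int n; (n = input.read(buf)) != -1; ) {
                text.append(buf, 0, n);
            }
            List<String> out = new ArrayList<>();
            for (String t : text.toString().trim().split("\\s+")) {
                if (!t.isEmpty()) out.add(t);
            }
            return out;
        }
    }

    /** Hypothetical analyzer: tokenStream always allocates, reusableTokenStream caches per thread. */
    public static final class ReusingAnalyzer {
        private final ThreadLocal<WhitespaceSource> saved = new ThreadLocal<>();
        private int created; // allocation counter for the demo; not thread-safe

        public WhitespaceSource tokenStream(String fieldName, Reader reader) {
            created++;
            return new WhitespaceSource(reader);
        }

        public WhitespaceSource reusableTokenStream(String fieldName, Reader reader) {
            WhitespaceSource source = saved.get();
            if (source == null) {
                source = tokenStream(fieldName, reader); // first call on this thread
                saved.set(source);
            } else {
                source.reset(reader); // later calls reuse the cached instance
            }
            return source;
        }

        public int created() { return created; }
    }

    public static void main(String[] args) throws IOException {
        ReusingAnalyzer analyzer = new ReusingAnalyzer();
        System.out.println(analyzer.reusableTokenStream("body", new StringReader("a b")).tokens());   // [a, b]
        System.out.println(analyzer.reusableTokenStream("body", new StringReader("c d e")).tokens()); // [c, d, e]
        System.out.println(analyzer.created()); // 1
    }
}
```

Two documents are analyzed, but only one stream is ever allocated on this thread.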

On Tue, Oct 19, 2010 at 5:45 PM, Robert Muir <rcmuir@gmail.com> wrote:

On Tue, Oct 19, 2010 at 11:21 AM, Robert Muir <rcmuir@gmail.com> wrote:
> If someone doesn't override both (e.g. they just override
> tokenStream), then it wouldn't actually use their subclass's code. So
> then the reflection hack from LUCENE-1678 would force the analyzer to
> never re-use, but instead call tokenStream: but this is very bad for
> indexing performance!
>

Here's a JIRA issue with an example of how the
tokenStream/reusableTokenStream confusion makes this a real problem in
practice:

https://issues.apache.org/jira/browse/LUCENE-2279
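Robert's scenario can be reproduced in a few lines. In this hypothetical sketch (not Lucene code), `CustomAnalyzer` overrides only `tokenStream`, so calling the inherited `reusableTokenStream` silently bypasses the override. `mustSkipReuse` is a reflection check in the spirit of the LUCENE-1678 workaround Robert mentions, not its actual implementation: it detects the situation and falls back to `tokenStream`, which honors the override but gives up reuse for every document.

```java
import java.io.Reader;
import java.io.StringReader;

public class OverridePitfall {
    /** Hypothetical concrete analyzer that implements both methods itself. */
    public static class BaseAnalyzer {
        public Object tokenStream(String field, Reader reader) { return "base-fresh"; }

        // Reuse path: built by this class, never calls tokenStream, so overrides of it are ignored.
        public Object reusableTokenStream(String field, Reader reader) { return "base-reused"; }
    }

    /** Subclass that customizes only tokenStream. */
    public static class CustomAnalyzer extends BaseAnalyzer {
        @Override
        public Object tokenStream(String field, Reader reader) { return "custom-fresh"; }
    }

    /** True if tokenStream is declared in a subclass of the class declaring reusableTokenStream. */
    public static boolean mustSkipReuse(Class<?> clazz) throws NoSuchMethodException {
        Class<?> ts = clazz.getMethod("tokenStream", String.class, Reader.class).getDeclaringClass();
        Class<?> rts = clazz.getMethod("reusableTokenStream", String.class, Reader.class).getDeclaringClass();
        return ts != rts && rts.isAssignableFrom(ts);
    }

    /** Falls back to tokenStream (no reuse) when reuse would skip the override. */
    public static Object streamFor(BaseAnalyzer a, String field, Reader r) throws NoSuchMethodException {
        return mustSkipReuse(a.getClass()) ? a.tokenStream(field, r) : a.reusableTokenStream(field, r);
    }

    public static void main(String[] args) throws Exception {
        BaseAnalyzer custom = new CustomAnalyzer();
        // Naive reuse silently ignores the override:
        System.out.println(custom.reusableTokenStream("body", new StringReader("x")));            // base-reused
        // The reflection fallback honors it, but never reuses:
        System.out.println(streamFor(custom, "body", new StringReader("x")));                     // custom-fresh
        // An analyzer without the mismatch keeps reusing:
        System.out.println(streamFor(new BaseAnalyzer(), "body", new StringReader("x")));         // base-reused
    }
}
```

If `BaseAnalyzer` declared both methods final, as the assertion demands, `CustomAnalyzer` would not compile, so the mismatch (and the costly fallback) could not arise at all.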

