lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Analyzer forcing tokenStream and reusableTokenStream to be final
Date Tue, 19 Oct 2010 17:05:48 GMT
Thanks Uwe - that's what I was aiming for. We let the external analyzers
make sure they are safe by themselves, while ensuring Lucene/Solr ones are
good. +1 from me to commit this :).

Shai

On Tue, Oct 19, 2010 at 7:03 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> About the whole assertion (as it also affects TokenStreams): we want to
> make sure that all Lucene/Solr TokenStreams and Analyzers are final or have
> final implementations (even when we remove the reusable method).
>
>
>
> The idea is to hit this assert only for classes under the
> org.apache.lucene package prefix! So we can test Lucene code, and all
> other subclasses are simply ignored. The method assertFinal can do this for
> us:
>
>
>
> Index: Analyzer.java
> ===================================================================
> --- Analyzer.java   (revision 1023877)
> +++ Analyzer.java   (working copy)
> @@ -48,6 +48,8 @@
>    private boolean assertFinal() {
>      try {
>        final Class<?> clazz = getClass();
> +      if (!clazz.getName().startsWith("org.apache.lucene."))
> +        return true;
>        assert clazz.isAnonymousClass() ||
>          (clazz.getModifiers() & (Modifier.FINAL | Modifier.PRIVATE)) != 0 ||
>          (
>
>
> Same for TokenStream. This is no performance problem, as assertFinal is
> only called when asserts are enabled (trick is “assert assertFinal();” in
> ctor).
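
For readers of the archive, here is a minimal self-contained sketch of the pattern described above: a check that is skipped for classes outside a given package prefix, invoked from the constructor via assert so it costs nothing when assertions are off. The names AssertFinalSketch, passesCheck, Base, FinalImpl and OpenImpl are hypothetical, and the sketch only looks at class modifiers. The method-level part of the real check is cut off in the patch above and is not reproduced here.

```java
import java.lang.reflect.Modifier;

public class AssertFinalSketch {

    // The prefix used by the patch in this thread.
    static final String CHECKED_PREFIX = "org.apache.lucene.";

    /** True if the class is anonymous, final, or private. */
    public static boolean isClosed(Class<?> clazz) {
        return clazz.isAnonymousClass()
            || (clazz.getModifiers() & (Modifier.FINAL | Modifier.PRIVATE)) != 0;
    }

    /** Classes whose name does not start with the prefix pass unconditionally. */
    public static boolean passesCheck(Class<?> clazz, String prefix) {
        if (!clazz.getName().startsWith(prefix)) {
            return true;
        }
        return isClosed(clazz);
    }

    /** Hypothetical base class standing in for Analyzer or TokenStream. */
    abstract static class Base {
        protected Base() {
            // Evaluated only when the JVM runs with assertions enabled (-ea).
            assert passesCheck(getClass(), CHECKED_PREFIX)
                : getClass().getName() + " must be final";
        }
    }

    static final class FinalImpl extends Base {}

    static class OpenImpl extends Base {}

    public static void main(String[] args) {
        // Use this sketch's own class name as the prefix so both cases show up.
        String own = AssertFinalSketch.class.getName();
        System.out.println(passesCheck(FinalImpl.class, own));            // true
        System.out.println(passesCheck(OpenImpl.class, own));             // false
        // Outside the checked prefix, a non-final class is ignored.
        System.out.println(passesCheck(OpenImpl.class, CHECKED_PREFIX));  // true
    }
}
```

Because the constructor wraps the call in assert, a JVM started without -ea never evaluates passesCheck at all.
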
>
>
>
> Uwe
>
>
>
> -----
>
> Uwe Schindler
>
> H.-H.-Meier-Allee 63, D-28213 Bremen
>
> http://www.thetaphi.de
>
> eMail: uwe@thetaphi.de
>
>
>
> *From:* Uwe Schindler [mailto:uwe@thetaphi.de]
> *Sent:* Tuesday, October 19, 2010 6:18 PM
>
> *To:* dev@lucene.apache.org
> *Subject:* RE: Analyzer forcing tokenStream and reusableTokenStream to be
> final
>
>
>
> By the way, the same tests are done for TokenStream subclasses (whose impls
> must be final in all cases; it's defined as a decorator pattern, so we enforce
> it). And: you don't need to make the class itself final, it's enough to make
> both methods final.
>
>
>
>
> *From:* Shai Erera [mailto:serera@gmail.com]
> *Sent:* Tuesday, October 19, 2010 6:06 PM
> *To:* dev@lucene.apache.org
> *Subject:* Re: Analyzer forcing tokenStream and reusableTokenStream to be
> final
>
>
>
> I guess you didn't read my email all the way through - I cannot disable
> assertions for Lucene stuff because I use Lucene's assertions to assert that
> my indexing code works :).
>
> Shai
>
> On Tue, Oct 19, 2010 at 5:59 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
> We simply added that to **test** the bundled analyzers for conformance. If
> you don’t like that, you can simply disable assertions for the
> org.apache.lucene package.
>
>
>
>
> *From:* Shai Erera [mailto:serera@gmail.com]
> *Sent:* Tuesday, October 19, 2010 5:53 PM
> *To:* dev@lucene.apache.org
> *Subject:* Re: Analyzer forcing tokenStream and reusableTokenStream to be
> final
>
>
>
> I still don't understand how not declaring my tokenStream and
> reusableTokenStream final can break anything. The methods are there (in my
> analyzers), and if I risk overriding them somewhere else it's my problem.
>
> What am I missing?
>
> To add to your email - I too haven't yet encountered an analyzer that
> cannot be reused.
>
> Shai
>
> On Tue, Oct 19, 2010 at 5:45 PM, Robert Muir <rcmuir@gmail.com> wrote:
>
> On Tue, Oct 19, 2010 at 11:21 AM, Robert Muir <rcmuir@gmail.com> wrote:
> > If someone doesn't override both (e.g. they just override
> > tokenStream), then it wouldn't actually use their subclass's code. So
> > then the reflection hack from LUCENE-1678 would force the analyzer to
> > never re-use, but instead call tokenStream: but this is very bad for
> > indexing performance!
> >
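
For readers of the archive, the fallback Robert describes can be sketched roughly as follows. This is not the LUCENE-1678 code. It is a hypothetical illustration in which a base class uses reflection to notice that a subclass overrides only tokenStream, and then routes reusableTokenStream through that override instead of its own reuse path. All names (OverrideDetectionSketch, BaseAnalyzer, OnlyTokenStream) and the String return values are made up for the example.

```java
import java.lang.reflect.Method;

public class OverrideDetectionSketch {

    /** True if clazz inherits or declares a public method that is not base's own. */
    public static boolean overrides(Class<?> clazz, Class<?> base,
                                    String name, Class<?>... params) {
        try {
            Method m = clazz.getMethod(name, params);
            return m.getDeclaringClass() != base;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Hypothetical stand-in for an analyzer base class; not Lucene's API. */
    public static class BaseAnalyzer {
        private final boolean overridesTokenStream;

        public BaseAnalyzer() {
            // Reflection check done once per instance, at construction time.
            overridesTokenStream =
                overrides(getClass(), BaseAnalyzer.class, "tokenStream", String.class);
        }

        public String tokenStream(String text) {
            return "base-fresh:" + text;
        }

        public String reusableTokenStream(String text) {
            if (overridesTokenStream) {
                // Honor the subclass's code, at the price of never reusing.
                return tokenStream(text);
            }
            return "base-reused:" + text;
        }
    }

    /** A subclass that overrides only one of the two methods. */
    public static class OnlyTokenStream extends BaseAnalyzer {
        @Override
        public String tokenStream(String text) {
            return "custom-fresh:" + text;
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseAnalyzer().reusableTokenStream("a"));     // base-reused:a
        System.out.println(new OnlyTokenStream().reusableTokenStream("a"));  // custom-fresh:a
    }
}
```

In the sketch the subclass silently loses the reuse path, which is the performance cost raised in this part of the thread.
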
>
> Here's a JIRA issue with an example of how the
> tokenStream/reusableTokenStream confusion makes this a real problem in
> practice:
>
> https://issues.apache.org/jira/browse/LUCENE-2279
>
>