From: Michael McCandless
Date: Tue, 10 Apr 2012 15:01:07 -0400
Subject: Re: svn commit: r1311920 - /lucene/dev/branches/lucene3969/modules/analysis/common/src/test/org/apache/lucene/analysis/core/TestRandomChains.java
To: dev@lucene.apache.org

Sorry Uwe :) I guess Emacs indents differently from Eclipse!
Mike McCandless

http://blog.mikemccandless.com

On Tue, Apr 10, 2012 at 2:50 PM, wrote:
> Author: uschindler
> Date: Tue Apr 10 18:50:54 2012
> New Revision: 1311920
>
> URL: http://svn.apache.org/viewvc?rev=1311920&view=rev
> Log:
> LUCENE-3969: revert Whitespace
>
> Modified:
>     lucene/dev/branches/lucene3969/modules/analysis/common/src/test/org/apache/lucene/analysis/core/TestRandomChains.java
>
> Modified: lucene/dev/branches/lucene3969/modules/analysis/common/src/test/org/apache/lucene/analysis/core/TestRandomChains.java
> URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene3969/modules/analysis/common/src/test/org/apache/lucene/analysis/core/TestRandomChains.java?rev=1311920&r1=1311919&r2=1311920&view=diff
> ==============================================================================
> --- lucene/dev/branches/lucene3969/modules/analysis/common/src/test/org/apache/lucene/analysis/core/TestRandomChains.java (original)
> +++ lucene/dev/branches/lucene3969/modules/analysis/common/src/test/org/apache/lucene/analysis/core/TestRandomChains.java Tue Apr 10 18:50:54 2012
> @@ -105,30 +105,30 @@ public class TestRandomChains extends Ba
>     // nocommit can we promote some of these to be only
>     // offsets offenders?
>     Collections.<Class<?>>addAll(brokenComponents,
> -                                // TODO: fix basetokenstreamtestcase not to trip because this one has no CharTermAtt
> -                                EmptyTokenizer.class,
> -                                // doesn't actual reset itself!
> -                                CachingTokenFilter.class,
> -                                // doesn't consume whole stream!
> -                                LimitTokenCountFilter.class,
> -                                // Not broken: we forcefully add this, so we shouldn't
> -                                // also randomly pick it:
> -                                ValidatingTokenFilter.class,
> -                                // NOTE: these by themselves won't cause any 'basic assertions' to fail.
> -                                // but see https://issues.apache.org/jira/browse/LUCENE-3920, if any
> -                                // tokenfilter that combines words (e.g. shingles) comes after them,
> -                                // this will create bogus offsets because their 'offsets go backwards',
> -                                // causing shingle or whatever to make a single token with a
> -                                // startOffset thats > its endOffset
> -                                // (see LUCENE-3738 for a list of other offenders here)
> -                                // broken!
> -                                NGramTokenizer.class,
> -                                // broken!
> -                                NGramTokenFilter.class,
> -                                // broken!
> -                                EdgeNGramTokenizer.class,
> -                                // broken!
> -                                EdgeNGramTokenFilter.class
> +      // TODO: fix basetokenstreamtestcase not to trip because this one has no CharTermAtt
> +      EmptyTokenizer.class,
> +      // doesn't actual reset itself!
> +      CachingTokenFilter.class,
> +      // doesn't consume whole stream!
> +      LimitTokenCountFilter.class,
> +      // Not broken: we forcefully add this, so we shouldn't
> +      // also randomly pick it:
> +      ValidatingTokenFilter.class,
> +      // NOTE: these by themselves won't cause any 'basic assertions' to fail.
> +      // but see https://issues.apache.org/jira/browse/LUCENE-3920, if any
> +      // tokenfilter that combines words (e.g. shingles) comes after them,
> +      // this will create bogus offsets because their 'offsets go backwards',
> +      // causing shingle or whatever to make a single token with a
> +      // startOffset thats > its endOffset
> +      // (see LUCENE-3738 for a list of other offenders here)
> +      // broken!
> +      NGramTokenizer.class,
> +      // broken!
> +      NGramTokenFilter.class,
> +      // broken!
> +      EdgeNGramTokenizer.class,
> +      // broken!
> +      EdgeNGramTokenFilter.class
>     );
>   }
>
> @@ -137,18 +137,19 @@ public class TestRandomChains extends Ba
>   private static final Set<Class<?>> brokenOffsetsComponents = Collections.newSetFromMap(new IdentityHashMap<Class<?>,Boolean>());
>   static {
>     Collections.<Class<?>>addAll(brokenOffsetsComponents,
> -                                WordDelimiterFilter.class,
> -                                TrimFilter.class,
> -                                ReversePathHierarchyTokenizer.class,
> -                                PathHierarchyTokenizer.class,
> -                                HyphenationCompoundWordTokenFilter.class,
> -                                DictionaryCompoundWordTokenFilter.class,
> -                                // nocommit: corrumpts graphs (offset consistency check):
> -                                PositionFilter.class,
> -                                // nocommit it seems to mess up offsets!?
> -                                WikipediaTokenizer.class
> -                                );
> +      WordDelimiterFilter.class,
> +      TrimFilter.class,
> +      ReversePathHierarchyTokenizer.class,
> +      PathHierarchyTokenizer.class,
> +      HyphenationCompoundWordTokenFilter.class,
> +      DictionaryCompoundWordTokenFilter.class,
> +      // nocommit: corrumpts graphs (offset consistency check):
> +      PositionFilter.class,
> +      // nocommit it seems to mess up offsets!?
> +      WikipediaTokenizer.class
> +    );
>   }
> +
>   @BeforeClass
>   public static void beforeClass() throws Exception {
>     List<Class<?>> analysisClasses = new ArrayList<Class<?>>();
> @@ -168,6 +169,7 @@ public class TestRandomChains extends Ba
>       ) {
>         continue;
>       }
> +
>       for (final Constructor<?> ctor : c.getConstructors()) {
>         // don't test synthetic or deprecated ctors, they likely have known bugs:
>         if (ctor.isSynthetic() || ctor.isAnnotationPresent(Deprecated.class)) {
> @@ -175,21 +177,22 @@ public class TestRandomChains extends Ba
>         }
>         if (Tokenizer.class.isAssignableFrom(c)) {
>           assertTrue(ctor.toGenericString() + " has unsupported parameter types",
> -             allowedTokenizerArgs.containsAll(Arrays.asList(ctor.getParameterTypes())));
> +            allowedTokenizerArgs.containsAll(Arrays.asList(ctor.getParameterTypes())));
>           tokenizers.add(castConstructor(Tokenizer.class, ctor));
>         } else if (TokenFilter.class.isAssignableFrom(c)) {
>           assertTrue(ctor.toGenericString() + " has unsupported parameter types",
> -             allowedTokenFilterArgs.containsAll(Arrays.asList(ctor.getParameterTypes())));
> +            allowedTokenFilterArgs.containsAll(Arrays.asList(ctor.getParameterTypes())));
>           tokenfilters.add(castConstructor(TokenFilter.class, ctor));
>         } else if (CharStream.class.isAssignableFrom(c)) {
>           assertTrue(ctor.toGenericString() + " has unsupported parameter types",
> -             allowedCharFilterArgs.containsAll(Arrays.asList(ctor.getParameterTypes())));
> +            allowedCharFilterArgs.containsAll(Arrays.asList(ctor.getParameterTypes())));
>           charfilters.add(castConstructor(CharStream.class, ctor));
>         } else {
>           fail("Cannot get here");
>         }
>       }
>     }
> +
>     final Comparator<Constructor<?>> ctorComp = new Comparator<Constructor<?>>() {
>       @Override
>       public int compare(Constructor<?> arg0, Constructor<?> arg1) {
> @@ -205,12 +208,14 @@ public class TestRandomChains extends Ba
>       System.out.println("charfilters = " + charfilters);
>     }
>   }
> +
>   @AfterClass
>   public static void afterClass() throws Exception {
>     tokenizers = null;
>     tokenfilters = null;
>     charfilters = null;
>   }
> +
>   /** Hack to work around the stupidness of Oracle's strict Java backwards compatibility.
>    * {@code Class<T>#getConstructors()} should return unmodifiable {@code List<Constructor<T>>} not array! */
>   @SuppressWarnings("unchecked")
>
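As background on the "offsets go backwards" problem the nocommit comments in the diff describe (a token whose startOffset ends up greater than its endOffset, or whose start offset moves backwards relative to the previous token), here is a minimal sketch of that invariant check over a plain Lucene TokenStream. It assumes only the public TokenStream/OffsetAttribute API; the class and method names (OffsetSanityCheck, assertSaneOffsets) are illustrative and are not the actual BaseTokenStreamTestCase assertions.

  import java.io.IOException;

  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

  public class OffsetSanityCheck {
    // Consume the stream and fail if any token reports startOffset > endOffset,
    // or if a token's startOffset moves backwards relative to the previous token.
    public static void assertSaneOffsets(TokenStream ts) throws IOException {
      OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
      ts.reset();
      int lastStart = -1;
      while (ts.incrementToken()) {
        int start = offsetAtt.startOffset();
        int end = offsetAtt.endOffset();
        if (start > end) {
          throw new AssertionError("startOffset " + start + " > endOffset " + end);
        }
        if (start < lastStart) {
          throw new AssertionError("offsets went backwards: " + start + " after " + lastStart);
        }
        lastStart = start;
      }
      ts.end();
      ts.close();
    }
  }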