lucene-java-user mailing list archives

From Milind <mili...@gmail.com>
Subject Re: Incorrect tokenizing in the UAX29URLEmailAnalyzer analyzer?
Date Thu, 24 Jul 2014 14:54:42 GMT
Thanks again, Steve.  It was the version number.  I hadn't noticed the
deprecation warning.  Changing the matchVersion to Version.LUCENE_47 fixed
the problem.
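
For anyone who hits the same thing, here is a minimal sketch of the check,
assuming lucene-core and lucene-analyzers-common 4.7.1 are both on the
classpath. The class name MatchVersionCheck, the helper tokens() and the
field name "body" are made up for illustration. The code sticks to Java 6
syntax.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class, not part of Lucene.
public class MatchVersionCheck {

    // Analyzes text with the given matchVersion and returns the token terms.
    static List<String> tokens(Version matchVersion, String text) throws IOException {
        // matchVersion selects the tokenization behavior; bumping only the
        // jar version in the POM does not change it.
        Analyzer analyzer = new UAX29URLEmailAnalyzer(matchVersion);
        List<String> out = new ArrayList<String>();
        // "body" is an arbitrary field name chosen for this sketch.
        TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
        try {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                       // required before incrementToken()
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
            analyzer.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Going by the test Steve quotes below, this should print the two
        // tokens esl2 and gbr rather than [esl2.gb][r].
        System.out.println(tokens(Version.LUCENE_47, "esl2.gbr"));
    }
}
```

Passing an older Version constant to the same helper is a quick way to
confirm whether the behavior you see comes from matchVersion or from a
stale jar on the classpath.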


On Wed, Jul 23, 2014 at 8:20 PM, Steve Rowe <sarowe@gmail.com> wrote:

> On Jul 23, 2014, at 7:43 PM, Milind <milindr@gmail.com> wrote:
>
> >>>   input=esl2.gbr
> >>>   output=[esl2.gb][r]
> >>>
> >>> This is a bug, which was fixed in Lucene 4.7 - see
> >>> <https://issues.apache.org/jira/browse/LUCENE-5391>
> >
> > BTW, I changed the POM dependency to 4.7.1, but I'm still seeing the same
> > output.  I can't go beyond 4.7 since it seems 4.8 onwards, Lucene is being
> > compiled against Java 7 and I'm still on Java 6.  Hopefully, this will be
> > a non-issue with PerFieldAnalyzerWrapper.  But I just wanted to point that
> > out.
>
> I checked out the source code for the 4.7.1 release and added a test for
> “esl2.gbr” to TestUAX29URLEmailAnalyzer.testNoSchemeURLs()
> <http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_7_1/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestUAX29URLEmailAnalyzer.java?view=markup#l262>:
>
>     BaseTokenStreamTestCase.assertAnalyzesTo
>         (a, "esl2.gbr", new String[] { "esl2",     "gbr" },
>             new String[] { "<ALPHANUM>", "<ALPHANUM>" });
>
> This passes: the string is broken up into “esl2” and “gbr” tokens, both
> with type <ALPHANUM>.
>
> Are you sure that you’re running against the 4.7.1 version for all Lucene
> dependencies (including lucene-analyzers-common)?
>
> Also, you need to change the value of the matchVersion parameter to the
> constructor to match the version you’re using; unless you do this, the
> behavior will remain the same as that of the version referred to by the
> matchVersion parameter.
>
> Steve


-- 
Regards
Milind
