lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-1706) wrong tokens output from WordDelimiterFilter depending upon options
Date Wed, 02 Jun 2010 18:37:45 GMT

     [ https://issues.apache.org/jira/browse/SOLR-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-1706:
------------------------------

    Fix Version/s: 1.4.1

> wrong tokens output from WordDelimiterFilter depending upon options
> -------------------------------------------------------------------
>
>                 Key: SOLR-1706
>                 URL: https://issues.apache.org/jira/browse/SOLR-1706
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 1.4
>            Reporter: Robert Muir
>            Assignee: Mark Miller
>             Fix For: 1.4.1, 3.1, 4.0
>
>
> Below you can see that when I request only numeric concatenations (not words),
> some words are still sometimes output, ignoring the options I have provided,
> and even then in a very inconsistent way.
> {code}
>   assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
>     new String[] { "42", "AutoCoder" },
>     new int[] { 18, 21 },
>     new int[] { 20, 30 },
>     new int[] { 1, 1 });
>   assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 0,0,0,1,0,0,0,0,1, null,
>     new String[] { "42", "AutoCoder", "56" },
>     new int[] { 18, 21, 33 },
>     new int[] { 20, 30, 35 },
>     new int[] { 1, 1, 1 });
>   assertWdf("Super-Duper-XL500-AB-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
>     new String[] {  },
>     new int[] {  },
>     new int[] {  },
>     new int[] {  });
>   assertWdf("Super-Duper-XL500-42-AutoCoder's-BC", 0,0,0,1,0,0,0,0,1, null,
>     new String[] { "42" },
>     new int[] { 18 },
>     new int[] { 20 },
>     new int[] { 1 });
> {code}
> where assertWdf is 
> {code}
>   // Runs WordDelimiterFilter over whitespace-split text and checks the
>   // emitted terms, offsets, and position increments (token types unchecked).
>   void assertWdf(String text, int generateWordParts, int generateNumberParts,
>       int catenateWords, int catenateNumbers, int catenateAll,
>       int splitOnCaseChange, int preserveOriginal, int splitOnNumerics,
>       int stemEnglishPossessive, CharArraySet protWords, String expected[],
>       int startOffsets[], int endOffsets[], int posIncs[])
>       throws IOException {
>     TokenStream ts = new WhitespaceTokenizer(new StringReader(text));
>     WordDelimiterFilter wdf = new WordDelimiterFilter(ts, generateWordParts,
>         generateNumberParts, catenateWords, catenateNumbers, catenateAll,
>         splitOnCaseChange, preserveOriginal, splitOnNumerics,
>         stemEnglishPossessive, protWords);
>     // The calls above pass no types array, so null is passed for types here.
>     assertTokenStreamContents(wdf, expected, startOffsets, endOffsets, null,
>         posIncs);
>   }
> {code}
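
For readers unfamiliar with the options, here is a rough, self-contained sketch of what "only numeric concatenations" would be expected to yield for the inputs in the report. It is a simplified model and not WordDelimiterFilter's actual code: the class and method names are hypothetical, and it assumes parts are split only on non-alphanumeric delimiters (no split on case change or on letter/digit transitions), that a trailing English possessive is dropped, and that only runs of adjacent all-digit parts are joined and emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene or Solr.
class NumericCatenationSketch {

    // Returns the concatenation of each run of adjacent all-digit parts.
    static List<String> numericCatenations(String token) {
        // Drop an English possessive ('s) that ends a part.
        String stripped = token.replaceAll("'[sS](?![A-Za-z0-9])", "");
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (String part : stripped.split("[^A-Za-z0-9]+")) {
            if (part.isEmpty()) {
                continue; // leading delimiter produces an empty first part
            }
            if (part.matches("[0-9]+")) {
                run.append(part); // extend the current numeric run
            } else if (run.length() > 0) {
                out.add(run.toString()); // a word part ends the run
                run.setLength(0);
            }
        }
        if (run.length() > 0) {
            out.add(run.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(numericCatenations("Super-Duper-XL500-42-AutoCoder's"));    // [42]
        System.out.println(numericCatenations("Super-Duper-XL500-42-AutoCoder's-56")); // [42, 56]
        System.out.println(numericCatenations("Super-Duper-XL500-AB-AutoCoder's"));    // []
        System.out.println(numericCatenations("Super-Duper-XL500-42-AutoCoder's-BC")); // [42]
    }
}
```

Under this model no word such as "AutoCoder" appears in any of the four cases, which is the inconsistency the report points at: the filter emits "AutoCoder" in the first two cases but not in the last two.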

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

