lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6914) DecimalDigitFilter skips characters in some cases (supplemental?)
Date Wed, 02 Dec 2015 17:14:10 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036156#comment-15036156 ]

Hoss Man commented on LUCENE-6914:
----------------------------------

i wrote it thinking it would be helpful to first demonstrate what worked (x, or a space, or
some other non-candidate char between the digits as a "gap"), and then show that it breaks
when you replace those "gaps" with nothing.  now that we know the problem and can trivially
reproduce it with randomized tests, i'm not sure it's really relevant -- but you could always
just clone those asserts and try a bunch of different varieties of "gap" (single space, x,
some wide supplemental character that isn't a digit, etc...)
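
A rough sketch of what "cloning the asserts" over several gap varieties could look like,
in plain Java with no Lucene dependency. The names GapVariants, foldDigits and join are
made up for illustration, and foldDigits is only a stand-in for the filter under test;
in the real test each iteration would call the existing asserts instead.

```java
class GapVariants {
    // Stand-in for the filter under test (an assumption, not Lucene code): folds any
    // Unicode decimal digit to ASCII, walking whole code points so surrogate pairs
    // are never split.
    static String foldDigits(String in) {
        StringBuilder out = new StringBuilder(in.length());
        in.codePoints().forEach(cp -> {
            if (Character.isDigit(cp)) {
                out.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // Joins the code points, putting the gap string between neighbours.
    static String join(int[] codePoints, String gap) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codePoints.length; i++) {
            if (i > 0) {
                sb.append(gap);
            }
            sb.appendCodePoint(codePoints[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] doubleStruck = {0x1D7D9, 0x1D7E1, 0x1D7E0, 0x1D7DC}; // double struck 1, 9, 8, 4
        int[] ascii = {'1', '9', '8', '4'};
        // U+1D538 (double struck capital A): a wide supplemental char that isn't a digit
        String wideNonDigit = new String(Character.toChars(0x1D538));
        // the empty gap is the variant that triggered the reported bug
        String[] gaps = {"x", " ", wideNonDigit, ""};
        for (String gap : gaps) {
            String expected = join(ascii, gap);
            String actual = foldDigits(join(doubleStruck, gap));
            if (!expected.equals(actual)) {
                throw new AssertionError("gap of length " + gap.length() + " folded incorrectly");
            }
        }
        System.out.println("all gap variants folded as expected");
    }
}
```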

> DecimalDigitFilter skips characters in some cases (supplemental?)
> -----------------------------------------------------------------
>
>                 Key: LUCENE-6914
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6914
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 5.4
>            Reporter: Hoss Man
>         Attachments: LUCENE-6914.patch, LUCENE-6914.patch, LUCENE-6914.patch
>
>
> Found this while writing up the solr ref guide for DecimalDigitFilter.
> With input like "𝟙𝟡𝟠𝟜" ("Double Struck" 1984) the filter produces "1𝟡8𝟜"
> (1, double struck 9, 8, double struck 4).  Add some non-decimal characters in between the
> digits (ie: "𝟙x𝟡x𝟠x𝟜") and you get the expected output ("1x9x8x4").  This doesn't affect
> all decimal characters though, as evidenced by the existing test cases.
> Perhaps this is an off by one bug in the "if the original was supplementary, shrink the
> string" code path?
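
To make the off-by-one hypothesis above concrete, here is a minimal sketch in plain Java
of how an in-place fold over a char buffer could skip a character after shrinking. This
is not the actual DecimalDigitFilter source; InPlaceDigitFold, fold, delete and the
skipAfterShrink flag are all hypothetical names used only to contrast a buggy variant
with a corrected one.

```java
class InPlaceDigitFold {
    // Removes the char at index pos from buf[0..len) and returns the new length.
    static int delete(char[] buf, int pos, int len) {
        System.arraycopy(buf, pos + 1, buf, pos, len - pos - 1);
        return len - 1;
    }

    // Folds Unicode decimal digits to ASCII in place. With skipAfterShrink == true the
    // loop reproduces the suspected off-by-one; with false it behaves as intended.
    static String fold(String input, boolean skipAfterShrink) {
        char[] buf = input.toCharArray();
        int len = buf.length;
        for (int i = 0; i < len; i++) {
            int cp = Character.codePointAt(buf, i, len);
            if (cp > 0x7F && Character.isDigit(cp)) {
                buf[i] = (char) ('0' + Character.digit(cp, 10));
                if (cp > 0xFFFF) {
                    // The digit occupied two chars; drop the leftover low surrogate.
                    len = delete(buf, i + 1, len);
                    if (skipAfterShrink) {
                        // Hypothetical bug: after the shrink, the next unprocessed char
                        // already sits at i + 1, so this extra step jumps over it.
                        i++;
                    }
                }
            } else if (cp > 0xFFFF) {
                i++; // non-digit supplementary char: step over its low surrogate
            }
        }
        return new String(buf, 0, len);
    }

    public static void main(String[] args) {
        int[] cps = {0x1D7D9, 0x1D7E1, 0x1D7E0, 0x1D7DC}; // double struck 1, 9, 8, 4
        String adjacent = new String(cps, 0, cps.length);
        String nine = new String(Character.toChars(0x1D7E1));
        String four = new String(Character.toChars(0x1D7DC));

        // Buggy variant: every second adjacent supplementary digit is left unfolded.
        System.out.println(fold(adjacent, true).equals("1" + nine + "8" + four)); // true
        // Corrected variant: all four digits are folded.
        System.out.println(fold(adjacent, false).equals("1984")); // true
    }
}
```

With a one-char gap between the digits, the extra step in the buggy variant lands on the
gap instead of on the next digit's high surrogate, which would explain why "𝟙x𝟡x𝟠x𝟜"
still comes out as "1x9x8x4".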



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


