lucene-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error
Date Sun, 27 May 2012 06:54:23 GMT


Hoss Man commented on LUCENE-4078:

bq. It doesn't match an empty string. It matches an empty string in between characters...

Well, it's more complicated than that.  It *does* match the empty string (in the sense of
"does this regex match this entire string, which happens to be empty?"), but in the context of
"find" or "replace" on a larger string you are correct that it matches nothing, which means
it matches the emptiness between characters.
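
To make the two senses concrete, here is a minimal Java sketch. It uses the plain empty pattern "" as a stand-in (an assumption for illustration, not the pattern from the failing test), and the class and method names are hypothetical. It contrasts a whole-string match with repeated find() calls on a longer input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyPatternMatchVsFind {

    // Counts how many times find() succeeds for the regex on the input.
    static int countFinds(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Whole-string sense: the empty pattern matches the empty input...
        System.out.println(Pattern.matches("", ""));
        // ...but not a non-empty input taken as a whole.
        System.out.println(Pattern.matches("", "AB"));
        // find() sense: zero-length matches before 'A', between 'A' and 'B', and after 'B'.
        System.out.println(countFinds("", "AB"));
    }
}
```

The find() count is one more than the number of chars in the input, since every gap (including both ends) yields a zero-length match.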

bq. I'd be convinced '|' is a consistent way of saying 'match empty string or empty string'
if "+" pattern worked ("match empty string one or more times"), but it doesn't -- this will
fail with an error. So '|' is kind of special here.

I think that's just a fluke of syntax/precedence ... if you use parens (capturing or otherwise)
you can say "match the empty pattern 1 or more times"...

$ perl -MData::Dumper -le 'print Dumper split /(?:)+/, "ABCD";'
$VAR1 = 'A';
$VAR2 = 'B';
$VAR3 = 'C';
$VAR4 = 'D';

Bottom Line: these patterns are all valid and meaningful, and everything we've discussed is
tangential to the problem -- which seems to be that the JVM lets the empty pattern split in
between chars instead of codepoints, which seems like a bug.
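
A rough way to check this on a given JVM is sketched below. It is not taken from the Lucene test; the class and helper names are hypothetical, and U+1D538 is just an arbitrary supplementary code point chosen for illustration. It lists every char offset where the empty pattern matches and flags any offset that falls between the two halves of a surrogate pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyPatternOffsets {

    // Returns every char offset at which the empty pattern matches in s.
    static List<Integer> emptyMatchOffsets(String s) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("").matcher(s);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    // True if index falls between the high and low surrogate of one code point.
    static boolean splitsSurrogatePair(String s, int index) {
        return index > 0 && index < s.length()
                && Character.isHighSurrogate(s.charAt(index - 1))
                && Character.isLowSurrogate(s.charAt(index));
    }

    public static void main(String[] args) {
        // U+1D538 is one code point but two Java chars (a surrogate pair).
        String s = "A" + new String(Character.toChars(0x1D538)) + "B";
        for (int offset : emptyMatchOffsets(s)) {
            String note = splitsSurrogatePair(s, offset) ? "  <-- inside a surrogate pair" : "";
            System.out.println("empty match at char offset " + offset + note);
        }
    }
}
```

If a flagged line is printed, the empty pattern matched in the middle of a code point on that JVM, which is the behavior described above. The sketch deliberately does not assume what any particular JVM version prints.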
> PatternReplaceCharFilter assertion error
> ----------------------------------------
>                 Key: LUCENE-4078
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
> Build:
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>        at java.lang.reflect.Method.invoke(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized


