lucene-dev mailing list archives

From "Dawid Weiss (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4078) PatternReplaceCharFilter assertion error
Date Sat, 26 May 2012 22:45:22 GMT


Dawid Weiss commented on LUCENE-4078:

bq. I'm not really following you there ... '|' is the OR operator, so the regex "|" is a redundant
way of saying "" which is "the empty pattern" or a way of saying "match the empty string".

Yeah, I am a bit surprised at what "" matches. It doesn't match an empty string. It matches
an empty string in between characters... or in other words, it matches what's not there. Makes
sense when you think of it.

As for '|', I looked at it from an automata theory point of view -- '|' doesn't need any arguments
or post-arguments (or states), unlike '+', '*' or the like which need a state to reference.
I'd be convinced '|' is a consistent way of saying 'match empty string or empty string' if
the "+" pattern worked ("match empty string one or more times"), but it doesn't -- it fails
with an error. So '|' is kind of special here.
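
For anyone who wants to check the behaviour described above, here is a minimal sketch, assuming plain java.util.regex. The class name EmptyPatternDemo and its helper methods are made up for illustration and are not part of Lucene or of the failing test.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not Lucene code.
public class EmptyPatternDemo {

    // Counts how many times find() succeeds for the regex on the input.
    public static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns true if the regex compiles, false on a syntax error.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The empty pattern matches at every position between characters,
        // including before the first and after the last one.
        System.out.println(countMatches("", "abc"));    // 4
        System.out.println(countMatches("|", "abc"));   // 4
        System.out.println("abc".replaceAll("", "-"));  // -a-b-c-

        // "|" compiles, while a bare "+" is rejected as a syntax error.
        System.out.println(compiles("|"));              // true
        System.out.println(compiles("+"));              // false
    }
}
```

So "" and "|" behave the same way here, and "+" on its own is rejected at compile time with a PatternSyntaxException.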

I don't know enough about regexp theory to argue whether I'm right or wrong, though. I don't even
think there is one "right" way to do things, if this is a true quote:

I define UNIX as “30 definitions of regular expressions living under one roof.” —Don

> PatternReplaceCharFilter assertion error
> ----------------------------------------
>                 Key: LUCENE-4078
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 4.0
> Build:
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings
> Error Message:
> Stack Trace:
> java.lang.AssertionError
>        at __randomizedtesting.SeedInfo.seed([8E91A6AC395FEED9:618A6129A5BB9EC]:0)
>        at org.apache.lucene.analysis.MockTokenizer.readCodePoint(
>        at org.apache.lucene.analysis.MockTokenizer.incrementToken(
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(
>        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(
>        at org.apache.lucene.analysis.pattern.TestPatternReplaceCharFilter.testRandomStrings(
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>        at java.lang.reflect.Method.invoke(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(
>        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(
>        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(
>        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(
>        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(
>        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(
>        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(
>        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(
>        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(Randomized

