lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-2093) regular expression in PatternReplaceFilter can handle: /([^/]*)
Date Wed, 29 Sep 2010 23:46:37 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916309#action_12916309 ]

Hoss Man commented on SOLR-2093:
--------------------------------

Note: Part of your confusion may lie in the meaning of {{replace="all"}}. It doesn't mean
"replace the entire Token"; it means "replace every match of the regex with the replacement
value". The pattern is evaluated repeatedly against the input string (each search starting
at the end of the previous match) until it no longer matches, and each match results in a
replacement. Any part of the Token that the pattern does not match is left untouched, which
is why the leading "a" survives in your example: only "/b/c" matches, and it is replaced by "b".

If you want the entire input Token to be replaced by the parenthetical group, the regex has
to match the entire Token, so you need to anchor it at both ends.  This should work:

{noformat}
<filter class="solr.PatternReplaceFilterFactory"
        pattern="^.*/([^/]*)/[^/]*$" replacement="$1" replace="all" />
{noformat}
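
For anyone who wants to check the difference outside of Solr, here is a minimal sketch using plain java.util.regex. It assumes the filter hands the Token text to Matcher.replaceAll when replace="all" is set; the class name PatternReplaceSketch and the helper replaceAll are made up for illustration and are not part of Solr.

```java
import java.util.regex.Pattern;

public class PatternReplaceSketch {

    // Hypothetical helper: replaces every match of the pattern in the token.
    // Text that the pattern does not match is copied through unchanged.
    static String replaceAll(String pattern, String replacement, String token) {
        return Pattern.compile(pattern).matcher(token).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String token = "a/b/c";

        // Pattern from the issue: only "/b/c" matches, so the leading "a" is kept.
        System.out.println(replaceAll("/([^/]*)/[^/]*$", "$1", token));    // prints "ab"

        // Anchored at both ends: the whole token matches and is replaced by group 1.
        System.out.println(replaceAll("^.*/([^/]*)/[^/]*$", "$1", token)); // prints "b"
    }
}
```

An online regex tester that reports only the captured group will show "b" for both patterns, which may be why the unanchored version looked correct there.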

> regular expression in PatternReplaceFilter can handle: /([^/]*)
> ---------------------------------------------------------------
>
>                 Key: SOLR-2093
>                 URL: https://issues.apache.org/jira/browse/SOLR-2093
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 1.4
>         Environment: debian,JRE1.6,solr1.4
>            Reporter: Kuri Masta
>            Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Using PatternReplaceFilter I want to extract a certain word out of the URI.
> Although I now understand that I should handle this outside of Solr, the fact remains
> that Solr does not adequately handle regular expressions.
> Viewing the source code, I don't see any problems since it uses the java library.
> The problem:
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.PatternReplaceFilterFactory"
>                         pattern="/([^/]*)/[^/]*$" replacement="$1"  replace="all" />
>       </analyzer>
> Input text:
> - a/b/c
> Expected
> - b
> Result Solr
> - ab
> An online JAVA regexp tester (http://www.regexplanet.com/simple/index.html):
> - b
> So the problem area lies at /([^/])

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

