lucene-solr-user mailing list archives

From Jeevanandam <je...@myjeeva.com>
Subject Re: How to escape “<” character in regex in Solr schema.xml?
Date Thu, 19 Apr 2012 13:33:03 GMT
The pattern I gave earlier solves the '<' character issue. However, you 
will then get the following exception in the log:

Caused by: java.util.regex.PatternSyntaxException: Look-behind group 
does not have an obvious maximum length near index 48
(?<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)
                                                 ^
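So revisit your regex, particularly around position 48: the look-behind
group contains unbounded '*' quantifiers, so the regex engine cannot
determine a maximum length for it.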
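
A minimal sketch of how to check this outside Solr, assuming plain
java.util.regex. The class name LookbehindCheck and the shortened
BOUNDED pattern are hypothetical and are not equivalent to the original
pattern: BOUNDED only requires one non-punctuation, non-space character
before the end punctuation. Newer JDK releases may be more permissive
about look-behind, so the sketch reports what the running JVM does
rather than assuming a failure.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LookbehindCheck {

    // The pattern from this thread as a Java string literal (with \s intended as whitespace).
    static final String UNBOUNDED =
        "(?<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)";

    // Hypothetical simplification: a look-behind of exactly one character.
    static final String BOUNDED = "(?<=[^.!?\\s])[.!?]+(?=\\s|$)";

    // Returns null if the regex compiles, otherwise the engine's error description.
    public static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    // Replaces sentence-ending punctuation with the symbolic token used in the thread.
    public static String markSentenceEnds(String text) {
        return Pattern.compile(BOUNDED).matcher(text).replaceAll(" monkeysentence");
    }

    public static void main(String[] args) {
        String err = compileError(UNBOUNDED);
        System.out.println(err == null
            ? "unbounded look-behind accepted by this JVM"
            : "unbounded look-behind rejected: " + err);
        System.out.println(markSentenceEnds("Hello world. How are you?"));
    }
}
```

If the bounded form turns out to be good enough for the data, it would
still need the '<' written as &lt; inside the schema.xml attribute.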

-Jeevanandam


On 19-04-2012 7:06 pm, Jeevanandam wrote:
> Try this one:
>
> 
> pattern="(?&lt;=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
>
> I tested it locally and Solr starts without errors. Now please test it
> with data.
>
> -Jeevanandam
>
>
> On 19-04-2012 9:29 am, smooth almonds wrote:
>> I'm using Solr 3.5.0, and in my schema.xml I use the following to 
>> mark the end of sentences and replace the end punctuation with a 
>> symbolic token:
>>
>> <charFilter class="solr.PatternReplaceCharFilterFactory"
>> 
>> pattern="(?<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
>> replacement=" monkeysentence"/>
>>
>> I'm not sure if that will even work for what I want, but first I 
>> need to
>> solve the problem of escaping the '<' character in the first '?<='
>> lookbehind.
>>
>> I get the following error:
>>
>> org.xml.sax.SAXParseException: The value of attribute "pattern" 
>> associated
>> with an element type "null" must not contain the '<' character.
>>
>> I've tried using a '\' as in:
>>
>> 
>> pattern="(?\<=[^.!?\\s][^.!?]*(?:[.!?](?![']?\s|$)[^.!?]*)*)[.!?]+(?=\\s|$)"
>>
>> But I get the same error.
>>
>> --
>> View this message in context:
>> 
>> http://lucene.472066.n3.nabble.com/How-to-escape-character-in-regex-in-Solr-schema-xml-tp3921961p3921961.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

