lucene-solr-user mailing list archives

From Paul <p...@nines.org>
Subject Searching for escaped characters
Date Thu, 28 Apr 2011 16:10:27 GMT
I'm trying to create a test to make sure that character sequences like
"&egrave;" are successfully converted to their equivalent Unicode
character (that is, in this case, "è").

So, I'd like to search my solr index using the equivalent of the
following regular expression:

&\w{1,6};

The goal is to find any escaped sequences that might have slipped through.

Is this possible? I have indexed these fields with text_lu, which
looks like this:

    <fieldtype name="text_lu" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>
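
To make the pattern above concrete, here is a minimal sketch of what the check is meant to catch. It runs against plain strings outside Solr and is not a Solr query. The function names and the sample text are hypothetical, and the conversion step uses Python's standard html.unescape as a stand-in for whatever conversion the indexing pipeline actually performs.

```python
import html
import re

# Same pattern as in the message: "&", one to six word characters, ";".
ENTITY_PATTERN = re.compile(r"&\w{1,6};")


def find_escaped_sequences(text):
    """Return every substring of text that matches the entity pattern."""
    return ENTITY_PATTERN.findall(text)


def unescape_entities(text):
    """Convert entities such as "&egrave;" to the characters they name.

    Assumption: html.unescape stands in for the real conversion step.
    """
    return html.unescape(text)


# Hypothetical sample text containing three named entities.
raw = "cr&egrave;me br&ucirc;l&eacute;e"

# Before conversion, the pattern reports each escaped sequence.
print(find_escaped_sequences(raw))

# After conversion, nothing should be left for the pattern to match.
converted = unescape_entities(raw)
print(find_escaped_sequences(converted))
```

Note that \w does not match "#", so numeric references such as "&#232;" would not be reported by this pattern.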

Thanks,
Paul
