lucene-solr-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Custom field using PatternCaptureGroupFilterFactory
Date Mon, 07 Mar 2016 03:43:11 GMT
The filter name, "Capture Group", says it all - only pattern groups are
captured and you have not specified even a single group. See the example:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternCaptureGroupFilterFactory.html

Groups are each enclosed within parentheses, as shown in the Javadoc
example above.

Since no groups were found, the filter applied this rule from its documentation:
"If none of the patterns match, or if preserveOriginal is true, the
original token will be preserved."
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternCaptureGroupTokenFilter.html

That should probably also say "or if no pattern groups match".
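
For illustration, here is a minimal, untested sketch of how the field type from the original question might look with a capture group added. The pattern "^([A-Z0-9])" is my assumption about the intent (first letter or digit only); it relies on UpperCaseFilterFactory running first, so lowercase ranges are not needed. The parentheses are the only essential change.

```xml
<fieldType class="solr.TextField" name="text_firstLetter">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.UpperCaseFilterFactory"/>
    <!-- The parentheses define capture group 1; the filter emits what the group captures. -->
    <!-- Assumed pattern: one leading letter or digit, already upper-cased by the filter above. -->
    <filter class="solr.PatternCaptureGroupFilterFactory"
            pattern="^([A-Z0-9])" preserve_original="false"/>
  </analyzer>
</fieldType>
```

Note that, per the rule quoted above, a value that does not start with a letter or digit would not match, so the whole original token would still be kept for it.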

To test regular expressions, try an interactive online tool, such as:
https://regex101.com/

-- Jack Krupansky

On Sun, Mar 6, 2016 at 7:51 PM, Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> I don't see the brackets that mark the group you actually want to
> capture. As per:
>
> http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternCaptureGroupTokenFilter.html
>
> I am also not sure if you actually need the "{0,1}" part.
>
> Regards,
>    Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 7 March 2016 at 04:25, Jay Potharaju <jspotharaju@gmail.com> wrote:
> > Hi,
> > I have a custom field for getting the first letter of a first name. For
> > this I am using PatternCaptureGroupFilterFactory.
> > It is not working as expected: it does not parse the data and return the
> > first character of the string. Any suggestions on how to fix this?
> >
> >  <fieldType class="solr.TextField" name="text_firstLetter">
> >    <analyzer>
> >      <tokenizer class="solr.KeywordTokenizerFactory"/>
> >      <filter class="solr.UpperCaseFilterFactory"/>
> >      <filter class="solr.PatternCaptureGroupFilterFactory"
> >              pattern="^[a-zA-Z0-9]{0,1}" preserve_original="false"/>
> >    </analyzer>
> >  </fieldType>
> >
> > --
> > Thanks
> > Jay
>
