lucene-solr-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Special character and wildcard matching
Date Tue, 24 Feb 2015 01:16:30 GMT
But how is that lowercasing occurring? I mean, solr.StrField doesn't do
that.

Some containers map accented characters by default, so the accented "e"
would be indexed as a plain "e" and your wildcard would then match it. An
accented "e" in a query would be mapped the same way and would match the
plain "e" in the index. What does your query response look like?

This blog post explains that problem:
http://bensch.be/tomcat-solr-and-special-characters

Note that you could make your string field a text field with the keyword
tokenizer and a lower case filter, which would also cover the case where the
user query has a capital "B". A string field is most appropriate when the
field really is 100% raw.
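
A minimal sketch of such a field type, assuming the stock
solr.KeywordTokenizerFactory and solr.LowerCaseFilterFactory; the type name
"string_lc" is only a placeholder I made up, and the other attributes are
copied from your existing definition:

```xml
<!-- Hypothetical field type: the whole value is kept as a single token
     and lower-cased at both index and query time. -->
<fieldType name="string_lc" class="solr.TextField" sortMissingLast="true"
           omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="raw_name" type="string_lc" indexed="true" stored="true" />
```

Changing the field type means reindexing. This only handles case; it does
nothing about accented characters.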


-- Jack Krupansky

On Mon, Feb 23, 2015 at 7:37 PM, Arun Rangarajan <arunrangarajan@gmail.com>
wrote:

> Yes, it is a string field and not a text field.
>
> <fieldType name="string" class="solr.StrField" sortMissingLast="true"
> omitNorms="true"/>
> <field name="raw_name" type="string" indexed="true" stored="true" />
>
> Lower-casing is done to allow case-insensitive matching.
>
> On Mon, Feb 23, 2015 at 4:01 PM, Jack Krupansky <jack.krupansky@gmail.com>
> wrote:
>
> > Is it really a string field - as opposed to a text field? Show us the
> > field and field type.
> >
> > Besides, if it really were a "raw" name, wouldn't that be a capital "B"?
> >
> > -- Jack Krupansky
> >
> > On Mon, Feb 23, 2015 at 6:52 PM, Arun Rangarajan <arunrangarajan@gmail.com>
> > wrote:
> >
> > > I have a string field raw_name like this in my document:
> > >
> > > {raw_name: beyoncé}
> > >
> > > (Notice that the last character is a special character.)
> > >
> > > When I issue this wildcard query:
> > >
> > > q=raw_name:beyonce*
> > >
> > > i.e. with the last character simply being the ASCII 'e', Solr returns
> > > me the above document.
> > >
> > > How do I prevent this?
> > >
> >
>
