lucene-solr-user mailing list archives

Subject Re: Searching on special characters
Date Thu, 24 Oct 2013 14:19:36 GMT
I'm not sure what you mean.  Based on what you are saying, is there an example of how I can
set up my schema.xml to get the result I need?

Also, the way I execute a search is using http://localhost:8080/solr/select/?q=<search-term>.
Does your solution require me to change this?  If so, in what way?
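
One way the query side could look, as a rough sketch only: a request handler in solrconfig.xml that uses the edismax parser to search several copies of the text with different boosts. The handler name `/multimatch`, the field names (`content`, `content_exact`, `content_lc`, `content_stripped`) and the boost values are all made up for illustration; none of them come from this thread.

```xml
<!-- Sketch for solrconfig.xml. Handler name, field names and boosts are
     hypothetical placeholders, not taken from the original schema. -->
<requestHandler name="/multimatch" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- edismax lets one query run against several fields at once -->
    <str name="defType">edismax</str>
    <!-- highest boost on the raw string copy, lowest on the tokenized text -->
    <str name="qf">content_exact^10 content_lc^8 content_stripped^4 content^1</str>
  </lst>
</requestHandler>
```

Under these assumptions the search URL would change to something like http://localhost:8080/solr/multimatch?q=<search-term>. Characters that are part of the query syntax, such as ( and ), would still need to be backslash-escaped or the term wrapped in double quotes, and then URL-encoded.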

It would be great if all this were documented somewhere, so I won't have to bug you guys !!!


-----Original Message-----
From: Jack Krupansky
To: solr-user
Sent: Thu, Oct 24, 2013 9:39 am
Subject: Re: Searching on special characters

Have two or three copies of the text. One field could be a raw string, 
boosted heavily for exact match. A second could be text using the keyword 
tokenizer but with a lowercase filter, also heavily boosted. The third 
field would be general, tokenized text with a lower boost. You could also 
have a copy that uses the keyword tokenizer to maintain a single token but 
also applies a regex filter to strip special characters and a lowercase 
filter, and give that an intermediate boost.

-- Jack Krupansky
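
The multi-field layout described above might be sketched in schema.xml roughly as follows. This is an illustration under assumptions, not a tested configuration: every field and type name here (`content`, `content_exact`, `content_lc`, `content_stripped`, `text_keyword_lc`, `text_keyword_stripped`, and `text_en` standing in for the existing tokenized type) is hypothetical, and the regex is just one possible definition of "special characters".

```xml
<!-- Sketch for schema.xml. All names below are hypothetical placeholders. -->

<!-- Whole value kept as one token, lowercased -->
<fieldType name="text_keyword_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Whole value kept as one token, special characters stripped, lowercased -->
<fieldType name="text_keyword_stripped" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- removes everything except letters, digits and spaces (assumed rule) -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^\p{L}\p{N} ]" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- "text_en" stands in for the existing tokenized field type -->
<field name="content"          type="text_en"               indexed="true" stored="true"/>
<!-- raw string copy: case-sensitive, matches the exact value only -->
<field name="content_exact"    type="string"                indexed="true" stored="false"/>
<field name="content_lc"       type="text_keyword_lc"       indexed="true" stored="false"/>
<field name="content_stripped" type="text_keyword_stripped" indexed="true" stored="false"/>

<copyField source="content" dest="content_exact"/>
<copyField source="content" dest="content_lc"/>
<copyField source="content" dest="content_stripped"/>
```

Note that the string and keyword-tokenized copies each hold the entire field value as a single token, so they only match when the query equals the whole value (after that field's filters). With the example documents, "(solr)" would match "(Solr)" in the lowercased copy but not in the raw string copy, which is case-sensitive. A reindex would be needed after changing the schema.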

-----Original Message----- 
Sent: Thursday, October 24, 2013 9:20 AM
Subject: Searching on special characters


How should I set up Solr so I can search and get hits on special characters 
such as: + - && || ! ( ) { } [ ] ^ " ~ * ? : \

My need is, if a user has text like so:

Doc-#1: "(Solr)"
Doc-#2: "Solr"

And they type "(solr)" I want a hit on "(solr)" only in document #1, with 
the brackets matching.  And if they type "solr", they will get a hit in 
Document #2 only.

An additional nice-to-have is, if they type "solr", I want a hit in both 
document #1 and #2.

Here is what my current schema.xml looks like:

        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
                splitOnNumerics="1" stemEnglishPossessive="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" 
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

Currently, special characters are being stripped.

Any idea how I can configure Solr to do this?  I'm using Solr 3.6.

Thanks !!
