lucene-solr-user mailing list archives

From Oliver Schihin <oliver.schi...@unibas.ch>
Subject Re: Is there any special meaning for # symbol in solr.
Date Tue, 04 Sep 2012 08:04:58 GMT
You are not using a string type but a TextField, and the StandardTokenizer in your
analysis chain strips the number sign (#). You can verify this on the "Analysis" page
of the Solr admin interface.

You can either use a string type for searches like C#, C++ and the like, or map the
characters to something textual *before* tokenizing. My solution goes something like this:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-chars.txt"/>
where mapping-chars.txt contains:
*****************
# ########
# Specials
# ########

# C+ => Cplus
# C++ => Cplusplus
"\u0043\u002B" => "Cplus"
"\u0063\u002B" => "Cplus"
"\u0043\u002B\u002B" => "Cplusplus"
"\u0063\u002B\u002B" => "Cplusplus"

# C#, C♯ => Csharp
"\u0043\u0023" => "Csharp"
"\u0063\u0023" => "Csharp"
"\u0043\u266f" => "Csharp"
"\u0063\u266f" => "Csharp"

# F#, F♯ => Fsharp
"\u0046\u0023" => "Fsharp"
"\u0066\u0023" => "Fsharp"
"\u0046\u266f" => "Fsharp"
"\u0066\u266f" => "Fsharp"

# J#, J♯ => Jsharp
"\u004A\u0023" => "Jsharp"
"\u006A\u0023" => "Jsharp"
"\u004A\u266f" => "Jsharp"
"\u006A\u266f" => "Jsharp"

# ♭ => b
"\u266d" => "b"

# @ => at
"\u0040" => "at"
*******************************

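After that you can use any tokenizer, since the char filter has already rewritten the
text by the time the tokenizer runs.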
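
For illustration, here is a minimal sketch of how the char filter might sit in a complete
field type. The name text_techskill is made up for this example, and the filter list is
trimmed down from your text_general definition, so treat it as a starting point rather
than a tested configuration. The important part is that the charFilter element comes
before the tokenizer and that the same chain is used at index and query time.

```xml
<!-- Hypothetical field type name; adjust to your own schema. -->
<fieldType name="text_techskill" class="solr.TextField" positionIncrementGap="100">
  <!-- An analyzer without a type attribute is used for both indexing and querying. -->
  <analyzer>
    <!-- Char filters rewrite the raw text before the tokenizer sees it,
         so "C#" has already become "Csharp" when tokenizing starts. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-chars.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="techskill" type="text_techskill" indexed="true" stored="true"/>
```

With a chain like this, both the indexed value "C#" and the query c%23 should end up as
the single token "csharp". Existing documents have to be reindexed after the analysis
chain changes, otherwise the old tokens stay in the index.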



-------- Original Message --------
Subject: Re: Is there any special meaning for # symbol in solr.
From: veena rani <veenaranip@gmail.com>
To: solr-user@lucene.apache.org
CC: te <te@statsbiblioteket.dk>
Date: 04.09.2012 09:49

> This is the field type I am using for techskill:
> 
>  <field name="techskill"   type="text_general"  indexed="true"
>  stored="true" />
> 
> <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory"
> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
> 
> 
> On Tue, Sep 4, 2012 at 1:16 PM, veena rani <veenaranip@gmail.com> wrote:
> 
>> No, # is not a stop word.
>>
>>
>> On Tue, Sep 4, 2012 at 12:59 PM, 李赟 <liyun2010@corp.netease.com> wrote:
>>
>>> Is "#" in your stop words list ?
>>>
>>>
>>> 2012-09-04
>>>
>>>
>>>
>>> Li Yun
>>> Software Engineer @ Netease
>>> Mail: liyun2010@corp.netease.com
>>> MSN: rockiee281@gmail.com
>>>
>>>
>>>
>>>
>>> From: veena rani
>>> Sent: 2012-09-04  12:57:26
>>> To: solr-user; te
>>> CC:
>>> Subject: Re: Is there any special meaning for # symbol in solr.
>>>
>>> If I use this link,
>>> http://localhost:8080/solr/select?&q=(techskill%3Ac%23)
>>> Solr displays the results for techskill:c.
>>> But I want it to display only the results for techskill:c#.
>>> On Mon, Sep 3, 2012 at 7:23 PM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
>>>> On Mon, 2012-09-03 at 13:39 +0200, veena rani wrote:
>>>>>>  I have an issue with the # symbol in Solr.
>>>>>>  I am trying to search for a string that ends with #, e.g. c#, and it is
>>>>>>  throwing an error like: org.apache.lucene.queryparser.classic.ParseException:
>>>>>>  Cannot parse '(techskill:c': Encountered "<EOF>" at line 1, column 12.
>>>> Solr only received '(techskill:c', which has unbalanced parentheses.
>>>> My guess is that you do not perform a URL-encode of '#' and that you
>>>> were sending something like
>>>> http://localhost:8080/solr/select?&q=(techskill:c#)
>>>> when you should have been sending
>>>> http://localhost:8080/solr/select?&q=(techskill%3Ac%23)
>>>>
>>>>
>>> --
>>> Regards,
>>> Veena.
>>> Banglore.
>>>
>>
>>
>> --
>> Regards,
>> Veena.
>> Banglore.
>>
>>
> 
> 

