lucene-solr-user mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: How to search for "C++"?
Date Thu, 26 Mar 2009 16:18:43 GMT
Synonym mappings are an easy way to handle specific cases like these...
C++ => cplusplus
C# => csharp
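
To illustrate the idea outside Solr, here is a minimal Python sketch (not Solr's API). The `SYNONYMS` table and the `analyze` function are hypothetical names used only for illustration. It assumes the mapping runs on whitespace-separated tokens, before any step that would strip the symbols, and that the same analysis is applied to both documents and queries.

```python
import re

# Hypothetical mapping table, mirroring the two rules above.
SYNONYMS = {"c++": "cplusplus", "c#": "csharp"}


def analyze(text):
    """Lowercase, split on whitespace, map known terms, then tokenize the rest."""
    tokens = []
    for raw in text.lower().split():
        # Trim surrounding punctuation; '+' and '#' are deliberately kept.
        candidate = raw.strip(".,;:!?()\"'")
        if candidate in SYNONYMS:
            # Replace the symbol-bearing term with a plain-word token.
            tokens.append(SYNONYMS[candidate])
        else:
            # Stand-in for an ordinary tokenizer: keep runs of word characters.
            tokens.extend(re.findall(r"\w+", candidate))
    return tokens


if __name__ == "__main__":
    print(analyze("I know C++, C# and Java."))
```

Because a query for "c++" and a document containing "C++" both end up as the token cplusplus, they match, while a plain "C" still yields the separate token c.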

-Yonik
http://www.lucidimagination.com


On Thu, Mar 26, 2009 at 9:27 AM, Jana, Kumar Raja <kjana@ptc.com> wrote:
> Hi Leonardo,
> 1. You can change the fieldtype to "string", in which case no tokenizers
> will act on your data and the content will be stored as is.
> 2. If you are using Solr 1.4 (latest), there is a provision to specify
> protected words for WordDelimiterFilterFactory, which will take care of
> your issue.
>
> -Kumar
>
> -----Original Message-----
> From: Leonardo Dias [mailto:leonardo@catho.com.br]
> Sent: Thursday, March 26, 2009 6:53 PM
> To: solr-user@lucene.apache.org
> Subject: How to search for "C++"?
>
> Hello there!
>
> Currently we're having a problem here and we're looking for some
> solutions. Right now we use the Standard Tokenizer to separate tokens
> and we just found out that we cannot search for "c++" in our index
> because it is not considered a word.
>
> Since we need this search to work properly (including a search for C#)
> we'd like to know what you guys are doing when people search for words
> that have symbols, like these programming languages. I thought there
> could be a list of "protected words" in the standard tokenizer, so that
> we could protect these tokens. Another possibility would be using the
> Pattern Tokenizer, but it seems it is kinda slow when it comes to indexing
> a huge amount of data, which is our case.
>
> What do you think the best solution would be?
>
> Best,
>
> Leonardo
>
