lucene-java-user mailing list archives

From Chris Lu
Subject Re: Lucene Analyzer that can handle C++ vs C#
Date Fri, 11 Dec 2009 23:57:39 GMT
What we did in DBSight was to provide a reserved list of words for every
Lucene Analyzer.
This way you can handle any terms that contain special characters, like
C++ and C#.

The common analyzers are usually not suitable for these special words.
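
A minimal sketch of the reserved-word idea, written in plain Java without
the Lucene API. This is not DBSight's actual code; the class name, the
reserved list, and the boundary rule are assumptions for illustration only.
In a real Lucene setup, logic like this would probably live in a custom
Tokenizer or TokenFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene or DBSight.
final class ReservedWordTokenizer {

    // Assumed reserved list, stored in lower case.
    private static final Set<String> RESERVED = Set.of("c++", "c#");

    // Lower-cases the text, emits reserved words whole, and otherwise
    // splits on every character that is not a letter or digit.
    static List<String> tokenize(String text) {
        String s = text.toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            // Only look for a reserved word at the start of a token.
            String match = current.length() == 0 ? reservedAt(s, i) : null;
            if (match != null) {
                tokens.add(match);
                i += match.length();
            } else if (Character.isLetterOrDigit(s.charAt(i))) {
                current.append(s.charAt(i));
                i++;
            } else {
                flush(current, tokens); // punctuation or whitespace ends a token
                i++;
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Returns the longest reserved word starting at position i that is not
    // immediately followed by a letter or digit, or null if there is none.
    private static String reservedAt(String s, int i) {
        String best = null;
        for (String word : RESERVED) {
            int end = i + word.length();
            boolean boundary = end >= s.length()
                    || !Character.isLetterOrDigit(s.charAt(end));
            if (s.startsWith(word, i) && boundary
                    && (best == null || word.length() > best.length())) {
                best = word;
            }
        }
        return best;
    }

    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}
```

With this sketch, ReservedWordTokenizer.tokenize("I like C++.") yields the
tokens i, like, c++, so the trailing period no longer sticks to the term,
while a plain "C" or "C#" stays a separate token.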

Chris Lu
Instant Scalable Full-Text Search On Any Database/Application
Lucene Database Search in 3 minutes:
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro

On 12/11/2009 9:09 AM, maxSchlein wrote:
> Can someone please point me in the right direction?
> We are creating an application that needs to be able to search on C++ and get
> back docs that have C++ in them.  The StandardAnalyzer does not seem to index
> the "+", so a search for "C++" will bring back docs that contain C++, C,
> C#, etc.  The WhitespaceAnalyzer will index the "+", but if we have the
> term "C++." (that is, if C++ is at the end of a sentence), it will index
> "C++.", so a search for "C++" will not return the doc.  I have heard of maybe
> a CustomAnalyzer; however, it seems like there would actually need to be a
> CustomFilter/CustomTokenizer.  I looked at:
>       -
>       -
>       -
>       -
>       - StandardTokenizerImpl.jflex
> I would guess that the StandardTokenizer is where the changes would need to
> be made to allow the "+" character, but I am unclear as to how.
> Any and all help is greatly appreciated.
> Going through all the documents and replacing "+" with the word "plus" is not
> really an option for us.
