lucene-java-user mailing list archives

From maxSchlein <m_schl...@hotmail.com>
Subject Lucene Analyzer that can handle C++ vs C#
Date Fri, 11 Dec 2009 17:09:48 GMT

Can someone please point me in the right direction?

We are creating an application that needs to be able to search for "C++" and
get back docs that contain "C++". The StandardAnalyzer does not seem to index
the "+", so a search for "C++" brings back docs that contain C++, C, C#, etc.
The WhitespaceAnalyzer does index the "+", but if C++ comes at the end of a
sentence, it indexes the token "C++." (with the trailing period), so a search
for "C++" will not return that doc. I have heard that a custom Analyzer might
be the answer; however, it seems like this would actually need a custom
filter or tokenizer. I looked at:
     - StandardAnalyzer.java
     - StandardFilter.java
     - StandardTokenizer.java
     - StandardTokenizerImpl.java
     - StandardTokenizerImpl.jflex

I would guess that the StandardTokenizer is where the changes would need to
be made to allow the "+" character, but I am unclear as to how.
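
To make the desired behavior concrete, here is a minimal sketch of the
tokenization rule in plain Java. It deliberately uses no Lucene classes: the
class name SymbolAwareTokenizer and its methods are hypothetical and only
illustrate the rule a custom tokenizer would need to apply. The assumption is
that "+" and "#" count as part of a token only when they follow a letter or
digit, and that every other character (whitespace, periods, commas) ends the
token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows the token rule only.
final class SymbolAwareTokenizer {

    // '+' and '#' may extend a token that has already started;
    // letters and digits may start or extend one.
    private static boolean isTokenChar(char c, boolean tokenStarted) {
        if (Character.isLetterOrDigit(c)) {
            return true;
        }
        return tokenStarted && (c == '+' || c == '#');
    }

    // Splits text into lowercase tokens, keeping "c++" and "c#" intact
    // while dropping sentence punctuation such as a trailing period.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c, current.length() > 0)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any other character closes the token in progress.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [i, know, c++, also, c#, and, c, etc]
        System.out.println(tokenize("I know C++. Also C# and C, etc."));
    }
}
```

With this rule, "C++." at the end of a sentence yields the token "c++", and
"C", "C#" and "C++" stay distinct terms. How best to express the same rule
inside a Lucene Tokenizer or TokenFilter is the open question here; the
sketch makes no claim about Lucene's own API.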

Any and all help is greatly appreciated.

Going through all the documents and replacing "+" with the word "plus" is
not really an option for us.

