lucene-java-user mailing list archives

From Weiwei Wang <ww.wang...@gmail.com>
Subject Re: Lucene Analyzer that can handle C++ vs C#
Date Tue, 15 Dec 2009 12:33:54 GMT
KeywordAnalyzer cannot handle a complete sentence; it indexes the entire field value as a single term.
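
To make the difference concrete, here is a minimal plain-Java sketch. It is not Lucene code and not anyone's actual implementation: the class name ReservedWordTokenizer, the RESERVED list, and both method names are hypothetical, chosen only for illustration. wholeField() mimics the single-token behaviour described above, while tokenize() sketches the reserved-word idea quoted below: split on whitespace, keep a term intact if it is on the reserved list once sentence punctuation is trimmed, and otherwise split on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this is NOT a Lucene Analyzer or Tokenizer.
final class ReservedWordTokenizer {

    // Hypothetical reserved list: terms whose symbols must survive tokenization.
    private static final Set<String> RESERVED =
            new HashSet<>(Arrays.asList("c++", "c#"));

    // Mimics a whole-field analyzer: the entire input becomes one token.
    static List<String> wholeField(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    // Whitespace split, with reserved terms protected from further splitting.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue; // empty input yields a single empty chunk
            }
            String lower = chunk.toLowerCase(Locale.ROOT);
            // Trim leading brackets/quotes and trailing sentence punctuation,
            // so "c++." and "(c#)," are compared as "c++" and "c#".
            String trimmed =
                    lower.replaceAll("^[(\\[\"']+|[.,;:!?)\\]\"']+$", "");
            if (RESERVED.contains(trimmed)) {
                tokens.add(trimmed);
            } else {
                // Ordinary words: split on anything that is not a letter or digit.
                for (String part : lower.split("[^\\p{L}\\p{N}]+")) {
                    if (!part.isEmpty()) {
                        tokens.add(part);
                    }
                }
            }
        }
        return tokens;
    }
}
```

With this sketch, ReservedWordTokenizer.tokenize("I like C++. Also C#, not C.") keeps "c++" and "c#" as separate terms distinct from "c", whereas wholeField() returns the whole sentence as one token, so a query for "C++" alone would not match it. In Lucene itself the same idea would have to live in a custom Tokenizer or TokenFilter, which this sketch does not attempt.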

On Tue, Dec 15, 2009 at 7:33 PM, Ganesh <emailgane@yahoo.co.in> wrote:

> How about KeywordAnalyzer? It will treat C++ and C# as single terms.
>
> Regards
> Ganesh
>
> ----- Original Message -----
> From: "Chris Lu" <chris.lu@gmail.com>
> To: <java-user@lucene.apache.org>
> Sent: Saturday, December 12, 2009 5:27 AM
> Subject: Re: Lucene Analyzer that can handle C++ vs C#
>
>
> > What we did in DBSight is to provide a reserved list of words for every
> > Lucene Analyzer.
> > This way you can handle terms with special characters, such as C++ and C#.
> >
> > The common analyzers are usually not suitable for these special words.
> >
> > --
> > Chris Lu
> > -------------------------
> > Instant Scalable Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> > DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!
> >
> >
> > On 12/11/2009 9:09 AM, maxSchlein wrote:
> >> Can someone please point me in the right direction.
> >>
> >> We are creating an application that needs to be able to search on C++ and
> >> get back docs that have C++ in them. The StandardAnalyzer does not seem to
> >> index the "+", so a search for "C++" will bring back docs that contain C++,
> >> C, C#, etc. The WhitespaceAnalyzer will index the "+", but if we have the
> >> term "C++." (that is, if C++ is at the end of a sentence), it will index
> >> "C++.", so a search for "C++" will not return the doc. I have heard of maybe
> >> a CustomAnalyzer; however, it seems like there would actually need to be a
> >> CustomFilter/CustomTokenizer. I looked at:
> >>       - StandardAnalyzer.java
> >>       - StandardFilter.java
> >>       - StandardTokenizer.java
> >>       - StandardTokenizerImpl.java
> >>       - StandardTokenizerImpl.jflex
> >>
> >> I would guess that the StandardTokenizer is where the changes would need to
> >> be made to allow the "+" character, but I am unclear as to how.
> >>
> >> Any and all help is greatly appreciated.
> >>
> >> Going through all the documents and replacing "+" with the word "plus" is
> >> not really an option for us.
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >


-- 
Weiwei Wang
Alex Wang
王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang
