Date: Tue, 15 Dec 2009 20:33:54 +0800
Message-ID: <7d94dcde0912150433q3a1d3914iaee98d13ac2230a5@mail.gmail.com>
In-Reply-To: <002f01ca7d7a$6ca671d0$710bc30a@sv.us.sonicwall.com>
References: <26748041.post@talk.nabble.com> <4B22DC73.9030502@gmail.com> <002f01ca7d7a$6ca671d0$710bc30a@sv.us.sonicwall.com>
Subject: Re: Lucene Analyzer that can handle C++ vs C#
From: Weiwei Wang
To: java-user@lucene.apache.org

KeywordAnalyzer cannot handle a complete sentence: it emits the entire field value as a single token.

On Tue, Dec 15, 2009 at 7:33 PM, Ganesh wrote:

> How about KeywordAnalyzer? It will treat C++ and C# each as a single term.
>
> Regards
> Ganesh
>
> ----- Original Message -----
> From: "Chris Lu"
> Sent: Saturday, December 12, 2009 5:27 AM
> Subject: Re: Lucene Analyzer that can handle C++ vs C#
>
> > What we did in DBSight is to provide a reserved list of words for every
> > Lucene Analyzer.
> > This way you can handle terms with special characters, like C++ and C#.
> >
> > The common analyzers are usually not suitable for these special words.
> >
> > --
> > Chris Lu
> > -------------------------
> > Instant Scalable Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes:
> > http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> > DBSight customer, a shopping comparison site, (anonymous per request) got
> > 2.6 Million Euro funding!
> >
> > On 12/11/2009 9:09 AM, maxSchlein wrote:
> >> Can someone please point me in the right direction?
> >>
> >> We are creating an application that needs to be able to search on C++ and
> >> get back docs that have C++ in them. The StandardAnalyzer does not seem to
> >> index the "+", so a search for "C++" will bring back docs that contain
> >> C++, C, C#, etc. The WhitespaceAnalyzer will index the "+", but if we have
> >> the term "C++." (that is, if C++ is at the end of a sentence), it will
> >> index "C++." and a search for "C++" will not return the doc. I have heard
> >> that a custom Analyzer might work; however, it seems like there would
> >> actually need to be a custom Filter or Tokenizer. I looked at:
> >> - StandardAnalyzer.java
> >> - StandardFilter.java
> >> - StandardTokenizer.java
> >> - StandardTokenizerImpl.java
> >> - StandardTokenizerImpl.jflex
> >>
> >> I would guess that the StandardTokenizer is where the changes would need
> >> to be made to allow the "+" character, but I am unclear as to how.
> >>
> >> Any and all help is greatly appreciated.
> >>
> >> Going through all the documents and replacing "+" with the word "plus" is
> >> not really an option for us.

-- 
Weiwei Wang
Alex Wang
王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang
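
For illustration, here is a minimal sketch of the "reserved list of words" idea mentioned above, written in plain Java with no Lucene dependency. The class name ReservedTermTokenizer and its methods are hypothetical, not Lucene or DBSight API. The sketch tries the reserved terms first, then falls back to runs of letters and digits, so "C++." at the end of a sentence yields "c++" while a bare "C" stays "c". In a real index the same logic would presumably live in a custom Tokenizer or Analyzer, and the same analysis has to be applied at both index time and query time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper (not Lucene API): keeps reserved terms such as "C++" intact. */
final class ReservedTermTokenizer {

    private final Pattern pattern;

    /** @param reservedTerms terms to keep whole, e.g. "C++" and "C#" (an assumed list) */
    ReservedTermTokenizer(List<String> reservedTerms) {
        // Longest terms first, so a longer reserved term wins over a shorter prefix.
        String reserved = reservedTerms.stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote) // treat '+' and '#' as literals, not regex operators
                .collect(Collectors.joining("|"));
        String wordRun = "[\\p{L}\\p{N}]+"; // fallback: a run of letters and digits
        String regex = reserved.isEmpty() ? wordRun : reserved + "|" + wordRun;
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Returns lower-cased tokens; punctuation outside reserved terms is dropped. */
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }
}
```

Usage: new ReservedTermTokenizer(List.of("C++", "C#")).tokenize("We use C++. Also C#, and plain C.") produces separate tokens for "c++", "c#" and "c", so a query for C++ no longer matches documents that only mention C or C#. A reserved term is recognized only at the start of a word, since the fallback rule consumes the whole letter/digit run first.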