From: "N. Hira"
Subject: Re: searching for C++
Date: Tue, 24 Jun 2008 11:12:46 -0500
To: java-user@lucene.apache.org

This isn't ideal, but if you have a defined list of such terms, you
may find it easier to filter these terms out into a separate field for
indexing.

-h
----------------------------------------------------------------------
Hira, N.R.
Solutions Architect
Cognocys, Inc.
(773) 251-7453

On 24-Jun-2008, at 11:03 AM, John Byrne wrote:

> I don't think there is a simpler way. I think you will have to
> modify the tokenizer. Once you go beyond basic human-readable text,
> you always end up having to do that. I have modified the JavaCC
> version of StandardTokenizer for allowing symbols to pass through,
> but I've never used the JFlex version - don't know anything about
> JFlex I'm afraid!
>
> A good strategy might be to make a new type of lexical token called
> "SYMBOL" and try to catch as many symbols as you can think of; then
> maybe create new token types which are ALPHANUM types that can have
> pre-fixed or post-fixed symbols.
>
> That way, you'll be able to catch things like "c++" in a
> TokenFilter, and you can choose to pass it through as a single
> token, or split it up into two tokens, or whatever you want.
>
> Hope that helps.
>
> Regards,
> JB
>
> Alex Soto wrote:
>> Hello:
>>
>> I have a problem where I need to search for the term "C++".
>> If I use StandardAnalyzer, the "+" characters are removed and the
>> search is done on just the "c" character which is not what is
>> intended.
>> Yet, I need to use standard analyzer for the other benefits it
>> provides.
>>
>> I think I need to write a specialized tokenizer (and accompanying
>> analyzer) that let the "+" characters pass.
>> I would use the JFlex provided one, modify it and add it to my
>> project.
>>
>> My question is:
>>
>> Is there any simpler way to accomplish the same?
>>
>>
>> Best regards,
>> Alex Soto
>> lexsoto@gmail.com
>>
>> -
>> Amicus Plato, sed magis amica veritas.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
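
To make the separate-field suggestion at the top of this reply a little more concrete, here is a rough sketch in plain Java. It deliberately uses no Lucene classes: the class name SpecialTermExtractor, the SPECIAL_TERMS list and the extract method are all made up for illustration, and the whitespace splitting is only a stand-in for whatever pre-processing you already have. The idea is just to pull the known symbol-bearing terms out of the text before StandardAnalyzer sees it, so they can be indexed verbatim in a second, untokenized field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SpecialTermExtractor {
    // Hypothetical list of terms that would lose their symbols in normal analysis.
    static final Set<String> SPECIAL_TERMS =
            new HashSet<>(Arrays.asList("c++", "c#", ".net"));

    // Returns the special terms found in the text, lowercased, in order of appearance.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Trim surrounding quotes, brackets and sentence punctuation,
            // but keep symbols such as '+' and '#' that belong to the term.
            String token = raw
                    .replaceAll("^[\"'(\\[]+", "")
                    .replaceAll("[\"')\\].,;:!?]+$", "");
            if (SPECIAL_TERMS.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }
}
```

Each returned term would then go into its own field (indexed without tokenization) alongside the normally analyzed body field, and a query for "C++" would be lowercased and run against that extra field. Whether that is less work than modifying the tokenizer depends on how stable your list of terms is.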