From: "Chris Salem"
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: searching for c++, c#, etc...
Date: Thu, 16 Jul 2009 17:09:17 -0000
Message-Id: <20090716170954.43AF081601D@nike.apache.org>

I figured "c++." would be a problem. Here's what I did to get around it:

value.toLowerCase().replaceAll("\\.( ?\t?\n?\r?)+", " ")

I'm not escaping +'s in the query, so I should be good there.
Thanks a lot.

Sincerely,
Chris Salem
Development Team
Main Sequence Technologies, Inc.
PCRecruiter.net - PCRecruiter Support
chris@mainsequence.net
P: 440.946.5214 ext 5458
F: 440.856.0312

This email and any files transmitted with it may contain confidential
information intended solely for the use of the individual or entity to whom
they are addressed. If you have received this email in error please notify
the sender. Please note that any views or opinions presented in this email
are solely those of the author and do not necessarily represent those of the
company. Finally, the recipient should check this email and any attachments
for the presence of viruses. The company accepts no liability for any damage
caused by any virus transmitted by this email.
Main Sequence Technologies, Inc. 4420 Sherwin Rd. Willoughby OH 44094 www.pcrecruiter.net

----- Original Message -----
To: java-user@lucene.apache.org, Chris Salem
From: John Wang
Sent: 7/16/2009 12:09:05 PM
Subject: Re: searching for c++, c#, etc...

If you escape the character + or #, the sentence "I know java + c++" would
not skip the +. Furthermore, it breaks query parsing, where + is reserved.

-John

On Thu, Jul 16, 2009 at 9:04 AM, John Wang wrote:

> This runs into problems when you have a sentence such as:
> "I dislike c++."
>
> If you use WhitespaceAnalyzer (WSA), then the last token is "c++.", not
> "c++", so the query would not find this document.
>
> -John
>
> On Thu, Jul 16, 2009 at 8:29 AM, Chris Salem wrote:
>
>> That seems to be working. You don't have to escape the pluses, though.
>> Also, it appears that the WhitespaceAnalyzer is case sensitive, but I guess
>> I could lowercase everything that gets indexed.
>> Thanks a lot for your help.
>> Sincerely,
>> Chris Salem
>>
>> ----- Original Message -----
>> To: java-user@lucene.apache.org, Chris Salem
>> From: Danil TORIN
>> Sent: 7/16/2009 10:28:37 AM
>> Subject: Re: searching for c++, c#, etc...
>>
>> Try WhitespaceAnalyzer for both indexing and searching.
>> At search time you may also need to escape "+", "(", ")" with "\".
>> "#" shouldn't need escaping.
>>
>> On Thu, Jul 16, 2009 at 17:23, Chris Salem wrote:
>> > I'm using the StandardAnalyzer for both searching and indexing.
>> > Here's the code to parse the query:
>> >
>> > Searcher searcher = new IndexSearcher(reader);
>> > Analyzer analyzer = new StandardAnalyzer(stopwords);
>> > System.out.println(queryString);
>> > QueryParser qp = new QueryParser(searchField,analyzer);
>> > Query query = qp.parse(queryString);
>> > queryString = query.toString();
>> > System.out.println(queryString);
>> >
>> > And here's the output from the println's:
>> >
>> > r2_resume_text:c\+\+ AND r2_resume_text: c\#
>> > +r2_resume_text:c +r2_resume_text:c
>> >
>> > Also, the documentation doesn't say anything about # having to be escaped.
>> > Do I have to escape during indexing too?
>> > Sincerely,
>> > Chris Salem
>> >
>> > ----- Original Message -----
>> > To: java-user@lucene.apache.org, Chris Salem
>> > From: Ian Lea
>> > Sent: 7/16/2009 5:12:53 AM
>> > Subject: Re: searching for c++, c#, etc...
>> >
>> > Hi
>> >
>> > Escaping should work. See
>> > http://lucene.apache.org/java/2_4_1/queryparsersyntax.html and
>> > QueryParser.escape(). And you need to be sure that your analyzer
>> > isn't removing the plus signs and that you use the same analyzer for
>> > indexing and searching.
>> >
>> > Googling for something like "lucene escape" will find you more info.
>> >
>> > Luke will tell you what is actually in your index.
>> >
>> > --
>> > Ian.
>> >
>> > On Wed, Jul 15, 2009 at 5:19 PM, Chris Salem wrote:
>> >> Hello,
>> >> I'm trying to search for terms like c++, but the parser is stripping
>> >> off the ++. I tried escaping the ++ with slashes, but it's still stripping
>> >> it off. I could replace + with "plus"; is that the best way to do it? How
>> >> come escaping isn't working?
>> >> thanks
>> >> Sincerely,
>> >> Chris Salem
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org

(The following links were included with this email:)
http://www.pcrecruiter.net/
http://www.pcrecruiter.net/support.htm
mailto:chris@mainsequence.net
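
For reference, the trailing-period problem raised in this thread ("c++." versus "c++") could also be handled by stripping sentence punctuation only from the end of each whitespace-separated token, rather than replacing every period. The following is a minimal sketch using only the Java standard library; the class name TermNormalizer and the method normalize are hypothetical and not part of Lucene. It assumes the same normalization is applied to both the indexed text and the query text before a whitespace-based analyzer sees them, and it does not perform any query-parser escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not a Lucene class.
final class TermNormalizer {

    // Lowercases the text and removes sentence punctuation from the END of
    // each whitespace-separated token, so "c++." becomes "c++" while the
    // '+' and '#' characters and interior periods ("asp.net") are kept.
    static String normalize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            // Strip only trailing '.', ',', ';', ':', '!' and '?'.
            String cleaned = token.replaceAll("[.,;:!?]+$", "");
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned);
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(normalize("I dislike C++."));       // i dislike c++
        System.out.println(normalize("Java, C# and ASP.NET")); // java c# and asp.net
    }
}
```

Compared with the replaceAll call quoted at the top of this message, which turns every period into a space, this variant leaves tokens such as "asp.net" intact. Whether that is desirable depends on how the resumes are written, so treat it as one option rather than a recommendation.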