lucene-java-user mailing list archives

From "Chris Salem" <ch...@mainsequence.net>
Subject Re: searching for c++, c#, etc...
Date Thu, 16 Jul 2009 17:09:17 GMT
I figured "c++." would be a problem. Here's what I did to get around it:
value.toLowerCase().replaceAll("\\.( ?\t?\n?\r?)+", " ")
I'm not escaping +'s in the query, so I should be good there.
Thanks a lot.
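
For reference, a minimal sketch of that normalization in plain Java, with no Lucene dependency. Because every part of the group after the period is optional, the expression replaces every period, including those inside terms such as "asp.net". The TermNormalizer class and the trailingDotsOnly method are hypothetical names introduced here as one possible narrower alternative; they were not proposed in the thread.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class TermNormalizer {

    // The expression from the message above. The group "( ?\t?\n?\r?)" can
    // match the empty string, so a '.' is replaced wherever it occurs.
    static String original(String value) {
        return value.toLowerCase(Locale.ROOT).replaceAll("\\.( ?\t?\n?\r?)+", " ");
    }

    // Hypothetical alternative: remove a '.' only when it is followed by
    // whitespace or by the end of the input, so "asp.net" keeps its dot.
    private static final Pattern SENTENCE_DOT = Pattern.compile("\\.(?=\\s|$)");

    static String trailingDotsOnly(String value) {
        return SENTENCE_DOT.matcher(value.toLowerCase(Locale.ROOT)).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(original("I use ASP.NET and c++."));
        System.out.println(trailingDotsOnly("I use ASP.NET and c++."));
    }
}
```

Whichever variant is used, the same normalization would need to run on the text at index time and on the query at search time, otherwise the indexed terms and the query terms will not line up.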
Sincerely,
Chris Salem 
Development Team 
Main Sequence Technologies, Inc.
PCRecruiter.net - PCRecruiter Support
chris@mainsequence.net
P: 440.946.5214 ext 5458 
F: 440.856.0312

This email and any files transmitted with it may contain confidential information intended
solely for the use of the individual or entity to whom they are addressed. If you have received
this email in error please notify the sender. Please note that any views or opinions presented
in this email are solely those of the author and do not necessarily represent those of the
company. Finally, the recipient should check this email and any attachments for the presence
of viruses. The company accepts no liability for any damage caused by any virus transmitted
by this email. Main Sequence Technologies, Inc. 4420 Sherwin Rd. Willoughby OH 44094 www.pcrecruiter.net




----- Original Message ----- 
To: java-user@lucene.apache.org, Chris Salem <chris@mainsequence.net>
From: John Wang <john.wang@gmail.com>
Sent: 7/16/2009 12:09:05 PM
Subject: Re: searching for c++, c#, etc...


If you escape the + or # characters, the standalone + in a sentence such as
"I know java + c++" would not be skipped. Furthermore, an unescaped + breaks
query parsing, where + is a reserved character.

-John
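
Both tokenization issues raised in this thread can be reproduced with a rough stand-in for whitespace tokenization. This is a sketch in plain Java, not Lucene's WhitespaceAnalyzer; the class and method names are hypothetical, and the set of trailing punctuation characters to strip is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class WhitespaceTokens {

    // Split on runs of whitespace only; punctuation stays attached to tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Assumed cleanup step: drop sentence punctuation at the end of a token
    // while leaving '+' and '#' alone, so "c++." becomes "c++".
    static String stripTrailingPunctuation(String token) {
        return token.replaceAll("[.,;:!?]+$", "");
    }

    public static void main(String[] args) {
        // The last token keeps its period, so a query for "c++" would miss it.
        System.out.println(tokenize("I dislike c++."));
        // The standalone "+" survives as a token of its own.
        System.out.println(tokenize("I know java + c++"));
    }
}
```

A cleanup step like stripTrailingPunctuation addresses the "c++." case, but the standalone "+" token would still be indexed unless it is filtered out separately.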

On Thu, Jul 16, 2009 at 9:04 AM, John Wang <john.wang@gmail.com> wrote:

> This runs into problems with a sentence such as:
> "I dislike c++."
>
> If you use WhitespaceAnalyzer (WSA), the last token is "c++.", not "c++",
> so the query would not find this document.
>
> -John
>
>
> On Thu, Jul 16, 2009 at 8:29 AM, Chris Salem <chris@mainsequence.net> wrote:
>
>> That seems to be working. You don't have to escape the pluses, though.
>> Also, it appears that the WhitespaceAnalyzer is case sensitive, but I guess
>> I could lowercase everything that gets indexed.
>> Thanks a lot for your help.
>> Sincerely,
>> Chris Salem
>>
>> ----- Original Message -----
>> To: java-user@lucene.apache.org, Chris Salem <chris@mainsequence.net>
>> From: Danil TORIN <torindan@gmail.com>
>> Sent: 7/16/2009 10:28:37 AM
>> Subject: Re: searching for c++, c#, etc...
>>
>>
>> Try WhitespaceAnalyzer for both indexing and searching.
>> At search time you may also need to escape "+", "(", and ")" with "\".
>> "#" shouldn't need escaping.
>>
>> On Thu, Jul 16, 2009 at 17:23, Chris Salem<chris@mainsequence.net> wrote:
>> > I'm using the StandardAnalyzer for both searching and indexing.
>> > Here's the code to parse the query:
>> > Searcher searcher = new IndexSearcher(reader);
>> > Analyzer analyzer = new StandardAnalyzer(stopwords);
>> > System.out.println(queryString);
>> > QueryParser qp = new QueryParser(searchField,analyzer);
>> > Query query = qp.parse(queryString);
>> > queryString = query.toString();
>> > System.out.println(queryString);
>> > And here's the output from the println's:
>> > r2_resume_text:c\+\+ AND r2_resume_text: c\#
>> > +r2_resume_text:c +r2_resume_text:c
>> > Also, the documentation doesn't say anything about # having to be escaped.
>> > Do I have to escape during indexing too?
>> > Sincerely,
>> > Chris Salem
>> >
>> >
>> >
>> > ----- Original Message -----
>> > To: java-user@lucene.apache.org, Chris Salem <chris@mainsequence.net>
>> > From: Ian Lea <ian.lea@gmail.com>
>> > Sent: 7/16/2009 5:12:53 AM
>> > Subject: Re: searching for c++, c#, etc...
>> >
>> >
>> > Hi
>> >
>> >
>> > Escaping should work. See
>> > http://lucene.apache.org/java/2_4_1/queryparsersyntax.html and
>> > QueryParser.escape(). And you need to be sure that your analyzer
>> > isn't removing the plus signs and that you use the same analyzer for
>> > indexing and searching.
>> >
>> > Googling for something like "lucene escape" will find you more info.
>> >
>> > Luke will tell you what is actually in your index.
>> >
>> >
>> > --
>> > Ian.
>> >
>> >
>> > On Wed, Jul 15, 2009 at 5:19 PM, Chris Salem<chris@mainsequence.net> wrote:
>> >> Hello,
>> >> I'm trying to search for terms like c++, but the parser is stripping
>> >> off the ++. I tried escaping the ++ with backslashes, but it's still
>> >> stripping it off. I could replace + with "plus"; is that the best way
>> >> to do it? Why isn't escaping working?
>> >> thanks
>> >> Sincerely,
>> >> Chris Salem
>> >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >

(The following links were included with this email:)
http://www.pcrecruiter.net/

http://www.pcrecruiter.net/support.htm

mailto:chris@mainsequence.net


