lucene-java-user mailing list archives

From John Wang <john.w...@gmail.com>
Subject Re: searching for c++, c#, etc...
Date Thu, 16 Jul 2009 16:09:05 GMT
If you escape the + or # character, then for the sentence
"I know java + c++" the standalone + would not be skipped. Furthermore, it breaks
query parsing, where + is reserved.

-John
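
A minimal sketch of the two tokenization problems raised in this message and in the one quoted below. It does not use Lucene: `tokenize` only imitates whitespace-only splitting (plus the lowercasing Chris mentions), and `normalize` is a hypothetical cleanup step that nobody in the thread proposed. The class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WhitespaceTokenSketch {
    // Imitates whitespace-only tokenization: split on runs of whitespace,
    // then lowercase. Punctuation stays attached to the token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Hypothetical cleanup (an assumption, not from the thread): strip trailing
    // sentence punctuation and drop tokens with no letter or digit.
    static List<String> normalize(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String s = t.replaceAll("[.,;:!?]+$", "");
            if (s.chars().anyMatch(Character::isLetterOrDigit)) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I dislike c++."));               // [i, dislike, c++.]
        System.out.println(normalize(tokenize("I dislike c++.")));    // [i, dislike, c++]
        System.out.println(tokenize("I know java + c++"));            // [i, know, java, +, c++]
        System.out.println(normalize(tokenize("I know java + c++"))); // [i, know, java, c++]
    }
}
```

With plain whitespace splitting, "c++." and "c++" are different terms and the standalone "+" becomes a term of its own. In a real index, a cleanup like this would have to live in a custom analyzer applied at both index and search time.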

On Thu, Jul 16, 2009 at 9:04 AM, John Wang <john.wang@gmail.com> wrote:

> This runs into problems when you have a sentence such as the following:
> "I dislike c++."
>
> If you use WhitespaceAnalyzer (WSA), then the last token is "c++.", not "c++",
> so the query would not find this document.
>
> -John
>
>
> On Thu, Jul 16, 2009 at 8:29 AM, Chris Salem <chris@mainsequence.net> wrote:
>
>> That seems to be working. You don't have to escape the pluses, though.
>> Also, it appears that the WhitespaceAnalyzer is case-sensitive, but I guess
>> I could lowercase everything that gets indexed.
>> Thanks a lot for your help.
>> Sincerely,
>> Chris Salem
>> Development Team
>> Main Sequence Technologies, Inc.
>> PCRecruiter.net - PCRecruiter Support
>> chris@mainsequence.net
>> P: 440.946.5214 ext 5458
>> F: 440.856.0312
>>
>> ----- Original Message -----
>> To: java-user@lucene.apache.org, Chris Salem <chris@mainsequence.net>
>> From: Danil TORIN <torindan@gmail.com>
>> Sent: 7/16/2009 10:28:37 AM
>> Subject: Re: searching for c++, c#, etc...
>>
>>
>> Try WhitespaceAnalyzer for both indexing and searching.
>> At search time you may also need to escape "+", "(", ")" with "\".
>> "#" shouldn't need escaping.
>>
>> On Thu, Jul 16, 2009 at 17:23, Chris Salem <chris@mainsequence.net> wrote:
>> > I'm using the StandardAnalyzer for both searching and indexing.
>> > Here's the code to parse the query:
>> > Searcher searcher = new IndexSearcher(reader);
>> > Analyzer analyzer = new StandardAnalyzer(stopwords);
>> > System.out.println(queryString);
>> > QueryParser qp = new QueryParser(searchField,analyzer);
>> > Query query = qp.parse(queryString);
>> > queryString = query.toString();
>> > System.out.println(queryString);
>> > And here's the output from the println's:
>> > r2_resume_text:c\+\+ AND r2_resume_text: c\#
>> > +r2_resume_text:c +r2_resume_text:c
>> > Also the documentation doesn't say anything about # having to be
>> > escaped.
>> > Do I have to escape during indexing too?
>> > Sincerely,
>> > Chris Salem
>> >
>> >
>> >
>> > ----- Original Message -----
>> > To: java-user@lucene.apache.org, Chris Salem <chris@mainsequence.net>
>> > From: Ian Lea <ian.lea@gmail.com>
>> > Sent: 7/16/2009 5:12:53 AM
>> > Subject: Re: searching for c++, c#, etc...
>> >
>> >
>> > Hi
>> >
>> >
>> > Escaping should work. See
>> > http://lucene.apache.org/java/2_4_1/queryparsersyntax.html and
>> > QueryParser.escape(). And you need to be sure that your analyzer
>> > isn't removing the plus signs and that you use the same analyzer for
>> > indexing and searching.
>> >
>> > Googling for something like "lucene escape" will find you more info.
>> >
>> > Luke will tell you what is actually in your index.
>> >
>> >
>> > --
>> > Ian.
>> >
>> >
>> > On Wed, Jul 15, 2009 at 5:19 PM, Chris Salem <chris@mainsequence.net> wrote:
>> >> Hello,
>> >> I'm trying to search for terms like c++ but the parser is stripping
>> >> off the ++.  I tried escaping the ++ with backslashes but it's still
>> >> stripping it off.  I could replace + with "plus"; is that the best way
>> >> to do it?  How come escaping isn't working?
>> >> Thanks
>> >> Sincerely,
>> >> Chris Salem
>> >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>>
>
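
The escaping advice in the quoted messages (Ian's pointer to QueryParser.escape() and Danil's note that "#" shouldn't need escaping) can be sketched in plain Java as below. This is not Lucene's implementation: the character set is an assumption based on the special characters listed on the query parser syntax page linked in the thread, and the class name is hypothetical. In real code, QueryParser.escape() is the safer choice.

```java
final class QueryEscapeSketch {
    // Assumed set of query-syntax special characters (not copied from Lucene's
    // source). Note that '#' is deliberately absent.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefixes every assumed special character with a backslash.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() * 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("c++"));   // c\+\+
        System.out.println(escape("c#"));    // c#
        System.out.println(escape("(c++)")); // \(c\+\+\)
    }
}
```

As Chris's output shows, escaping only gets the characters past the query parser. The analyzer still has to keep them, which StandardAnalyzer did not: both terms came out as plain "c".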
