lucene-java-user mailing list archives

From John Wang <john.w...@gmail.com>
Subject Re: searching for c++, c#, etc...
Date Thu, 16 Jul 2009 16:04:26 GMT
This runs into problems with a sentence such as:
"I dislike c++."

If you use WhitespaceAnalyzer, the last token is "c++.", not "c++", so a
query for c++ would not find this document.
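
To make the point concrete, here is a minimal plain-Java sketch. It does not
use Lucene at all: `tokenize` only approximates whitespace tokenization, and
`normalize` is a hypothetical cleanup step (strip trailing sentence
punctuation, then lowercase) of the kind one might add as a custom filter.
Both names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WhitespaceTokenDemo {

    // Approximation of whitespace tokenization: split on runs of whitespace only.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Hypothetical cleanup: drop trailing . , ; : ! ? and lowercase the rest.
    // '+' and '#' are left alone, so "c++" and "c#" survive intact.
    public static String normalize(String token) {
        int end = token.length();
        while (end > 0 && ".,;:!?".indexOf(token.charAt(end - 1)) >= 0) {
            end--;
        }
        return token.substring(0, end).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> raw = tokenize("I dislike c++.");
        System.out.println(raw);                      // [I, dislike, c++.]

        List<String> cleaned = new ArrayList<>();
        for (String t : raw) {
            cleaned.add(normalize(t));
        }
        System.out.println(cleaned);                  // [i, dislike, c++]
        System.out.println(cleaned.contains("c++"));  // true
    }
}
```

The same cleanup would have to be applied at both index time and search time,
otherwise the indexed terms and the query terms will not match.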

-John

On Thu, Jul 16, 2009 at 8:29 AM, Chris Salem <chris@mainsequence.net> wrote:

> That seems to be working. You don't have to escape the pluses, though.
> Also, it appears that the WhitespaceAnalyzer is case-sensitive, but I guess
> I could lowercase everything that gets indexed.
> Thanks a lot for your help.
> Sincerely,
> Chris Salem
> Development Team
> Main Sequence Technologies, Inc.
> PCRecruiter.net - PCRecruiter Support
> chris@mainsequence.net
> P: 440.946.5214 ext 5458
> F: 440.856.0312
>
> ----- Original Message -----
> To: java-user@lucene.apache.org, Chris Salem <chris@mainsequence.net>
> From: Danil TORIN <torindan@gmail.com>
> Sent: 7/16/2009 10:28:37 AM
> Subject: Re: searching for c++, c#, etc...
>
>
> Try WhitespaceAnalyzer for both indexing and searching.
> At search time you may also need to escape "+", "(", ")" with "\".
> "#" shouldn't need escaping.
>
> On Thu, Jul 16, 2009 at 17:23, Chris Salem<chris@mainsequence.net> wrote:
> > I'm using the StandardAnalyzer for both searching and indexing.
> > Here's the code to parse the query:
> > Searcher searcher = new IndexSearcher(reader);
> > Analyzer analyzer = new StandardAnalyzer(stopwords);
> > System.out.println(queryString);
> > QueryParser qp = new QueryParser(searchField,analyzer);
> > Query query = qp.parse(queryString);
> > queryString = query.toString();
> > System.out.println(queryString);
> > And here's the output from the println's:
> > r2_resume_text:c\+\+ AND r2_resume_text: c\#
> > +r2_resume_text:c +r2_resume_text:c
> > Also the documentation doesn't say anything about # having to be escaped.
> > Do I have to escape during indexing too?
> > Sincerely,
> > Chris Salem
> >
> >
> >
> > ----- Original Message -----
> > To: java-user@lucene.apache.org, Chris Salem <chris@mainsequence.net>
> > From: Ian Lea <ian.lea@gmail.com>
> > Sent: 7/16/2009 5:12:53 AM
> > Subject: Re: searching for c++, c#, etc...
> >
> >
> > Hi
> >
> >
> > Escaping should work. See
> > http://lucene.apache.org/java/2_4_1/queryparsersyntax.html and
> > QueryParser.escape(). And you need to be sure that your analyzer
> > isn't removing the plus signs and that you use the same analyzer for
> > indexing and searching.
> >
> > Googling for something like "lucene escape" will find you more info.
> >
> > Luke will tell you what is actually in your index.
> >
> >
> > --
> > Ian.
> >
> >
> > On Wed, Jul 15, 2009 at 5:19 PM, Chris Salem<chris@mainsequence.net> wrote:
> >> Hello,
> >> I'm trying to search for terms like c++ but the parser is stripping
> >> off the ++. I tried escaping the ++ with slashes but it's still stripping
> >> it off. I could replace + with "plus"; is that the best way to do it? How
> >> come escaping isn't working?
> >> thanks
> >> Sincerely,
> >> Chris Salem
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
>
