lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Re : Re: Problem with a search engine
Date Mon, 05 Feb 2007 17:37:27 GMT
Have you tried looking at the actual query submitted with Query.toString()?
That might give you an insight into what is actually being submitted to
Lucene and a place to start.

Also be aware that QueryParser, the default operator is OR which can produce
unexpected results if you assume AND.

Best
Erick

On 2/5/07, Xavier To <to.xavier@courrier.uqam.ca> wrote:
>
> Thanks for your help,
>
> As I stated before, the numbers, whether pure or not, are indexed, for I
> can search them with luke. But supposing what you're saying was the case,
> the search for "10-year" should return 4 items (according to the number of
> occurence found by luke). Problem is that the number of documents returned
> is 6, for it ignored the "10" and searched for "-year".
>
> Xavier Tô
> Bacc. en Informatique et Génie Logiciel
> to.xavier@courrier.uqam.ca
> (450)434-8905
>
> ----- Message d'origine -----
> De: Mark Miller <markrmiller@gmail.com>
> Date: Lundi, Février 5, 2007 11:11 am
> Objet: Re: Problem with a search engine
>
> > StandardAnalyzer does not index pure numbers. It will index
> > alphanumerictokens and numbers that are connected with one of:
> > "_"|"-"|"/"|"."|"," If
> > you wish to index pure numbers you might want to add another regex to
> > StandardAnalyzer that recognizes a series of digits - don't forget
> > to add
> > the new token type to the grammar lower in the StandardTokenizer.jj
> > file.
> > - Mark
> >
> > On 2/5/07, Xavier To <to.xavier@courrier.uqam.ca> wrote:
> > >
> > > Thanks for taking time to answer me. The problem is that I'm not
> > allowed> to post code due to a confidentiality contract that I was
> > required to sign.
> > > I'll try to see if I can get a special permission to post code
> > since I'm
> > > wasting so much time trying to find the answer to this.
> > >
> > > I tried looking for each time the query is touched and numbers
> > are still
> > > present in the query. I don't know if it's the analyzer, but if
> > it was,
> > > woundl't the numbers be cut out of the index completely ? As I
> > said in my
> > > 1st post, they are "findable" with Lukeall. If I read right, the
> > > FrenchAnalyzer included in lucene is supposed to be based on
> > > StandardAnalyzer so I really fail to see what is going wrong.
> > Might it be
> > > the fact that the tokenizer used is Stringtokenizer and not
> > Tokenstream ?
> > > The numbers are tokenized, and in the returned query they are
> > present....>
> > > I really don't know where they get zapped out of existence...
> > >
> > > Thanks again for helping.
> > >
> > > Xavier Tô
> > > Bacc. en Informatique et Génie Logiciel
> > > to.xavier@courrier.uqam.ca
> > > (450)434-8905
> > >
> > >
> > > ------------------------------------------------------------------
> > --------------------
> > >
> > > Hard to tell without seeing any code.  Perhaps numbers are being
> > removed> from the query string
> > > during search.
> > > Make sure the same or at least "compatible" Analyzer is used
> > during both
> > > indexing and querying.
> > > Grab the code from Lucene in Action .... hm, lucenebook.com may
> > be down at
> > > the moment, but
> > > that's where you can get the code normally.  The code includes some
> > > classes that let you run
> > > a query string through a set of Analyzers and see how each of
> > them behaves
> > > and what it does
> > > to a query.
> > >
> > > Otis
> > >
> > > ----- Original Message ----
> > > From: "To, Xavier" <Xavier.To@axa-canada.com>
> > > To: java-user@lucene.apache.org
> > > Sent: Wednesday, January 31, 2007 12:21:27 AM
> > > Subject: Problem with a search engine
> > >
> > >
> > > Hi, I recently started an internship and I've been asked to fix
> > their> search engine so numbers are searched. At first, I thought
> > it was the
> > > Analyzer that wasn't working right, but we're using
> > StandardAnalyzer and
> > > the numbers are indexed (I checked with Lukeall). Then I thought
> > they> are not tokenized during the search, but they are. They just
> > seem to be
> > > ignored for some reason. Did anyone experienced something similar
> > ? If
> > > so, how can I fix this ? It's probably something that would jump
> > in my
> > > face if it was alive, but I just can't see it. Can anyone help me
> > ? It
> > > would be very much appreciated.
> > >
> > >
> > > Xavier T�
> > > Stagiaire
> > > D�veloppement - Maintenance & �volution
> > > AXA Canada Tech
> > > 2020, rue University, bureau 700
> > > Montr�al(Qu�bec)H3A 2A5
> > > T�l. :  (514) 282-6817, poste 2224
> > > T�l�c. :  (514) 282-6017
> > > Courriel : Xavier.To@axa-canada.com <Xavier.To@axa-canada.com>
> > >   _____
> > >
> > > "Ce message est confidentiel, � l'usage exclusif du destinataire
> > > ci-dessus et son contenu ne repr�sente en aucun cas un engagement
> > de la
> > > part de AXA, sauf en cas de stipulation expresse et par �crit de
> > la part
> > > de AXA. Toute publication, utilisation ou diffusion, m�me partielle,
> > > doit �tre autoris�e pr�alablement. Si vous n'�tes pas
> > destinataire de ce
> > > message, merci d'en avertir imm�diatement l'exp�diteur."
> > >
> > > "This e-mail message is confidential, for the exclusive use of the
> > > addressee and its contents shall not constitute a commitment by AXA,
> > > except as otherwise specifically provided in writing by AXA. Any
> > > unauthorized disclosure, use or dissemination, either whole or
> > partial,> is prohibited. If you are not the intended recipient of
> > the message,
> > > please notify the sender immediately."
> > >
> > > ------------------------------------------------------------------
> > ---
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> > >
> > >
> > > ------------------------------------------------------------------
> > ---
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message