lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xavier To <to.xav...@courrier.uqam.ca>
Subject Re : Re: Problem with a search engine
Date Mon, 05 Feb 2007 16:32:02 GMT
Thanks for your help,

As I stated before, the numbers, whether pure or not, are indexed, for I can search them with
luke. But supposing what you're saying was the case, the search for "10-year" should return
4 items (according to the number of occurence found by luke). Problem is that the number of
documents returned is 6, for it ignored the "10" and searched for "-year".

Xavier Tô
Bacc. en Informatique et Génie Logiciel
to.xavier@courrier.uqam.ca
(450)434-8905

----- Message d'origine -----
De: Mark Miller <markrmiller@gmail.com>
Date: Lundi, Février 5, 2007 11:11 am
Objet: Re: Problem with a search engine

> StandardAnalyzer does not index pure numbers. It will index 
> alphanumerictokens and numbers that are connected with one of: 
> "_"|"-"|"/"|"."|"," If
> you wish to index pure numbers you might want to add another regex to
> StandardAnalyzer that recognizes a series of digits - don't forget 
> to add
> the new token type to the grammar lower in the StandardTokenizer.jj 
> file.
> - Mark
> 
> On 2/5/07, Xavier To <to.xavier@courrier.uqam.ca> wrote:
> >
> > Thanks for taking time to answer me. The problem is that I'm not 
> allowed> to post code due to a confidentiality contract that I was 
> required to sign.
> > I'll try to see if I can get a special permission to post code 
> since I'm
> > wasting so much time trying to find the answer to this.
> >
> > I tried looking for each time the query is touched and numbers 
> are still
> > present in the query. I don't know if it's the analyzer, but if 
> it was,
> > woundl't the numbers be cut out of the index completely ? As I 
> said in my
> > 1st post, they are "findable" with Lukeall. If I read right, the
> > FrenchAnalyzer included in lucene is supposed to be based on
> > StandardAnalyzer so I really fail to see what is going wrong. 
> Might it be
> > the fact that the tokenizer used is Stringtokenizer and not 
> Tokenstream ?
> > The numbers are tokenized, and in the returned query they are 
> present....>
> > I really don't know where they get zapped out of existence...
> >
> > Thanks again for helping.
> >
> > Xavier Tô
> > Bacc. en Informatique et Génie Logiciel
> > to.xavier@courrier.uqam.ca
> > (450)434-8905
> >
> >
> > ------------------------------------------------------------------
> --------------------
> >
> > Hard to tell without seeing any code.  Perhaps numbers are being 
> removed> from the query string
> > during search.
> > Make sure the same or at least "compatible" Analyzer is used 
> during both
> > indexing and querying.
> > Grab the code from Lucene in Action .... hm, lucenebook.com may 
> be down at
> > the moment, but
> > that's where you can get the code normally.  The code includes some
> > classes that let you run
> > a query string through a set of Analyzers and see how each of 
> them behaves
> > and what it does
> > to a query.
> >
> > Otis
> >
> > ----- Original Message ----
> > From: "To, Xavier" <Xavier.To@axa-canada.com>
> > To: java-user@lucene.apache.org
> > Sent: Wednesday, January 31, 2007 12:21:27 AM
> > Subject: Problem with a search engine
> >
> >
> > Hi, I recently started an internship and I've been asked to fix 
> their> search engine so numbers are searched. At first, I thought 
> it was the
> > Analyzer that wasn't working right, but we're using 
> StandardAnalyzer and
> > the numbers are indexed (I checked with Lukeall). Then I thought 
> they> are not tokenized during the search, but they are. They just 
> seem to be
> > ignored for some reason. Did anyone experienced something similar 
> ? If
> > so, how can I fix this ? It's probably something that would jump 
> in my
> > face if it was alive, but I just can't see it. Can anyone help me 
> ? It
> > would be very much appreciated.
> >
> >
> > Xavier T�
> > Stagiaire
> > D�veloppement - Maintenance & �volution
> > AXA Canada Tech
> > 2020, rue University, bureau 700
> > Montr�al(Qu�bec)H3A 2A5
> > T�l. :  (514) 282-6817, poste 2224
> > T�l�c. :  (514) 282-6017
> > Courriel : Xavier.To@axa-canada.com <Xavier.To@axa-canada.com>
> >   _____
> >
> > "Ce message est confidentiel, � l'usage exclusif du destinataire
> > ci-dessus et son contenu ne repr�sente en aucun cas un engagement 
> de la
> > part de AXA, sauf en cas de stipulation expresse et par �crit de 
> la part
> > de AXA. Toute publication, utilisation ou diffusion, m�me partielle,
> > doit �tre autoris�e pr�alablement. Si vous n'�tes pas 
> destinataire de ce
> > message, merci d'en avertir imm�diatement l'exp�diteur."
> >
> > "This e-mail message is confidential, for the exclusive use of the
> > addressee and its contents shall not constitute a commitment by AXA,
> > except as otherwise specifically provided in writing by AXA. Any
> > unauthorized disclosure, use or dissemination, either whole or 
> partial,> is prohibited. If you are not the intended recipient of 
> the message,
> > please notify the sender immediately."
> >
> > ------------------------------------------------------------------
> ---
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> >
> > ------------------------------------------------------------------
> ---
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message