lucene-java-user mailing list archives

From: Xavier To
Subject: Re: Problem with a search engine
Date: Mon, 05 Feb 2007 15:41:56 GMT
Thanks for taking the time to answer me. The problem is that I'm not allowed to post code due to
a confidentiality contract that I was required to sign. I'll try to see if I can get special
permission to post code, since I'm wasting so much time trying to find the answer to this.

I traced every place where the query is touched, and the numbers are still present in the query.
I don't know if it's the analyzer, but if it were, wouldn't the numbers be cut out of the index
completely? As I said in my first post, they are "findable" with Lukeall. If I read correctly,
the FrenchAnalyzer included in Lucene is supposed to be based on StandardAnalyzer, so I really
fail to see what is going wrong. Might it be the fact that the tokenizer used is StringTokenizer
and not TokenStream? The numbers are tokenized, and they are present in the returned query...

I really don't know where they get zapped out of existence...

Thanks again for helping.

Xavier Tô
Bacc. en Informatique et Génie Logiciel


Hard to tell without seeing any code. Perhaps numbers are being removed from the query string
during search.
Make sure the same, or at least a "compatible", Analyzer is used during both indexing and querying.
Grab the code from Lucene in Action .... hm, may be down at the moment, but
that's where you can normally get the code. The code includes some classes that let you run
a query string through a set of Analyzers and see how each of them behaves and what it does
to a query.
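
For illustration only, here is a minimal sketch of that kind of side-by-side check. It uses
only the JDK and is not the Lucene in Action code, nor any Lucene API. The class TokenDiff and
its methods are hypothetical stand-ins: indexSideTokens mimics an analysis step that keeps
numbers, and querySideTokens mimics a deliberately faulty query-side step (a StringTokenizer
followed by a letters-only filter), which is only an assumption about what might be happening.
Running the same string through both and listing what went missing shows where tokens vanish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;
import java.util.function.Function;

// Hypothetical stand-ins for index-time and query-time analysis.
// None of this is a Lucene class; it only mimics the idea of an analyzer.
final class TokenDiff {

    // Index side (assumed): lowercase, split on anything that is not a letter or digit.
    static List<String> indexSideTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Query side (assumed, deliberately faulty): split on whitespace with
    // StringTokenizer, then keep only tokens made up entirely of letters.
    static List<String> querySideTokens(String text) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text.toLowerCase(Locale.ROOT));
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (t.chars().allMatch(Character::isLetter)) {
                out.add(t);
            }
        }
        return out;
    }

    // Tokens the index side produces that the query side does not.
    static List<String> droppedTokens(String text,
                                      Function<String, List<String>> indexSide,
                                      Function<String, List<String>> querySide) {
        List<String> dropped = new ArrayList<>(indexSide.apply(text));
        dropped.removeAll(querySide.apply(text));
        return dropped;
    }

    public static void main(String[] args) {
        String query = "police 12345 auto";
        System.out.println("index side: " + indexSideTokens(query));
        System.out.println("query side: " + querySideTokens(query));
        System.out.println("dropped:    "
                + droppedTokens(query, TokenDiff::indexSideTokens, TokenDiff::querySideTokens));
    }
}
```

With the sample string above, the index side keeps 12345 while the query side drops it, so it
shows up in the "dropped" list. In a real application the two functions would wrap whatever
analysis the indexing code and the search code each perform.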


----- Original Message ----
From: "To, Xavier" <>
Sent: Wednesday, January 31, 2007 12:21:27 AM
Subject: Problem with a search engine

Hi, I recently started an internship and I've been asked to fix their
search engine so numbers are searched. At first, I thought it was the
Analyzer that wasn't working right, but we're using StandardAnalyzer and
the numbers are indexed (I checked with Lukeall). Then I thought they
were not being tokenized during the search, but they are. They just seem
to be ignored for some reason. Has anyone experienced something similar?
If so, how can I fix this? It's probably something that would jump in my
face if it was alive, but I just can't see it. Can anyone help me? It
would be very much appreciated.

Xavier Tô
Développement - Maintenance & Évolution
AXA Canada Tech
2020, rue University, bureau 700
Montréal (Québec) H3A 2A5
Tél. : (514) 282-6817, poste 2224
Téléc. : (514) 282-6017
Courriel : <>

"This e-mail message is confidential, for the exclusive use of the
addressee and its contents shall not constitute a commitment by AXA,
except as otherwise specifically provided in writing by AXA. Any
unauthorized disclosure, use or dissemination, either whole or partial,
is prohibited. If you are not the intended recipient of the message,
please notify the sender immediately."
