lucene-java-user mailing list archives

From "Mark Miller" <>
Subject Re: Problem with a search engine
Date Mon, 05 Feb 2007 16:11:43 GMT
StandardAnalyzer does not index pure numbers. It will index alphanumeric
tokens and numbers that are connected with one of: "_"|"-"|"/"|"."|",". If
you wish to index pure numbers, you might want to add another regex to
StandardAnalyzer that recognizes a series of digits. Don't forget to add
the new token type to the grammar further down in the StandardTokenizer.jj file.
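
To illustrate the "series of digits" idea, here is a minimal sketch in plain
Java regex rather than JavaCC. It is not the StandardTokenizer.jj grammar: the
class name DigitTokenSketch and the token type names DIGITS and WORD are
hypothetical, made up for this example. It only shows what a separate rule for
pure digit runs could look like next to a rule for mixed letters and digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a regex stand-in, NOT Lucene's StandardTokenizer grammar.
public class DigitTokenSketch {

    // Hypothetical token rules, tried in this order at each position:
    //   DIGITS - a run made of digits only (the extra "series of digits" rule)
    //   WORD   - any other run of letters and/or digits
    // The negative lookahead keeps "123abc" from being split into "123" + "abc".
    private static final Pattern TOKEN = Pattern.compile(
            "(?<DIGITS>\\d+(?![\\p{L}\\d]))|(?<WORD>[\\p{L}\\d]+)");

    // Returns each token as "TYPE:text" so the assigned type is visible.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String type = (m.group("DIGITS") != null) ? "DIGITS" : "WORD";
            tokens.add(type + ":" + m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [WORD:policy, DIGITS:12345, WORD:form, WORD:A4, WORD:issued, DIGITS:2007]
        System.out.println(tokenize("policy 12345 form A4 issued 2007"));
    }
}
```

In the real grammar the equivalent change would be a new token definition plus
a matching entry wherever the token types are listed, as described above.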

- Mark

On 2/5/07, Xavier To <> wrote:
> Thanks for taking time to answer me. The problem is that I'm not allowed
> to post code due to a confidentiality contract that I was required to sign.
> I'll try to see if I can get a special permission to post code since I'm
> wasting so much time trying to find the answer to this.
> I traced every place where the query is touched, and the numbers are still
> present in the query. I don't know if it's the analyzer, but if it was,
> wouldn't the numbers be cut out of the index completely? As I said in my
> first post, they are "findable" with Lukeall. If I read right, the
> FrenchAnalyzer included in Lucene is supposed to be based on
> StandardAnalyzer, so I really fail to see what is going wrong. Might it be
> the fact that the tokenizer used is StringTokenizer and not TokenStream?
> The numbers are tokenized, and in the returned query they are present...
> I really don't know where they get zapped out of existence...
> Thanks again for helping.
> Xavier Tô
> Bacc. en Informatique et Génie Logiciel
> (450)434-8905
> --------------------------------------------------------------------------------------
> Hard to tell without seeing any code.  Perhaps numbers are being removed
> from the query string during search.
> Make sure the same or at least "compatible" Analyzer is used during both
> indexing and querying.
> Grab the code from Lucene in Action .... hm, may be down at the moment,
> but that's where you can get the code normally.  The code includes some
> classes that let you run a query string through a set of Analyzers and
> see how each of them behaves and what it does to a query.
> Otis
> ----- Original Message ----
> From: "To, Xavier" <>
> To:
> Sent: Wednesday, January 31, 2007 12:21:27 AM
> Subject: Problem with a search engine
> Hi, I recently started an internship and I've been asked to fix their
> search engine so numbers are searched. At first, I thought it was the
> Analyzer that wasn't working right, but we're using StandardAnalyzer and
> the numbers are indexed (I checked with Lukeall). Then I thought they
> were not tokenized during the search, but they are. They just seem to be
> ignored for some reason. Has anyone experienced something similar? If
> so, how can I fix this? It's probably something that would jump in my
> face if it was alive, but I just can't see it. Can anyone help me? It
> would be very much appreciated.
> Xavier Tô
> Stagiaire
> Développement - Maintenance & Évolution
> AXA Canada Tech
> 2020, rue University, bureau 700
> Montréal (Québec) H3A 2A5
> Tél. :  (514) 282-6817, poste 2224
> Téléc. :  (514) 282-6017
> Courriel : <>
>   _____
> "Ce message est confidentiel, à l'usage exclusif du destinataire
> ci-dessus et son contenu ne représente en aucun cas un engagement de la
> part de AXA, sauf en cas de stipulation expresse et par écrit de la part
> de AXA. Toute publication, utilisation ou diffusion, même partielle,
> doit être autorisée préalablement. Si vous n'êtes pas destinataire de ce
> message, merci d'en avertir immédiatement l'expéditeur."
> "This e-mail message is confidential, for the exclusive use of the
> addressee and its contents shall not constitute a commitment by AXA,
> except as otherwise specifically provided in writing by AXA. Any
> unauthorized disclosure, use or dissemination, either whole or partial,
> is prohibited. If you are not the intended recipient of the message,
> please notify the sender immediately."
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail: