lucene-java-user mailing list archives

From Peter Pimley
Subject Re: Indexing punctuation
Date Wed, 29 Jun 2005 09:50:07 GMT

I'm not sure how useful this reply is, but hey ;)

<aol>me too!</aol>

I do something vaguely similar:  I strip accents from characters such as 
e-acute in both my input data and my incoming search queries, so that 
both end up in one standard form.  I do this with a custom TokenFilter 
subclass.  My analyzer includes this filter along with some of the 
standard ones (LowerCaseFilter, etc).  I run the same analyzer for both 
indexing and searching, as has been discussed in other posts.
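
To give a rough idea of the accent-stripping step, here is a minimal 
sketch in plain Java.  It is not my actual filter: AccentFolder and fold 
are made-up names, and the Lucene wiring is left out because the 
TokenFilter API differs between versions.  The assumption is that a 
filter would call something like this on each token's text.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of Lucene.
final class AccentFolder {
    private AccentFolder() {}

    // Returns the term with combining accents removed, e.g. e-acute -> e.
    static String fold(String term) {
        // NFD decomposition splits an accented letter into its base
        // letter followed by one or more combining marks.
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left behind by decomposition.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Case is deliberately left alone here, since lowercasing is handled by a 
separate standard filter in the same analyzer.  Letters that have no 
decomposition (o-slash, for example) pass through unchanged, so this 
only covers the simple accent cases.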

My point is that I'm happy with this approach and I'd recommend you do a 
similar thing, at least as a first attempt.
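
Applied to your product codes, the same idea might look like the sketch 
below.  CodeNormalizer is a made-up name, and the sketch assumes the 
whole code reaches the normalizer as one piece of text (for instance a 
field that is not split on whitespace); a tokenizer that breaks 
"21 MA GAB" into three tokens would need different handling.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Lucene.
final class CodeNormalizer {
    private CodeNormalizer() {}

    // Reduces a product code to one canonical form by dropping every
    // character that is not a letter or digit, then lowercasing.
    static String normalize(String code) {
        String lettersAndDigits = code.replaceAll("[^\\p{L}\\p{N}]+", "");
        return lettersAndDigits.toLowerCase(Locale.ROOT);
    }
}
```

Run on both the indexed codes and the incoming queries, this maps 
"21-MA-GAB", "21 MA GAB", "21-MA_GAB" and "21MAGAB" to the same term, 
so no synonyms are needed for the different spellings.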

Peter Pimley

Aigner, Thomas wrote:

>Hello all,
>	I am VERY new to Lucene and we are trying out Lucene to see if
>it will accomplish the vast majority of our search functions.
>	I have a question about a good way to index some of our product
>description codes.  We have description codes like 21-MA-GAB and other
>punctuation.  Our users need to be able to search for "21 MA GAB" or 
>"21-MA_GAB" or "21MAGAB".  Is the best way to accomplish this by
>creating synonyms for the 3 different ways when punctuation is in parts
>to search for? I know I can stop punctuation in the index but what about
>grouping the information together or with spaces?
>Thanks all in advance,