lucene-java-user mailing list archives

From karl wettin <karl.wet...@gmail.com>
Subject Re: special handling of certain terms with embedded periods
Date Thu, 09 Aug 2007 14:56:14 GMT

On 9 Aug 2007, at 16.36, Donna L Gresh wrote:

> Is there a good way to handle the following scenario:
>
> I have certain terms with embedded periods for which I want to leave them
> intact (not split at the periods). For example in my application a
> particular skill might be SAP.FIN (SAP financial), and it should not be
> split into SAP and FIN. Is there a way to specify a list of terms such as
> these which should not be split?

Updating the grammar (BNF) behind the standard analyzer to allow terms with
punctuation is not a big deal. If there is a specific list of terms you want
to allow, you would handle them in a TokenFilter. See StandardTokenizer and
StandardFilter.
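To illustrate the TokenFilter idea, here is a minimal sketch in plain Java. It is not written against Lucene's TokenFilter API (whose method signatures differ between Lucene versions); it only mimics the decorator shape, wrapping an upstream token iterator so it runs without Lucene. It assumes an upstream tokenizer that splits only on whitespace, so "SAP.FIN" arrives intact. The hypothetical ProtectedTermFilter then splits each token at its periods unless the whole token is in the protected list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical filter (not a Lucene class): splits tokens at periods
// unless the token appears in a set of protected terms.
final class ProtectedTermFilter implements Iterator<String> {
    private final Iterator<String> input;        // upstream token stream
    private final Set<String> protectedTerms;    // terms that must stay whole
    private final Deque<String> pending = new ArrayDeque<>(); // pieces not yet returned

    ProtectedTermFilter(Iterator<String> input, Set<String> protectedTerms) {
        this.input = input;
        this.protectedTerms = protectedTerms;
    }

    @Override
    public boolean hasNext() {
        // Pull upstream tokens until at least one piece is buffered or input ends.
        while (pending.isEmpty() && input.hasNext()) {
            String token = input.next();
            if (protectedTerms.contains(token)) {
                pending.add(token);              // keep protected terms intact
            } else {
                for (String piece : token.split("\\.")) {
                    if (!piece.isEmpty()) {      // skip empties from ".." or a leading "."
                        pending.add(piece);
                    }
                }
            }
        }
        return !pending.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.poll();
    }

    // Assumed analysis chain: whitespace tokenization, then this filter.
    static List<String> analyze(String text, Set<String> protectedTerms) {
        Iterator<String> upstream = Arrays.asList(text.trim().split("\\s+")).iterator();
        ProtectedTermFilter filter = new ProtectedTermFilter(upstream, protectedTerms);
        List<String> out = new ArrayList<>();
        while (filter.hasNext()) {
            out.add(filter.next());
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> keep = new HashSet<>(Arrays.asList("SAP.FIN"));
        System.out.println(analyze("SAP.FIN skills and foo.bar", keep));
        // prints [SAP.FIN, skills, and, foo, bar]
    }
}
```

The matching is exact, so if a LowerCaseFilter runs earlier in the chain, the protected list would need to hold the lowercase forms (e.g. sap.fin).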

You might save a couple of clock ticks by implementing a BNF rule rather
than a filter though.
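The rule-based alternative moves the decision into the tokenizer itself, so a dotted term is recognized in a single scanning pass instead of being re-examined by a downstream filter. The sketch below uses java.util.regex as a stand-in for a grammar rule; it is not the StandardTokenizer grammar, and both the RuleTokenizer class and its pattern (uppercase letters, a period, uppercase letters) are hypothetical, assumed to fit codes like SAP.FIN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer: a regex plays the role of a grammar rule.
final class RuleTokenizer {
    // Assumed rule: UPPER "." UPPER is one token (e.g. SAP.FIN);
    // otherwise each run of letters/digits is a token.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Z]+\\.[A-Z]+|[A-Za-z0-9]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Java tries alternatives in order, so the dotted rule wins when it matches.
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("SAP.FIN skills, foo.bar"));
        // prints [SAP.FIN, skills, foo, bar]
    }
}
```

Note that java.util.regex resolves the two alternatives by order, whereas a lexer generator typically resolves competing rules by longest match; the effect for SAP.FIN is the same here.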


-- 
karl
