lucene-solr-user mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: solr.WordDelimiterFilterFactory
Date Tue, 25 Nov 2008 02:20:02 GMT
On Thu, Nov 20, 2008 at 9:20 AM, Daniel Rosher <rosherd@googlemail.com> wrote:
> I'm trying to index some content that has things like 'java/J2EE' but with
> solr.WordDelimiterFilterFactory and parameters [generateWordParts="1"
> generateNumberParts="0" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"] this ends up tokenized as
> 'java','j','2','EE'
>
> Does anyone know a way of having this tokenized as 'java','j2ee'?
>
> Perhaps this filter needs something like a protected list of tokens not to
> tokenize, like EnglishPorterFilter?

In addition to the other replies, you could use the SynonymFilter to
normalize certain terms before the WDF (assuming you want to keep the
WDF for other things).

Perhaps try the following synonym rules at both index and query time:

j2ee => javatwoee
java/j2ee => java javatwoee
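
A rough sketch of how that might be wired up in schema.xml, assuming the
rules above are saved in a file called synonyms.txt and that the field uses
a whitespace tokenizer, so 'java/J2EE' reaches the synonym filter as a
single token. The field type name "text_wdf" is only a placeholder, and the
WDF parameters are copied from your message. Because 'javatwoee' contains no
digits or punctuation, the WDF should have nothing to split on:

```xml
<!-- Hypothetical field type; the name and the synonyms file name are placeholders. -->
<fieldType name="text_wdf" class="solr.TextField">
  <!-- A single analyzer is applied at both index and query time. -->
  <analyzer>
    <!-- Split on whitespace only, so "java/J2EE" stays one token for now. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Rewrite the protected terms before the WDF sees them. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <!-- Same settings as in the original question. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
</fieldType>
```

Note that the indexed term would then be 'javatwoee' rather than 'j2ee',
which is fine for matching as long as the same rules run at query time.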

-Yonik
