lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: WordBoundTokenFilter
Date Mon, 13 Jun 2011 11:11:05 GMT
In Lucene trunk (will be version 4.0), all analyzers/tokenizers/tokenfilters
were moved to a new shared analyzer module. So WDF is now part of a shared
Lucene/Solr module. In 3.x, you still have to add the Solr JARS to use it.

This TokenFilter should do what you intend to do (see the Solr
documentation, where all parameters are explained):
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimit
erFilterFactory

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Em [mailto:mailformailinglists@yahoo.de]
> Sent: Monday, June 13, 2011 1:02 PM
> To: java-user@lucene.apache.org
> Subject: Re: WordBoundTokenFilter
> 
> Yes, it's part of Solr. And even in Solr there was no documentation in the
API
> - at last when I searched for it the last time.
> 
> Regards,
> Em
> 
> Am 13.06.2011 12:56, schrieb Denis Bazhenov:
> > It seems so. Interestingly I can't find any mentions of
> WordDelimiterTokenFilter using google. Is it part of Solr codebase?
> > On 13.06.2011, at 21:49, Em wrote:
> >
> >> Hi,
> >>
> >> sounds like the WordDelimiterTokenFilter from Solr, doesn't it?
> >>
> >> Regards,
> >> Em
> >>
> >> Am 13.06.2011 12:06, schrieb Denis Bazhenov:
> >>> Some time ago I need to tune our home grown search engine based on
> lucene to perform well on product searches. Product search is search where
> users come with part of product name and we should find the product.
> >>>
> >>> The problem here is that users doesn't provide full model name. For
> instance id product model name is "Sony PRS-A9000QF", users frequently
> search for "PRS 9000", "9000QF" etc.
> >>>
> >>> The simple and straightforward solution to this problem is to tokenize
> model names on the different character type boundary. So for "Sony PRS-
> A9000QF" we will have 5 terms: "sony", "prs", "a", "9000" "qf". This
solution
> could dramatically increase search sensitive (which is not a good thing in
a
> general search), but works well in a specialized indexes.
> >>>
> >>> So a developed such a token filter. My question is there any interest
in
> this solution for the community, and does it make sense to contribute it
> back?
> >>> ---
> >>> Denis Bazhenov <dotsid@gmail.com>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --------------------------------------------------------------------
> >>> - To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >
> > ---
> > Denis Bazhenov <dotsid@gmail.com>
> >
> >
> >
> >
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message