lucene-java-user mailing list archives

From Denis Bazhenov <dot...@gmail.com>
Subject WordBoundTokenFilter
Date Mon, 13 Jun 2011 10:06:58 GMT
Some time ago I needed to tune our home-grown, Lucene-based search engine to perform well on
product searches. A product search is one where users enter part of a product name and
we have to find the product.

The problem here is that users don't provide the full model name. For instance, if the product
model name is "Sony PRS-A9000QF", users frequently search for "PRS 9000", "9000QF", etc.

The simple and straightforward solution to this problem is to tokenize model names at the
boundaries between different character types. So for "Sony PRS-A9000QF" we get 5 terms: "sony",
"prs", "a", "9000", "qf". This solution can dramatically increase search sensitivity (which
is not a good thing in general search), but it works well in specialized indexes.
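
The splitting rule described above can be sketched in plain Java without any Lucene
dependency. This is only an illustration, not the token filter from this message: the class
name WordBoundSketch and the method split are hypothetical, and the lowercasing and the
treatment of spaces and punctuation as separators are assumptions inferred from the example
terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits text wherever the character type changes
// (letter <-> digit), treating everything else as a separator.
final class WordBoundSketch {

    private enum Kind { LETTER, DIGIT, OTHER }

    private static Kind kindOf(char c) {
        if (Character.isLetter(c)) return Kind.LETTER;
        if (Character.isDigit(c)) return Kind.DIGIT;
        return Kind.OTHER; // whitespace, hyphens, punctuation
    }

    // Works on UTF-16 chars, so supplementary code points are not handled.
    static List<String> split(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Kind currentKind = Kind.OTHER;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Kind kind = kindOf(c);
            // A change of character type closes the term collected so far.
            if (kind != currentKind && current.length() > 0) {
                terms.add(current.toString());
                current.setLength(0);
            }
            // Separators are dropped; letters are lowercased (assumption).
            if (kind != Kind.OTHER) {
                current.append(Character.toLowerCase(c));
            }
            currentKind = kind;
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(split("Sony PRS-A9000QF")); // [sony, prs, a, 9000, qf]
        System.out.println(split("PRS 9000"));         // [prs, 9000]
        System.out.println(split("9000QF"));           // [9000, qf]
    }
}
```

If the same rule is applied to queries, every term of "PRS 9000" and "9000QF" also appears
among the terms of the indexed model name, which is why such partial queries can match.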

So I developed such a token filter. My question: is there any interest in this solution in
the community, and does it make sense to contribute it back?
---
Denis Bazhenov <dotsid@gmail.com>





