lucene-java-user mailing list archives

From Steven A Rowe <>
Subject RE: Enabling indexing of hyphenated terms sans the hyphen
Date Mon, 19 Sep 2011 21:05:22 GMT
Hi sbs,

Solr's WordDelimiterFilterFactory does what you want.  You can see a description of its function
here: <>.
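
To make the desired result concrete, here is a rough plain-Java sketch of the split-and-catenate behavior you described. It does not use any Lucene or Solr classes, and the class and method names (HyphenTerms, expand) are made up for illustration. The real filter does considerably more (other delimiters, case and digit transitions, token positions and offsets), so treat this only as a picture of the expected terms for a hyphenated token.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, not the Lucene/Solr API.
// HyphenTerms and expand() are hypothetical names.
final class HyphenTerms {

    /**
     * Returns the hyphen-separated parts of a token, followed by
     * the parts joined together with the hyphens removed.
     */
    static List<String> expand(String token) {
        List<String> terms = new ArrayList<String>();
        StringBuilder joined = new StringBuilder();
        for (String part : token.split("-")) {
            if (part.isEmpty()) {
                continue; // skip empties produced by leading or doubled hyphens
            }
            terms.add(part);
            joined.append(part);
        }
        // Emit the catenated form only if the token really had several parts.
        if (terms.size() > 1) {
            terms.add(joined.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints: [self, induced, selfinduced]
        System.out.println(expand("self-induced"));
    }
}
```

In the actual filter, this roughly corresponds to enabling both the word-part generation and word-catenation options (generateWordParts and catenateWords on the factory); a token without a hyphen comes through unchanged.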

WordDelimiterFilter, the filter class implementing the above factory's functionality, is package
private in Solr 3.X, so unless you want to circumvent this access restriction (e.g. with introspection
or with a façade class in the same package as the Solr filter class), you can't just depend
on the v3.2 solr-core jar, where it resides. In trunk (4.0, not yet released), WordDelimiterFilter
has been moved to the analysis-common module and made public.

You can copy/paste the source into your project and use it without any additional
dependencies beyond lucene-core.  Here's the source for the Lucene/Solr 3.2 version: <>.

Good luck,

> -----Original Message-----
> From: SBS []
> Sent: Monday, September 19, 2011 4:27 PM
> To:
> Subject: Enabling indexing of hyphenated terms sans the hyphen
> We use StandardTokenizer and this works well but we also need to include
> terms in our index which consist of hyphenated terms with the hyphen
> removed.  So, for example, if the text being indexed contains
> "self-induced" we need the terms "self", "induced" and "selfinduced"
> to be indexed.
> How would I go about implementing this?  We use Lucene Java 3.2.
> Thanks,
> -sbs
