lucene-java-user mailing list archives

From Steven A Rowe <sar...@syr.edu>
Subject RE: Enabling indexing of hyphenated terms sans the hyphen
Date Mon, 19 Sep 2011 21:05:22 GMT
Hi sbs,

Solr's WordDelimiterFilterFactory does what you want.  You can see a description of its function
here: <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory>.
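For the "self-induced" case, the relevant behaviour is word-part generation plus word catenation (the generateWordParts and catenateWords settings on the factory). Below is a minimal, stdlib-only Java sketch of that expansion applied to a single token. The class HyphenExpansion and the method expandHyphenated are hypothetical illustrations, not Lucene or Solr API. The sketch also ignores everything else the real filter does, such as case-change and letter/digit splitting and token positions and offsets.

```java
import java.util.ArrayList;
import java.util.List;

public class HyphenExpansion {

    /**
     * Hypothetical helper (not part of Lucene or Solr): given one token such as
     * "self-induced", returns its hyphen-separated parts followed by the parts
     * concatenated, roughly what WordDelimiterFilter emits for this input with
     * generateWordParts and catenateWords enabled.
     */
    static List<String> expandHyphenated(String token) {
        // Split on runs of hyphens and drop empty pieces left by leading hyphens.
        List<String> parts = new ArrayList<>();
        for (String p : token.split("-+")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        List<String> out = new ArrayList<>(parts);
        // Emit the catenated form only when the token actually had several parts.
        if (parts.size() > 1) {
            out.add(String.join("", parts));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [self, induced, selfinduced]
        System.out.println(expandHyphenated("self-induced"));
    }
}
```

This only helps if the hyphen survives tokenization. Lucene 3.x's StandardTokenizer follows the UAX#29 word-break rules and already splits "self-induced" into "self" and "induced". A whitespace-based tokenizer in front of the filter is therefore probably needed to get "selfinduced" indexed.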

WordDelimiterFilter, the filter class implementing the above factory's functionality, is package-private
in Solr 3.x. Unless you want to circumvent this access restriction (e.g. with reflection, or with a
façade class in the same package as the Solr filter class), you can't simply depend on the v3.2
solr-core jar, where it resides. In trunk (4.0, not yet released), WordDelimiterFilter
has been moved to the analysis-common module and made public.

You can copy/paste WordDelimiterFilter.java into your project and use it without any additional
dependencies beyond lucene-core.  Here's the source for the Lucene/Solr 3.2 version: <http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_2/solr/src/java/org/apache/solr/analysis/WordDelimiterFilter.java>.

Good luck,
Steve

> -----Original Message-----
> From: SBS [mailto:jturnbul@uow.edu.au]
> Sent: Monday, September 19, 2011 4:27 PM
> To: java-user@lucene.apache.org
> Subject: Enabling indexing of hyphenated terms sans the hyphen
> 
> We use StandardTokenizer and this works well but we also need to include
> terms in our index which consist of hyphenated terms with the hyphen
> removed.  So, for example, if the text being indexed contains "self-induced",
> we need the terms "self", "induced" and "selfinduced" to be indexed.
> 
> How would I go about implementing this?  We use Lucene Java 3.2.
> 
> Thanks,
> 
> -sbs
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Enabling-indexing-of-hyphenated-terms-sans-the-hyphen-tp3350008p3350008.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
