lucene-java-user mailing list archives

From Steven A Rowe <>
Subject RE: Can I just add ShingleFilter to my nalayzer used for indexing and searching
Date Tue, 21 Feb 2012 14:37:39 GMT
Hi Paul,

Lucene's QueryParser splits the query on whitespace and then sends the individual words one by one
to be analyzed. All analysis components that do their work based on more than one word, including
ShingleFilter and SynonymFilter, are borked by this. (There is a JIRA issue open for the
QueryParser problem.)
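
To make the failure mode concrete, here is a minimal plain-Java sketch. It is not Lucene code: `analyze` is a hypothetical stand-in for an analyzer with a bigram shingle step, and `analyzeLikeQueryParser` only imitates the split-then-analyze order described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WhitespaceSplitDemo {
    /** Toy stand-in for a shingling analyzer: each word plus each adjacent word pair. */
    static List<String> analyze(String text) {
        String[] words = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty()) {
                continue; // empty input yields a single empty chunk; skip it
            }
            out.add(words[i]);
            if (i + 1 < words.length) {
                out.add(words[i] + " " + words[i + 1]);
            }
        }
        return out;
    }

    /** Imitates the parser: split on whitespace first, then analyze each chunk alone. */
    static List<String> analyzeLikeQueryParser(String query) {
        List<String> out = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            out.addAll(analyze(chunk)); // each chunk is one word, so no pair can form
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("please divide this"));
        System.out.println(analyzeLikeQueryParser("please divide this"));
    }
}
```

Analyzing the whole string yields the word pairs as well as the single words, while the split-first path only ever sees one word at a time and so produces no pairs. That is why the index can contain shingles the parsed query never asks for.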

There is a workaround involving PositionFilter, described on the Solr wiki. Essentially, include
PositionFilter after ShingleFilter in your analyzer, then wrap queries in quotes before sending
them to QueryParser.
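
The quoting half of that workaround can be sketched in plain Java. The `PhraseQuoting` class and `asPhrase` method are hypothetical names for illustration. The escaping assumes the parser accepts backslash-escaped quotes inside a phrase, so check that against the query syntax of the version you run.

```java
final class PhraseQuoting {
    /**
     * Wraps raw user input in double quotes so the parser hands the whole
     * string to the analyzer as one phrase instead of word by word.
     * Backslashes and embedded quotes are escaped first so they cannot
     * terminate the phrase early.
     */
    static String asPhrase(String userInput) {
        String escaped = userInput.trim()
                .replace("\\", "\\\\")  // escape backslashes before adding new ones
                .replace("\"", "\\\""); // then escape embedded double quotes
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(asPhrase("please divide this"));
    }
}
```

The returned string is what you would pass to QueryParser.parse() in place of the raw user input. The PositionFilter half of the workaround lives in the analyzer chain and is not shown here.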

CommonGramsFilter does the emit-only-shingles-containing-stopwords thing, but in Lucene/Solr
3.x, it's in Solr (solr-core-3.X.jar, to be exact), not Lucene; you can use it in your application
by including the solr-core jar as a dependency.  In trunk, which will be released as Lucene/Solr
4.0, CommonGramsFilter has been moved to the analyzers-common module.
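
For a feel of what "emit only shingles containing stopwords" means, here is a simplified plain-Java sketch. It is not the CommonGramsFilter implementation: the class name, the method, and the "_" separator are assumptions chosen for illustration, and the real filter works on a token stream with position information rather than a list of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CommonGramsSketch {
    /**
     * Emits every token, plus a bigram for each adjacent pair in which
     * at least one of the two words is in the common-word set.
     */
    static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "in"));
        System.out.println(commonGrams(Arrays.asList("rain", "in", "spain"), common));
    }
}
```

Pairs made of two uncommon words produce no bigram, which keeps the index growth much smaller than shingling every adjacent pair.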


> -----Original Message-----
> From: Paul Taylor []
> Sent: Tuesday, February 21, 2012 8:07 AM
> To:
> Subject: Can I just add ShingleFilter to my nalayzer used for indexing and
> searching
> Trying out ShingleFilter, and the way it is documented implies that you
> can just add it to your analyzer and that's it, with no side effects
> except a larger index, but I have read others implying you have to modify
> the way you parse user queries. Could anyone confirm or deny?
> Also, is there an easy way to use a ShingleFilter only for common stop
> words, or is that pointless?
> Paul
