lucene-java-user mailing list archives

From Steven A Rowe <sar...@syr.edu>
Subject RE: Can I just add ShingleFilter to my analyzer used for indexing and searching
Date Tue, 21 Feb 2012 14:37:39 GMT
Hi Paul,

Lucene's QueryParser splits query text on whitespace and then sends the words to the analyzer one at a time.
All analysis components that do their work based on more than one word, including ShingleFilter
and SynonymFilter, are borked by this.  (There is an open JIRA issue for the QueryParser problem:
<https://issues.apache.org/jira/browse/LUCENE-2605>.)
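To make the failure mode concrete, here is a minimal, Lucene-free sketch. The class and method names are hypothetical and are not Lucene API; `shingle` only models a ShingleFilter with a maximum shingle size of 2 and unigram output enabled, joining words with a space (ShingleFilter's default separator). Analyzing the whole string yields cross-word shingles, while splitting on whitespace first, as QueryParser does, never lets two words reach the filter together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PerWordAnalysisDemo {

    // Hypothetical model of a ShingleFilter (max shingle size 2, unigrams on):
    // emit each word, followed by the pair it starts, joined by a space.
    static List<String> shingle(List<String> words) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(words.get(i));
            if (i + 1 < words.size()) {
                out.add(words.get(i) + " " + words.get(i + 1));
            }
        }
        return out;
    }

    // The whole query reaches the "analyzer" at once, so pairs are formed.
    static List<String> analyzeWhole(String query) {
        return shingle(Arrays.asList(query.trim().split("\\s+")));
    }

    // Mimics QueryParser: split on whitespace first, then analyze each word
    // on its own, so the shingling step never sees two adjacent words.
    static List<String> analyzePerWord(String query) {
        List<String> out = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            out.addAll(shingle(List.of(word)));
        }
        return out;
    }

    public static void main(String[] args) {
        String q = "the who live";
        System.out.println(analyzeWhole(q));   // [the, the who, who, who live, live]
        System.out.println(analyzePerWord(q)); // [the, who, live]
    }
}
```

The per-word output contains no shingles at all, so shingle terms in the index can never be matched by queries parsed this way.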

There is a workaround involving PositionFilter described on the Solr wiki: <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory>.
Essentially, include PositionFilter after ShingleFilter in your analyzer, then wrap queries
in quotes before sending them to QueryParser.
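A rough model of why the workaround helps: quoting makes QueryParser hand the whole string to the analyzer, and PositionFilter (with its default increment of 0) keeps the first token's position increment and sets every later token's increment to 0. As I understand the 3.x QueryParser, when all tokens of a quoted query land on one position it builds a BooleanQuery of SHOULD clauses (the shingles OR'd together) rather than a PhraseQuery. The sketch below is Lucene-free and its names are hypothetical; it only models the position bookkeeping, not the real TokenStream API.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionFilterDemo {

    // A term plus its position increment, as a TokenStream would report it.
    record Token(String term, int posInc) {}

    // Hypothetical model of ShingleFilter (max shingle size 2, unigrams on):
    // each unigram advances the position by 1, and the shingle starting at
    // the same word is stacked on that position (increment 0).
    static List<Token> shingle(String[] words) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Token(words[i], 1));
            if (i + 1 < words.length) {
                out.add(new Token(words[i] + " " + words[i + 1], 0));
            }
        }
        return out;
    }

    // Model of PositionFilter with its default increment of 0: the first
    // token keeps its increment, every later token gets increment 0.
    static List<Token> positionFilter(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            Token t = in.get(i);
            out.add(i == 0 ? t : new Token(t.term(), 0));
        }
        return out;
    }

    // Absolute positions: running sum of increments, starting at -1 so a
    // first token with increment 1 lands on position 0.
    static List<Integer> positions(List<Token> tokens) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.posInc();
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> shingles = shingle("the who live".split(" "));
        System.out.println(positions(shingles));                 // [0, 0, 1, 1, 2]
        System.out.println(positions(positionFilter(shingles))); // [0, 0, 0, 0, 0]
    }
}
```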

CommonGramsFilter does what you describe, emitting shingles only for word pairs that contain a
common word (stopword). In Lucene/Solr 3.x, however, it's in Solr (solr-core-3.X.jar, to be exact),
not Lucene; you can use it in your application by including the solr-core jar as a dependency.
In trunk, which will be released as Lucene/Solr 4.0, CommonGramsFilter has been moved to the
analyzers-common module.
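The idea behind CommonGramsFilter can be sketched without Lucene or Solr. The method below is hypothetical, not the filter's actual code: it keeps every word and adds a bigram only when at least one of the two words is in the common-word set, joining the pair with `_`, the separator CommonGramsFilter uses. Pairs of two uncommon words ("rings trilogy" below) produce no bigram, which keeps the index much smaller than full shingling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CommonGramsDemo {

    // Hypothetical sketch of the CommonGramsFilter idea: emit each word, and
    // after it a word_nextword bigram only if either word is a common word.
    static List<String> commonGrams(String[] words, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]);
            boolean hasNext = i + 1 < words.length;
            if (hasNext && (common.contains(words[i]) || common.contains(words[i + 1]))) {
                out.add(words[i] + "_" + words[i + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = Set.of("the", "of");
        String[] words = "the lord of the rings trilogy".split(" ");
        // [the, the_lord, lord, lord_of, of, of_the, the, the_rings, rings, trilogy]
        System.out.println(commonGrams(words, common));
    }
}
```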

Steve

> -----Original Message-----
> From: Paul Taylor [mailto:paul_t100@fastmail.fm]
> Sent: Tuesday, February 21, 2012 8:07 AM
> To: java-user@lucene.apache.org
> Subject: Can I just add ShingleFilter to my analyzer used for indexing and
> searching
> 
> I'm trying out ShingleFilter, and the way it is documented implies that you
> can just add it to your analyzer and that's it, with no side effects
> except a larger index, but I have read other sources implying you have to
> modify the way you parse user queries. Could anyone confirm or deny?
> 
> Also, is there an easy way to use a ShingleFilter only for common stop
> words, or is that pointless?
> 
> Paul
