lucene-java-user mailing list archives

From "Steven A Rowe" <sar...@syr.edu>
Subject RE: Re: Replacing FAST functionality at sesam.no - ShingleFilter+exact matching
Date Wed, 10 Sep 2008 17:48:50 GMT
On 09/10/2008 at 1:17 PM, Mck wrote:
> Without phrasing the ShingleFilter is indeed invoked.
> But it is used three separate times for each term
>  1) abcd
>  2) efgh
>  3) ijkl
> So no shingles are generated.

Ah, right, each individual token is sent through the analyzer.

> With phrasing, the ShingleFilter is used once
>  1) abcd efgh ijkl
> And so all the shingles are generated.
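
To make the difference concrete, here is a minimal sketch in plain Java.  It does not use
Lucene at all: the class ShingleSketch, the method shingles() and the maxSize parameter are
hypothetical names made up for illustration, and the method only approximates what a
shingle filter does (it emits the word n-grams of size 2..maxSize and nothing else).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShingleSketch {
    // Hypothetical helper, not Lucene's ShingleFilter: returns every word
    // n-gram of size 2..maxSize that can be built from the given tokens.
    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = 2; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("abcd", "efgh", "ijkl");

        // Unquoted query: each term goes through the analyzer on its own,
        // so there is never a neighbouring token to build a shingle with.
        for (String term : terms) {
            System.out.println(term + " -> " + shingles(Arrays.asList(term), 3));
        }

        // Quoted query: all three terms reach the filter in one stream.
        System.out.println(terms + " -> " + shingles(terms, 3));
    }
}
```

Each single-term call returns an empty list, while the three-term call returns the two
bigrams and the one trigram.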

Wow, I don't see any alternatives to your solution.  

On the one hand, though, your solution is a kludge: you are disabling position information
(by assigning the same position to all tokens) in order to induce a particular behavior in
the query parser, and that behavior may change in the future.  Long term, I think this should
be addressed: there should be a query parser that works directly with ShingleFilter, i.e.,
one that passes all tokens to it at once without requiring quotes.

On the other hand, I'm not sure how useful position information is for shingles in the general
case: they already have relative position info embedded within them.  And how likely is it
that one would want to perform a phrase/span query over shingles?  Pretty unlikely, methinks.

Anyhow, I suggest you change the name of the option you're adding in LUCENE-1380 to "disablePositions",
and make it boolean -- this better describes what you're trying to do.  When true, all position
increments would be set to zero.  It should default to false.
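
Roughly what I have in mind, as a sketch only.  This is plain Java rather than a patch against
ShingleFilter; the PositionSketch and Token classes and the apply() method are hypothetical
stand-ins for illustration, and the only name taken from the suggestion above is the
disablePositions flag.

```java
import java.util.ArrayList;
import java.util.List;

class PositionSketch {
    // Hypothetical token: a term plus its position increment relative to
    // the previous token in the stream.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // With disablePositions == true every position increment is set to zero;
    // with false (the proposed default) the increments pass through unchanged.
    static List<Token> apply(List<Token> input, boolean disablePositions) {
        List<Token> out = new ArrayList<>();
        for (Token t : input) {
            out.add(new Token(t.term, disablePositions ? 0 : t.positionIncrement));
        }
        return out;
    }
}
```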

Steve


