lucene-java-user mailing list archives

From Mck <m...@semb.wever.org>
Subject Re: Replacing FAST functionality at sesam.no - ShingleFilter+exact matching
Date Tue, 09 Sep 2008 20:38:14 GMT

> Looks to me like MultiPhraseQuery is getting in the way.  Shingles
> that begin at the same word are given the same position by
> ShingleFilter, and Solr's FieldQParserPlugin creates a
> MultiPhraseQuery when it encounters tokens in a query with the same
> position.  I think what you want is to convert queries into shingle
> disjunctions (*any* matching shingle results in a hit),  right?

Yes, you're right, Steve. Thank you.

One way, I see now, to get the behaviour I want is to set the unigrams'
positionIncrement to zero instead of one.

For example, in ShingleFilter.fillOutputBuffer(..), replacing the two
occurrences of
> .setPositionIncrement(1);
with
> .setPositionIncrement(0);

Then I end up with a MultiPhraseQuery with
        termArrays[0] = { list_entry_shingles:abcd
                          list_entry_shingles:abcd efgh
                          list_entry_shingles:abcd efgh ijkl 
                          list_entry_shingles:efgh
                          list_entry_shingles:efgh ijkl 
                          list_entry_shingles:ijkl }

and it works perfectly :-)
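
To make the effect of the position increment concrete, here is a minimal
plain-Java sketch. It does not use any Lucene classes: ShinglePositions and
termArrays are hypothetical names that only model how tokens sharing a
position end up in the same term array. It assumes that longer shingles
always get an increment of 0, and it treats any increment above 0 as simply
opening a new position.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of token positions; hypothetical, not Lucene's API.
class ShinglePositions {

    // Emits the unigram and every longer shingle for each start word and
    // returns one list of terms per distinct position (the analogue of the
    // term arrays in a MultiPhraseQuery).
    static List<List<String>> termArrays(String[] words, int maxShingleSize,
                                         int unigramPositionIncrement) {
        List<List<String>> arrays = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            StringBuilder shingle = new StringBuilder();
            for (int size = 1;
                 size <= maxShingleSize && start + size <= words.length;
                 size++) {
                if (size > 1) {
                    shingle.append(' ');
                }
                shingle.append(words[start + size - 1]);
                // Assumption: unigrams carry the configurable increment,
                // longer shingles always use 0 and so share the position
                // of the token before them.
                int increment = (size == 1) ? unigramPositionIncrement : 0;
                // The very first token always opens a position; after that
                // only a positive increment does.
                if (arrays.isEmpty() || increment > 0) {
                    arrays.add(new ArrayList<>());
                }
                arrays.get(arrays.size() - 1).add(shingle.toString());
            }
        }
        return arrays;
    }

    public static void main(String[] args) {
        String[] words = {"abcd", "efgh", "ijkl"};
        // Increment 1: one term array per start word.
        System.out.println(termArrays(words, 3, 1));
        // Increment 0: every shingle collapses into a single term array.
        System.out.println(termArrays(words, 3, 0));
    }
}
```

With an increment of 1 the three start words give three term arrays, so a
phrase match needs one term from each. With an increment of 0 all six
shingles land in one array, which behaves like a disjunction over the
shingles.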

I see no way of configuring this behaviour, though.
If it is possible and someone can say how, that would be a real godsend.

Otherwise, would a patch to ShingleFilter that offers an option
"unigramPositionIncrement" (defaulting to 1) likely be accepted into
trunk?

~mck

-- 
"Between two evils, I always pick the one I never tried before." Mae
West 
| semb.wever.org | sesat.no | sesam.no |
