lucene-java-user mailing list archives

From Mck <m...@semb.wever.org>
Subject RE: Re: Replacing FAST functionality at sesam.no - ShingleFilter + exact matching
Date Mon, 15 Sep 2008 09:56:29 GMT
Steve,
> Your solution, on the one hand, however, is a kludge: you are
> disabling position information (by assigning the same position to all
> tokens) in order to induce a particular behavior in the query parser,
> which may change in the future.

I disagree.

I'm not disabling position information to induce particular behaviour in
the query parser.

I'm intentionally setting positionIncrement to zero, as I want _all_
shingles and unigrams to be synonyms of each other.

The query parser expects you to assign positionIncrement=0 for synonyms
in this manner.
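
To illustrate the convention, here is a minimal sketch in plain Java (no
Lucene classes; the class and method names are hypothetical). It assumes
the usual rule that a token's position is the previous position plus its
positionIncrement, counting from -1, so an increment of 0 stacks a token
on the previous one as a synonym.

```java
final class PositionSketch {
    /**
     * Converts per-token position increments into absolute positions.
     * Assumption: counting starts at -1, so a first token with
     * increment 1 lands on position 0.
     */
    static int[] absolutePositions(int[] increments) {
        int[] positions = new int[increments.length];
        int position = -1;
        for (int i = 0; i < increments.length; i++) {
            position += increments[i]; // increment 0 keeps the previous position
            positions[i] = position;
        }
        return positions;
    }
}
```

With increments {1, 0, 0} all three tokens share position 0, which is
the "synonyms of each other" case; with {1, 1, 0} only the last two
tokens share a position.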

The one kludge I see is that the QueryParser expects the total position
count to be greater than or equal to one. It may not intentionally
handle a total position count of zero. But having many tokens that are
all synonyms of each other is equivalent to having one token with many
synonyms, so positionCount=0 should be treated the same as
positionCount=1.

I would think that both should lead to a BooleanQuery being constructed
by the QueryParser. (But the synonyms generated by the ShingleFilter are
in fact phrases so maybe it is wiser to use the MultiPhraseQuery.)
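
A rough model of the decision argued for above, in plain Java. This is
not Lucene's source: the class, enum, and method names are hypothetical,
and treating positionCount=0 like positionCount=1 is the behaviour
proposed in this message, so it should be checked against the actual
QueryParser before relying on it.

```java
final class QueryShapeSketch {
    enum Shape { TERM, BOOLEAN_OR, MULTI_PHRASE, PHRASE }

    /** Picks a query shape from the tokens' position increments. */
    static Shape shapeFor(int[] increments) {
        if (increments.length == 0) {
            throw new IllegalArgumentException("no tokens");
        }
        if (increments.length == 1) {
            return Shape.TERM; // a single token needs no compound query
        }
        int positionCount = 0;   // sum of all increments
        boolean stacked = false; // true if any token has increment 0
        for (int inc : increments) {
            positionCount += inc;
            if (inc == 0) {
                stacked = true;
            }
        }
        if (!stacked) {
            return Shape.PHRASE; // every token at its own position
        }
        // Assumption from the message: 0 and 1 positions both mean
        // "one slot full of synonyms", so both become an OR of terms.
        return positionCount <= 1 ? Shape.BOOLEAN_OR : Shape.MULTI_PHRASE;
    }
}
```

Under this model, increments {0, 0, 0} and {1, 0, 0} both yield
BOOLEAN_OR, while {1, 0, 1} yields MULTI_PHRASE.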

So all in all the QueryParser is behaving exactly as I would expect it
to.
The only logic being induced is setting positionIncrement=0 to indicate
that the token is a synonym of the previous token, and this logic is
completely encapsulated in the ShingleFilter.

~mck

PS: I cross-posted as I thought this was better suited to the dev list,
but I am not sure.

-- 
"Enlightenment is your ego's biggest disappointment." Yoginanda 
| semb.wever.org | sesat.no | sesam.no |
