lucene-dev mailing list archives

From "Steven Rowe (JIRA)" <>
Subject [jira] Commented: (LUCENE-1380) Patch for ShingleFilter.coterminalPositionIncrement
Date Wed, 10 Sep 2008 15:08:44 GMT


Steven Rowe commented on LUCENE-1380:

As I said in the thread on java-user that spawned this issue: <>
(emphasis added):

It works because you've set all of the shingles to be at the same position - probably better
to change the one instance of .setPositionIncrement(0) to .setPositionIncrement(1) - that
way, MultiPhraseQuery will not be invoked, and the standard disjunction thing should happen.

> [W]ould a patch to ShingleFilter that offers an option
> "unigramPositionIncrement" (that defaults to 1) likely be
> accepted into trunk?

The issue is not directly related to whether a unigram is involved, but rather whether or
not _*tokens that begin at the same word*_ are given the same position.  The option thus should
be named something like "coterminalPositionIncrement".  This seems like a reasonable addition,
and a patch likely would be accepted, if it included unit tests.
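
A minimal sketch of the option as described, assuming a simplified, Lucene-independent model. Each word emits its unigram (outputUnigrams=true), followed by every shingle up to maxShingleSize words that starts at it. Terms are joined with a single space. The class, method and token type (`CoterminalShingles`, `shingle`, `Tok`) are hypothetical and are not ShingleFilter's API. The first token at each start word gets a position increment of 1. Every further token starting at the same word gets `coterminalPositionIncrement`: 0 mirrors the default described in the issue, and 1 is the change suggested in the quoted message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed ShingleFilter option; not Lucene's actual API.
public class CoterminalShingles {

    // A term plus its position increment (simplified stand-in for a Lucene token).
    public static final class Tok {
        public final String term;
        public final int posInc;

        public Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "/" + posInc;
        }
    }

    // For each start word, emit the unigram and then every shingle (up to maxShingleSize
    // words) that begins there. Only the first token at a start word advances the
    // position by 1; the others use coterminalPositionIncrement.
    public static List<Tok> shingle(List<String> words, int maxShingleSize,
                                    int coterminalPositionIncrement) {
        List<Tok> out = new ArrayList<>();
        for (int start = 0; start < words.size(); start++) {
            for (int size = 1; size <= maxShingleSize && start + size <= words.size(); size++) {
                String term = String.join(" ", words.subList(start, start + size));
                int inc = (size == 1) ? 1 : coterminalPositionIncrement;
                out.add(new Tok(term, inc));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("please", "divide", "this");
        // Default: shingles share the position of the unigram they start with.
        System.out.println(shingle(words, 2, 0));
        // prints [please/1, please divide/0, divide/1, divide this/0, this/1]
        // Suggested change: every token gets its own position.
        System.out.println(shingle(words, 2, 1));
        // prints [please/1, please divide/1, divide/1, divide this/1, this/1]
    }
}
```

Only tokens that share a start word are affected by the option; the increment of 1 at each new start word is fixed.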

You have used the option name I suggested, but have implemented it in a form that doesn't
follow the name -- in your implementation, *all* tokens are placed at the same position, not
just those that start at the same word -- and I think this form is inappropriate for the general case.

I'm -1 on the patch in its current form.  If rewritten to modify the position increment only
for those shingles that begin at the same word, I'd be +1 (assuming it works and is tested).
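
To make the distinction concrete, here is a hedged sketch under a simplified model. Each token is described only by the index of the word it starts at, in emission order. Two hypothetical modes compute increments. `perStartWord` applies the increment only to tokens beginning at the same word, as proposed. `allTokensStacked` places *all* tokens at one position, as the comment describes the attached patch. Absolute positions are the running sum of increments starting from -1, so the first token lands at position 0. All names are illustrative, not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of two position-increment schemes; not Lucene code.
public class PositionModes {

    // Proposed semantics: a token that starts at a new word advances by 1; a token that
    // starts at the same word as the previous token advances by coterminalInc.
    public static List<Integer> perStartWord(int[] startWords, int coterminalInc) {
        List<Integer> incs = new ArrayList<>();
        for (int i = 0; i < startWords.length; i++) {
            boolean sameStart = i > 0 && startWords[i] == startWords[i - 1];
            incs.add(sameStart ? coterminalInc : 1);
        }
        return incs;
    }

    // The patch as described: only the first token advances; every other token,
    // whatever word it starts at, is stacked at the same position.
    public static List<Integer> allTokensStacked(int[] startWords) {
        List<Integer> incs = new ArrayList<>();
        for (int i = 0; i < startWords.length; i++) {
            incs.add(i == 0 ? 1 : 0);
        }
        return incs;
    }

    // Absolute positions: running sum of increments, starting at -1.
    public static List<Integer> positions(List<Integer> increments) {
        List<Integer> pos = new ArrayList<>();
        int p = -1;
        for (int inc : increments) {
            p += inc;
            pos.add(p);
        }
        return pos;
    }

    public static void main(String[] args) {
        // Start-word index of each token for "please divide this" with bigrams:
        // please, please divide, divide, divide this, this
        int[] starts = {0, 0, 1, 1, 2};
        System.out.println(positions(perStartWord(starts, 0)));  // [0, 0, 1, 1, 2]
        System.out.println(positions(perStartWord(starts, 1)));  // [0, 1, 2, 3, 4]
        System.out.println(positions(allTokensStacked(starts))); // [0, 0, 0, 0, 0]
    }
}
```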

> Patch for ShingleFilter.coterminalPositionIncrement
> ---------------------------------------------------
>                 Key: LUCENE-1380
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Michael Semb Wever
>             Fix For: 2.4
>         Attachments: LUCENE-1380.patch
> Make it possible for *all* words and shingles to be placed at the same position.
> Default is to place each shingle at the same position as the unigram (or the first shingle,
> if outputUnigrams=false). That is, the first token starting at each word has positionIncrement=1
> and every other token starting at that word has positionIncrement=0.
> This leads to a MultiPhraseQuery where at least one word/shingle must be matched from
each word/token. This is not always desired. 
> See for mailing list thread.
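
The consequence described in the issue can be modeled roughly as follows. This is a simplified sketch, not Lucene's QueryParser or MultiPhraseQuery. Query tokens are grouped into one slot per position. A document is represented as the set of terms indexed at each position. Assuming zero slop, the document matches only if, at some offset, every slot has at least one of its terms at the corresponding position. The class and method names are hypothetical, and position gaps (increments greater than 1) are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of multi-phrase matching; not Lucene's implementation.
public class SlotMatch {

    // Group terms into one slot per position: increment 0 adds to the current slot,
    // a positive increment opens a new one.
    public static List<List<String>> slots(List<String> terms, List<Integer> increments) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            if (out.isEmpty() || increments.get(i) > 0) {
                out.add(new ArrayList<>());
            }
            out.get(out.size() - 1).add(terms.get(i));
        }
        return out;
    }

    // Exact (slop 0) match: some offset where every slot has at least one term present
    // at the corresponding document position.
    public static boolean matches(List<List<String>> querySlots, List<Set<String>> docPositions) {
        for (int offset = 0; offset + querySlots.size() <= docPositions.size(); offset++) {
            boolean all = true;
            for (int s = 0; s < querySlots.size() && all; s++) {
                Set<String> atPos = docPositions.get(offset + s);
                all = querySlots.get(s).stream().anyMatch(atPos::contains);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Query "please divide" with bigrams at the default increments (1, 0, 1).
        List<List<String>> q = slots(List.of("please", "please divide", "divide"), List.of(1, 0, 1));
        System.out.println(q); // [[please, please divide], [divide]]
        // A document indexed the same way: "please divide".
        List<Set<String>> doc1 = List.of(Set.of("please", "please divide"), Set.of("divide"));
        // A document containing "please" followed by a different word.
        List<Set<String>> doc2 = List.of(Set.of("please", "please stop"), Set.of("stop"));
        System.out.println(matches(q, doc1)); // true
        System.out.println(matches(q, doc2)); // false: nothing matches the second slot
    }
}
```

In this model the second document shares the term "please" with the query but still fails, because every position slot must be matched.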

