lucene-dev mailing list archives

From "Michael Semb Wever (JIRA)" <>
Subject [jira] Commented: (LUCENE-1380) Patch for ShingleFilter.enablePositions
Date Sun, 14 Sep 2008 12:57:46 GMT


Michael Semb Wever commented on LUCENE-1380:

> All this patch does is to set all position increment of the tokens produced by the ShingleFilter to 0, right?
> I'm going to remove this for 2.4 fix and recommend you to use the filter strategy mentioned.

Adding a new TokenFilter isn't as easy as ABC: Lucene needs the filter class added to the
classpath, and Solr needs a TokenFilterFactory added before it can read the filter from the
configuration files. That is a lot of work when we (almost) agree that removing positional
information from all tokens makes sense when using the ShingleFilter.

If it were just the one installation I wouldn't have a problem with adding the custom TokenFilter,
but because our use case is an open-sourced and documented system ( read
) I'd like to make it as easy as possible for third parties.

I would also think that, since this is a way to replace commercial and competing technology
from FAST, the community would be behind such an enhancement...

> Patch for ShingleFilter.enablePositions
> ---------------------------------------
>                 Key: LUCENE-1380
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Michael Semb Wever
>            Assignee: Karl Wettin
>            Priority: Trivial
>         Attachments: LUCENE-1380.patch, LUCENE-1380.patch
> Make it possible for *all* words and shingles to be placed at the same position.
> Default is to place each shingle at the same position as the unigram (or first shingle if outputUnigrams=false). That is, each coterminal token has positionIncrement=1 and every other token a positionIncrement=0.
> This leads to a MultiPhraseQuery where at least one word/shingle must be matched from each word/token. This is not always desired.
> See for mailing list thread.
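
To make the position arithmetic in the issue description concrete, here is a minimal sketch in plain Java. It does not use Lucene: `Tok`, `shingle`, `flatten` and `positions` are hypothetical names invented for this illustration, only bigram shingles are produced, and a token's position is taken to be the running sum of the increments, which is a simplification of how an indexer assigns positions.

```java
import java.util.ArrayList;
import java.util.List;

class ShinglePositions {
    /** Hypothetical stand-in for a token: its text plus a position increment. */
    static final class Tok {
        final String text;
        final int posInc;
        Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
    }

    /** Default behaviour as described above: unigram gets increment 1, its bigram shingle gets 0. */
    static List<Tok> shingle(String[] words) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Tok(words[i], 1));
            if (i + 1 < words.length) {
                out.add(new Tok(words[i] + " " + words[i + 1], 0));
            }
        }
        return out;
    }

    /** What the issue asks for: every word and shingle gets increment 0. */
    static List<Tok> flatten(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(new Tok(t.text, 0));
        }
        return out;
    }

    /** Simplified position assignment: running sum of the increments. */
    static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int p = 0;
        for (int i = 0; i < toks.size(); i++) {
            p += toks.get(i).posInc;
            pos[i] = p;
        }
        return pos;
    }

    public static void main(String[] args) {
        List<Tok> def = shingle(new String[] {"please", "divide", "this"});
        int[] a = positions(def);
        int[] b = positions(flatten(def));
        for (int i = 0; i < def.size(); i++) {
            System.out.println(def.get(i).text + "  default=" + a[i] + "  flattened=" + b[i]);
        }
    }
}
```

In the default case the tokens fall into three distinct positions, so a query built from them has to match something at each one. With every increment set to 0 they all share a single position, which is the behaviour the issue requests.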

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
