lucene-dev mailing list archives

From "Karl Wettin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1380) Patch for ShingleFilter.enablePositions
Date Sun, 14 Sep 2008 12:49:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630884#action_12630884 ]

Karl Wettin commented on LUCENE-1380:
-------------------------------------

bq. Ok. So there's no way to do it through configuration only.

In Solr? Well, I don't really use Solr, but I'm pretty sure all you have to do is create
the filter as a new class, add it to the classpath, and add it as a filter to the query analyzer
in your configuration.

bq. Would a patch with such a TokenFilter be useful for anybody else other than ShingleFilter
users? 

I'd say no; it only seems to make sense for shingles at query parsing time.

bq. Again, I'm a newbie here, but I suspect there's no other filter (yet) which works across
the tokens (and hence breaks down the importance of positionIncrement) within a query in the
way ShingleFilter does.

I don't understand what you are saying here. All this patch does is set the position increment
of every token produced by the ShingleFilter to 0, right?
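
To make the effect concrete, here is a minimal plain-Java sketch of that behaviour. It deliberately
does not use the Lucene API: Tok and PositionFlattener are hypothetical stand-ins invented for
illustration, and the sample tokens assume a two-word query shingled with unigrams enabled. It only
models how forcing every position increment to 0 collapses all tokens onto a single position.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Plain-Java model (no Lucene dependency) of forcing all position
 * increments to 0. Tok and PositionFlattener are hypothetical names,
 * not Lucene classes.
 */
public class PositionFlattener {

    /** Minimal stand-in for a token: term text plus position increment. */
    public static final class Tok {
        public final String term;
        public final int positionIncrement;

        public Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Returns copies of the tokens with every position increment set to 0. */
    public static List<Tok> flatten(List<Tok> tokens) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : tokens) {
            out.add(new Tok(t.term, 0));
        }
        return out;
    }

    /** Counts the distinct positions the tokens occupy. */
    public static int distinctPositions(List<Tok> tokens) {
        Set<Integer> seen = new HashSet<>();
        int position = -1; // running position, advanced by each increment
        for (Tok t : tokens) {
            position += t.positionIncrement;
            seen.add(position);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Assumed sample: unigrams plus one two-word shingle.
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("please", 1));
        tokens.add(new Tok("please divide", 0)); // same position as "please"
        tokens.add(new Tok("divide", 1));

        System.out.println(distinctPositions(tokens));          // two positions
        System.out.println(distinctPositions(flatten(tokens))); // one position
    }
}
```

With the default increments a query built per position needs a match from each of the two positions;
after flattening, every word and shingle is an alternative at the same position. A real filter would
extend Lucene's TokenFilter rather than copy a list, and its exact shape depends on the Lucene version.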

I'm going to remove this issue from the 2.4 fix list and recommend that you use the filter
strategy mentioned above. I'll leave the issue open for discussion, though.

> Patch for ShingleFilter.enablePositions
> ---------------------------------------
>
>                 Key: LUCENE-1380
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1380
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Michael Semb Wever
>            Assignee: Karl Wettin
>         Attachments: LUCENE-1380.patch, LUCENE-1380.patch
>
>
> Make it possible for *all* words and shingles to be placed at the same position.
> Default is to place each shingle at the same position as the unigram (or first shingle
> if outputUnigrams=false). That is, each coterminal token has positionIncrement=1 and every
> other token a positionIncrement=0.
> This leads to a MultiPhraseQuery where at least one word/shingle must be matched from
> each word/token. This is not always desired.
> See http://comments.gmane.org/gmane.comp.jakarta.lucene.user/34746 for mailing list thread.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

