lucene-dev mailing list archives

From "Steven Rowe (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-1380) Patch for ShingleFilter.enablePositions
Date Mon, 22 Sep 2008 15:56:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633355#action_12633355 ]

steve_rowe edited comment on LUCENE-1380 at 9/22/08 8:56 AM:
--------------------------------------------------------------

{quote}
bq. separate out this feature to a new filter that modifies the position increment.

There may be other terms in the query outside of the quotes which should not be treated as
synonyms to the shingles.
{quote}

But those terms won't be in the same field, right? Solr has per-field analysis facilities.

bq. And it was also mentioned that there were known bugs when the first token had positionIncrement=0
(or all tokens lay at position zero instead of at position one).

You can tell the filter to set posincr=1 for the first token.

When it receives null from its predecessor in the filter chain, it can reset its "at the beginning"
flag, and the next time it's used, it'll give posincr=1 for the first token again.
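
A minimal sketch of that reset-on-null idea, in plain Java rather than against the real Lucene classes. Tok, Source, FirstPositionFilter and allAtZero are hypothetical stand-ins made up for illustration; the only thing taken from the comment above is the convention that a null from the predecessor marks the end of the stream.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: none of these types are Lucene classes.
class FirstPositionSketch {

    /** Minimal token: term text plus a position increment. */
    static final class Tok {
        final String term;
        int posIncr;
        Tok(String term, int posIncr) { this.term = term; this.posIncr = posIncr; }
    }

    /** Pull-style source that returns null when it is exhausted. */
    interface Source {
        Tok next();
    }

    /** Forces posIncr=1 on the first token of each pass and leaves the rest untouched. */
    static final class FirstPositionFilter implements Source {
        private final Source input;
        private boolean atBeginning = true;

        FirstPositionFilter(Source input) { this.input = input; }

        public Tok next() {
            Tok t = input.next();
            if (t == null) {
                atBeginning = true;   // end of stream: re-arm the flag for the next use
                return null;
            }
            if (atBeginning) {
                t.posIncr = 1;        // first token of this pass
                atBeginning = false;
            }
            return t;
        }
    }

    /** Reusable source that emits every term with posIncr=0 and returns null between passes. */
    static Source allAtZero(final String... terms) {
        return new Source() {
            private int i = 0;
            public Tok next() {
                if (i == terms.length) { i = 0; return null; }
                return new Tok(terms[i++], 0);
            }
        };
    }

    /** Drains one pass and returns the position increments that were seen. */
    static List<Integer> drain(Source s) {
        List<Integer> incs = new ArrayList<Integer>();
        for (Tok t = s.next(); t != null; t = s.next()) {
            incs.add(t.posIncr);
        }
        return incs;
    }

    public static void main(String[] args) {
        Source f = new FirstPositionFilter(allAtZero("abcd", "abcd efgh", "abcd efgh ijkl"));
        System.out.println(drain(f)); // [1, 0, 0]
        System.out.println(drain(f)); // [1, 0, 0] again, because the null reset the flag
    }
}
```

The flag lives in the filter itself, so nothing upstream has to know about it; reusing the filter for the next query just works once the previous stream has been drained to null.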

> Patch for ShingleFilter.enablePositions
> ---------------------------------------
>
>                 Key: LUCENE-1380
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1380
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Mck SembWever
>            Priority: Trivial
>         Attachments: LUCENE-1380.patch, LUCENE-1380.patch
>
>
> Make it possible for *all* words and shingles to be placed at the same position, that is for _all_ shingles (and unigrams if included) to be treated as synonyms of each other.
> Today the shingles generated are synonyms only to the first term in the shingle.
> For example, the query "abcd efgh ijkl" results in:
>    ("abcd" "abcd efgh" "abcd efgh ijkl") ("efgh" "efgh ijkl") ("ijkl")
> where "abcd efgh" and "abcd efgh ijkl" are synonyms of "abcd", and "efgh ijkl" is a synonym of "efgh".
> There exists no way today to alter which token a particular shingle is a synonym for.
> This patch takes the first step in making it possible to make all shingles (and unigrams if included) synonyms of each other.
> See http://comments.gmane.org/gmane.comp.jakarta.lucene.user/34746 for mailing list thread.
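
The grouping in the issue description can be reproduced with a small self-contained sketch. This is not the patch and not ShingleFilter itself: the shingles method and its samePosition flag are hypothetical names used only to contrast the behaviour described as current (each shingle shares the position of its first word) with the requested one (every shingle and unigram at one position).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: illustrates position assignment, not the real ShingleFilter.
class ShinglePositionSketch {

    /** Returns "term@position" strings for all shingles up to maxSize words; positions start at 1. */
    static List<String> shingles(String[] words, int maxSize, boolean samePosition) {
        List<String> out = new ArrayList<String>();
        for (int start = 0; start < words.length; start++) {
            StringBuilder sb = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= words.length; len++) {
                if (len > 1) sb.append(' ');
                sb.append(words[start + len - 1]);
                // Current behaviour: position of the shingle's first word.
                // Requested behaviour (assumed flag): one shared position for everything.
                int position = samePosition ? 1 : start + 1;
                out.add(sb + "@" + position);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] query = {"abcd", "efgh", "ijkl"};
        System.out.println(shingles(query, 3, false));
        // [abcd@1, abcd efgh@1, abcd efgh ijkl@1, efgh@2, efgh ijkl@2, ijkl@3]
        System.out.println(shingles(query, 3, true));
        // [abcd@1, abcd efgh@1, abcd efgh ijkl@1, efgh@1, efgh ijkl@1, ijkl@1]
    }
}
```

Tokens that share a position are what a query parser treats as synonyms, which is why the first output yields three groups and the second yields one.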

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

