lucene-dev mailing list archives

From "Steven Rowe (JIRA)" <>
Subject [jira] Commented: (LUCENE-1380) Patch for ShingleFilter.enablePositions (or PositionFilter)
Date Wed, 24 Sep 2008 16:21:44 GMT


Steven Rowe commented on LUCENE-1380:

When I wrote:
bq. 4.  You should provide a standalone test for the PositionFilter, in addition to the ShingleFilterTest

I meant that testing of PositionFilter should be separate from testing its functionality with
ShingleFilter.  Your PositionFilter test looks at offsets, which PositionFilter doesn't affect
at all.  It is possible that PositionFilter will be used for things other than ShingleFilter.
Hence, there should be basic test(s) that evaluate PositionFilter without ShingleFilter.

I also think a test should be added to make sure that a single instance of PositionFilter
works with multiple documents.
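
As a rough illustration of the two checks suggested above (a standalone check and a
multiple-document check), here is a minimal sketch.  It does not use the Lucene API:
PositionFilterSketch is a hypothetical stand-in that works on plain arrays of position
increments, and it assumes the filter's behaviour is "the first token of a stream keeps its
increment, every later token gets a fixed increment, and the flag is reset at end of stream".

```java
import java.util.Arrays;

/** Hypothetical stand-in for the patch's PositionFilter; not the Lucene API. */
public class PositionFilterSketch {
    // Assumption: every token after the first gets this increment
    // (0 places it at the same position as the first token).
    private final int laterIncrement;
    private boolean firstTokenPositioned = false;

    public PositionFilterSketch(int laterIncrement) {
        this.laterIncrement = laterIncrement;
    }

    /** Filters the position increments of one document's token stream. */
    public int[] filter(int[] increments) {
        int[] out = new int[increments.length];
        for (int i = 0; i < increments.length; i++) {
            if (firstTokenPositioned) {
                out[i] = laterIncrement;
            } else {
                out[i] = increments[i]; // first token keeps its own increment
                firstTokenPositioned = true;
            }
        }
        // End of stream: reset so the same instance can serve the next document.
        firstTokenPositioned = false;
        return out;
    }

    public static void main(String[] args) {
        PositionFilterSketch filter = new PositionFilterSketch(0);
        // Standalone check, no shingles involved.
        System.out.println(Arrays.toString(filter.filter(new int[] {1, 1, 1})));
        // Same instance, second document: the first token must be positioned again.
        System.out.println(Arrays.toString(filter.filter(new int[] {2, 1})));
    }
}
```

If the reset at end of stream were missing, the second call would return all zeros, which
is the kind of failure a multiple-document test is meant to catch.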

BTW, you don't need to delete JIRA attachments if you want to upload a new version - when
you upload a same-named file, the most recent version of the file will be colored black, and
older versions will be colored gray.  This is the conventional way Lucene uses JIRA.  It allows
people to follow the JIRA comments in the progressive versions of the patch(es).

A typo on line 66 of PositionFilterTest: 
            // end of stream so reset firstTokePositioned

> Patch for ShingleFilter.enablePositions (or PositionFilter)
> -----------------------------------------------------------
>                 Key: LUCENE-1380
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Mck SembWever
>            Priority: Trivial
>         Attachments: LUCENE-1380-PositionFilter.patch, LUCENE-1380.patch, LUCENE-1380.patch
> Make it possible for *all* words and shingles to be placed at the same position, that
> is, for _all_ shingles (and unigrams if included) to be treated as synonyms of each other.
> Today the shingles generated are synonyms only to the first term in the shingle.
> For example, the query "abcd efgh ijkl" results in:
>    ("abcd" "abcd efgh" "abcd efgh ijkl") ("efgh" "efgh ijkl") ("ijkl")
> where "abcd efgh" and "abcd efgh ijkl" are synonyms of "abcd", and "efgh ijkl" is a synonym
> of "efgh".
> There exists no way today to alter which token a particular shingle is a synonym for.
> This patch takes the first step in making it possible to make all shingles (and unigrams
> if included) synonyms of each other.
> See for mailing list thread.
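
The grouping described in the issue can be sketched in a few lines.  This is only an
illustration, not ShingleFilter itself: ShinglePositionsSketch, groupByPosition and
samePosition are hypothetical names, and a maximum shingle size of 3 with unigrams included
is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of shingle positions; not the Lucene ShingleFilter. */
public class ShinglePositionsSketch {

    /** Current behaviour: each group holds the unigram and shingles starting at one word. */
    public static List<List<String>> groupByPosition(String[] words, int maxShingleSize) {
        List<List<String>> groups = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            List<String> group = new ArrayList<>();
            StringBuilder shingle = new StringBuilder();
            for (int size = 1; size <= maxShingleSize && start + size <= words.length; size++) {
                if (size > 1) {
                    shingle.append(' ');
                }
                shingle.append(words[start + size - 1]);
                group.add(shingle.toString());
            }
            groups.add(group);
        }
        return groups;
    }

    /** Requested behaviour: every unigram and shingle shares a single position. */
    public static List<String> samePosition(List<List<String>> groups) {
        List<String> all = new ArrayList<>();
        for (List<String> group : groups) {
            all.addAll(group);
        }
        return all;
    }

    public static void main(String[] args) {
        String[] query = {"abcd", "efgh", "ijkl"};
        List<List<String>> groups = groupByPosition(query, 3);
        System.out.println(groups);
        // [[abcd, abcd efgh, abcd efgh ijkl], [efgh, efgh ijkl], [ijkl]]
        System.out.println(samePosition(groups));
        // [abcd, abcd efgh, abcd efgh ijkl, efgh, efgh ijkl, ijkl]
    }
}
```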

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
