lucene-solr-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Commented: (SOLR-1869) RemoveDuplicatesTokenFilter doest have expected behaviour
Date Thu, 08 Apr 2010 16:35:36 GMT


Robert Muir commented on SOLR-1869:

bq. this all started because the highlighter was highlighting a term at the same offsets twice,

Perhaps we should fix this directly in DefaultSolrHighlighter? It already has this TokenStream-sorting
filter that's intended to do the following:
/** Orders Tokens in a window first by their startOffset ascending.
 * endOffset is currently ignored.
 * This is meant to work around fickleness in the highlighter only.  It
 * can mess up token positions and should not be used for indexing or querying.

Maybe the deduplication logic should occur here after it sorts on startOffset? 
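
To make the suggestion concrete, here is a minimal sketch of what "sort on startOffset, then deduplicate" could look like. It is plain Java rather than the actual DefaultSolrHighlighter code: the Tok class, the sortAndDedupe method, and the choice of (term, startOffset, endOffset) as the duplicate key are all assumptions made for illustration, not the Lucene/Solr API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OffsetDedupSketch {

    /** Hypothetical stand-in for a token: term text plus character offsets. */
    static final class Tok {
        final String term;
        final int startOffset;
        final int endOffset;

        Tok(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + "[" + startOffset + "," + endOffset + "]";
        }
    }

    /**
     * Sorts a window of tokens by startOffset (stable, endOffset ignored for
     * ordering), then drops any token whose term and offsets match a token
     * already emitted at the same startOffset.
     */
    static List<Tok> sortAndDedupe(List<Tok> window) {
        List<Tok> sorted = new ArrayList<>(window);
        // List.sort is stable, so tokens sharing a startOffset keep their order.
        sorted.sort(Comparator.comparingInt((Tok t) -> t.startOffset));

        List<Tok> out = new ArrayList<>();
        Set<String> seenAtStart = new HashSet<>();
        int currentStart = -1;
        for (Tok t : sorted) {
            if (t.startOffset != currentStart) {
                // New startOffset group: earlier keys can no longer match.
                currentStart = t.startOffset;
                seenAtStart.clear();
            }
            // Assumed duplicate key: same endOffset and same term text.
            String key = t.endOffset + ":" + t.term;
            if (seenAtStart.add(key)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> window = new ArrayList<>();
        window.add(new Tok("solr", 5, 9));
        window.add(new Tok("so", 5, 7));
        window.add(new Tok("solr", 5, 9)); // same term, same offsets
        window.add(new Tok("fast", 0, 4));
        // prints [fast[0,4], solr[5,9], so[5,7]]
        System.out.println(sortAndDedupe(window));
    }
}
```

Because the sort groups tokens by startOffset, the seen-set only has to cover one offset group at a time, which keeps the extra state small.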

> RemoveDuplicatesTokenFilter doest have expected behaviour
> ---------------------------------------------------------
>                 Key: SOLR-1869
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Joe Calderon
>            Priority: Minor
>         Attachments:,,
> The RemoveDuplicatesTokenFilter seems broken, as it initializes its map and attributes at the class level and not within its constructor.
> In addition, I would think the expected behaviour would be to remove identical terms with the same offset positions; instead it looks like it removes duplicates based on position increment, which won't work when using it after something like the EdgeNGram filter. When I posted this to the mailing list, even Erik Hatcher seemed to think that's what this filter was supposed to do...
> Attaching a patch that has the expected behaviour and initializes variables in the constructor.

