lucene-solr-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-1710) convert worddelimiterfilter to new tokenstream API
Date Fri, 08 Jan 2010 22:25:54 GMT

     [ https://issues.apache.org/jira/browse/SOLR-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-1710:
------------------------------

    Attachment: SOLR-1710.patch

For the case where WordDelimiterFilter only modifies a single word with punctuation, don't call
clearAttributes() if it's the first token, even though it's modified, unless preserveOriginal is
on (in that case the preserved original already carried the attributes, so we must clear).

This is a little confusing, since the behavior for custom attributes depends on the
preserveOriginal value, but I think it makes sense.
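
The rule above can be written as a small truth table. The following is only a sketch of the decision as described in this comment, not code from the patch: the class and method names are hypothetical, and the assumption that every token after the first clears its attributes is mine rather than something stated here.

```java
// Hypothetical helper illustrating the rule described above.
// It is NOT taken from SOLR-1710.patch; names are invented for illustration.
final class ClearAttributesSketch {

    /**
     * Returns true if attributes should be cleared before emitting a token in
     * the "single word with punctuation" case.
     *
     * @param firstToken       true if this is the first token emitted for the input token
     * @param preserveOriginal true if the unmodified original is emitted as well
     */
    static boolean shouldClearAttributes(boolean firstToken, boolean preserveOriginal) {
        // First token and no preserved original: keep the incoming attributes,
        // even though the term text itself was modified.
        if (firstToken && !preserveOriginal) {
            return false;
        }
        // Otherwise clear. With preserveOriginal on, the preserved original
        // already carried the attributes. Clearing for non-first tokens is an
        // assumption of this sketch.
        return true;
    }

    public static void main(String[] args) {
        boolean[] values = {true, false};
        for (boolean first : values) {
            for (boolean preserve : values) {
                System.out.printf("firstToken=%b preserveOriginal=%b -> clear=%b%n",
                        first, preserve, shouldClearAttributes(first, preserve));
            }
        }
    }
}
```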

> convert worddelimiterfilter to new tokenstream API
> --------------------------------------------------
>
>                 Key: SOLR-1710
>                 URL: https://issues.apache.org/jira/browse/SOLR-1710
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Robert Muir
>         Attachments: SOLR-1710.patch, SOLR-1710.patch
>
>
> This one was a doozy, attached is a patch to convert it to the new tokenstream API.
> Some of the logic was split into WordDelimiterIterator (exposes a BreakIterator-like api for iterating subwords)
> the filter is much more efficient now, no cloning.
> before applying the patch, rename the existing WordDelimiterFilter to OriginalWordDelimiterFilter
> the patch includes a testcase (TestWordDelimiterBWComp) which generates random strings from various subword combinations.
> For each random string, it compares output against the existing WordDelimiterFilter for all 512 combinations of boolean parameters.
> NOTE: due to bugs found (SOLR-1706), this currently only tests 256 of these combinations. The bugs discovered in SOLR-1706 are fixed here.
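
The figure of 512 combinations corresponds to nine independent boolean parameters (2^9 = 512). One common way to walk every combination is to treat a loop counter as a bitmask, sketched below. This is an illustration only and may not match how TestWordDelimiterBWComp does it: the class and method names are hypothetical, and the flags are left unnamed on purpose. Holding one flag fixed is one way to arrive at 256 combinations, but which parameter the test excludes is not stated in this message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerate all combinations of N boolean flags by
// decoding the bits of an integer counter. Not taken from the patch.
final class FlagCombinations {

    /** Decodes the low flagCount bits of mask into a boolean array. */
    static boolean[] decode(int mask, int flagCount) {
        boolean[] flags = new boolean[flagCount];
        for (int bit = 0; bit < flagCount; bit++) {
            flags[bit] = (mask & (1 << bit)) != 0;
        }
        return flags;
    }

    /** Returns every combination of flagCount booleans (2^flagCount entries). */
    static List<boolean[]> allCombinations(int flagCount) {
        List<boolean[]> combos = new ArrayList<>();
        for (int mask = 0; mask < (1 << flagCount); mask++) {
            combos.add(decode(mask, flagCount));
        }
        return combos;
    }

    /** Counts the combinations in which the flag at fixedBit equals fixedValue. */
    static int countWithFlagFixed(int flagCount, int fixedBit, boolean fixedValue) {
        int count = 0;
        for (boolean[] flags : allCombinations(flagCount)) {
            if (flags[fixedBit] == fixedValue) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(allCombinations(9).size());          // 512
        System.out.println(countWithFlagFixed(9, 0, false));    // 256
    }
}
```

In a comparison test, each decoded array would supply the constructor arguments for both the old and the new filter, and the two token streams would then be compared for the same random input.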

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

