lucene-solr-dev mailing list archives

From "Boris Vitez (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-41) PATCH: HyphenatedWordsFilter, Factory and test
Date Fri, 28 Jul 2006 18:32:15 GMT
    [ http://issues.apache.org/jira/browse/SOLR-41?page=comments#action_12424145 ] 
            
Boris Vitez commented on SOLR-41:
---------------------------------

Thank you for the feedback and suggestion.
I will change the Filter to use this new feature of the Token class as soon as I'm back on Monday.

> PATCH: HyphenatedWordsFilter, Factory and test
> ----------------------------------------------
>
>                 Key: SOLR-41
>                 URL: http://issues.apache.org/jira/browse/SOLR-41
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Boris Vitez
>            Priority: Minor
>         Attachments: HyphenatedWordsFilter.java, hyphenatedwordsfilter.patch, HyphenatedWordsFilterFactory.java, TestHyphenatedWordsFilter.java
>
>
> When plain text is extracted from documents, many words are often hyphenated and broken across two lines. This is common in documents that use narrow text columns, such as newsletters.
> To improve searching efficiency, this filter rejoins hyphenated words that were broken across two lines.
> This filter has to be used together with the WordDelimiterFilter having catenateWords=1.
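
The joining step described in the issue can be illustrated with a minimal stand-alone sketch. It is not the attached HyphenatedWordsFilter and does not use the Lucene TokenFilter API; the class name HyphenJoinSketch and the method joinHyphenated are hypothetical. It assumes the tokens arrive in document order and that a token ending in '-' is the first half of a word broken at a line end. Dropping the hyphen in this step is also an assumption; the attached patch may leave that to WordDelimiterFilter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the attached patch and not Lucene API.
final class HyphenJoinSketch {

    // Returns a new list in which every token ending in '-' is merged
    // with the token that follows it (the hyphen itself is dropped).
    static List<String> joinHyphenated(List<String> tokens) {
        List<String> out = new ArrayList<>();
        StringBuilder pending = null; // first part(s) of a word still being joined
        for (String token : tokens) {
            if (pending == null) {
                pending = new StringBuilder(token);
            } else {
                pending.append(token); // continue the broken word
            }
            if (endsWithHyphen(pending)) {
                // Assumption: a trailing hyphen marks a line-end break.
                pending.setLength(pending.length() - 1);
            } else {
                out.add(pending.toString());
                pending = null;
            }
        }
        // A dangling first half at the end of input is emitted as-is.
        if (pending != null && pending.length() > 0) {
            out.add(pending.toString());
        }
        return out;
    }

    private static boolean endsWithHyphen(CharSequence s) {
        return s.length() > 0 && s.charAt(s.length() - 1) == '-';
    }
}
```

For example, the tokens "narrow", "news-", "letters", "columns" would come back as "narrow", "newsletters", "columns". A real filter would also have to decide how to adjust token offsets and whether genuinely hyphenated compounds should be kept, which this sketch ignores.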

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
