lucene-solr-dev mailing list archives

From "Otis Gospodnetic (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1279) ApostropheTokenizer
Date Thu, 16 Jul 2009 15:47:14 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731998#action_12731998 ]

Otis Gospodnetic commented on SOLR-1279:
----------------------------------------

Boris, please let us know if WordDelimiterFilter works for you.
If it does not and this new code is needed, could you please:
* add the ASL to the top
* write a bit of javadoc (your description from this issue is good)
* write a unit test

Thanks for your help!

> ApostropheTokenizer
> -------------------
>
>                 Key: SOLR-1279
>                 URL: https://issues.apache.org/jira/browse/SOLR-1279
>             Project: Solr
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Sergey Borisov
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: ApostropheTokenizer.zip
>
>
> ApostropheTokenizer creates extra tokens during the analysis stage for fields containing
> apostrophes. The reason for adding this is to ensure that documents that differ only by an
> apostrophe have the same relevancy score.
> For example, if the document contains the string "McDonald's", it will be tokenized as
> "McDonald's McDonalds". This way, a search for either "McDonald's" or "McDonalds" will
> produce a similar score.
> This code handles up to two apostrophes in a token.
> To use this tokenizer, add the following lines to schema.xml:
> <analyzer type="index">
>       <filter class="org.apache.lucene.analysis.ApostropheTokenFactory"/>
> ...
> </analyzer>
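
The token expansion described in the quoted issue can be sketched in plain Java as follows. This is only an illustration of the idea, not the code from ApostropheTokenizer.zip: the class name ApostropheExpander and the method expand are hypothetical, and the sketch assumes that "handles up to two apostrophes" means tokens with more than two apostrophes are left unexpanded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; NOT the attached ApostropheTokenizer code.
final class ApostropheExpander {
    // Limit taken from the issue description ("up to two apostrophes").
    private static final int MAX_APOSTROPHES = 2;

    // Returns the original token, followed by an apostrophe-free variant
    // when the token contains one or two apostrophes.
    static List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        out.add(token); // the original token is always kept

        int count = 0;
        for (int i = 0; i < token.length(); i++) {
            if (token.charAt(i) == '\'') {
                count++;
            }
        }

        // Assumption: tokens above the limit are passed through unchanged.
        if (count >= 1 && count <= MAX_APOSTROPHES) {
            String stripped = token.replace("'", "");
            if (!stripped.isEmpty()) {
                out.add(stripped);
            }
        }
        return out;
    }
}
```

With this sketch, ApostropheExpander.expand("McDonald's") yields the two tokens "McDonald's" and "McDonalds", while a token without apostrophes is returned unchanged. A real Lucene filter would also need to set position increments for the extra token, which is not shown here.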

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

