Date: Thu, 16 Feb 2012 09:02:59 +0000 (UTC)
From: "Mauro Asprea (Issue Comment Edited) (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <2022569671.45474.1329382979933.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Issue Comment Edited] (SOLR-1279) ApostropheTokenizer
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/SOLR-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209231#comment-13209231 ]

Mauro Asprea edited comment on SOLR-1279 at 2/16/12 9:02 AM:
-------------------------------------------------------------

I confirm this is working using the WordDelimiterFilterFactory like Robert said:
{code}
{code}

Then using Solr Admin Analysis page I get the following:

Value: McDonald's

||Indexed Term|
|McDonald's|
|Mc|
|Donald|
|s|
|McDonalds|

One thing: You have to be sure that no previous filters remove the trailing "'s". In my case I had the StandardFilterFactory which does remove trailing apostrophes.

      was (Author: brutuscat):
    I confirm this is working using the WordDelimiterFilterFactory like Robert said:
{code}
{code}

The using Solr Admin Analysis page I get the following:

Value: McDonald's

||Indexed Term|
|McDonald's|
|Mc|
|Donald|
|s|
|McDonalds|

One thing: You have to be sure that no previous filters remove the trailing "'s". In my case I had the StandardFilterFactory which does remove trailing apostrophes.

> ApostropheTokenizer
> -------------------
>
>                 Key: SOLR-1279
>                 URL: https://issues.apache.org/jira/browse/SOLR-1279
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Sergey Borisov
>            Priority: Minor
>             Fix For: 3.6, 4.0
>
>         Attachments: ApostropheTokenizer.zip
>
>
> ApostropheTokenizer creates extra tokens during the analysis stage for fields containing apostrophes. The reason for adding this is to ensure that documents that differ only by an apostrophe have the same relevancy score.
> For example, if the document contains the string "McDonald's", it will be tokenized as "McDonald's McDonalds". This way, a search for "McDonald's" or "McDonalds" will produce a similar score.
> This code handles up to two apostrophes in a token.
> To use this tokenizer add the following line in schema.xml
>
>
> ...
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
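
Illustrative note on the token expansion described in the issue above: the description says a token such as "McDonald's" is emitted as both "McDonald's" and "McDonalds", and that up to two apostrophes per token are handled. The following is only a minimal Python sketch of that idea, not the attached ApostropheTokenizer code and not a Lucene/Solr API. The function names `expand_apostrophes` and `tokenize`, the whitespace splitting, and the choice to leave tokens with more than two apostrophes untouched are all assumptions made for illustration.

```python
from typing import List


def expand_apostrophes(token: str, max_apostrophes: int = 2) -> List[str]:
    """Return the token, plus an apostrophe-free variant when applicable.

    Assumption: tokens with more than `max_apostrophes` apostrophes are
    passed through unchanged; the issue only says "up to two" are handled.
    """
    count = token.count("'")
    if count == 0 or count > max_apostrophes:
        return [token]
    stripped = token.replace("'", "")
    if not stripped:
        # The token consisted only of apostrophes; emit nothing extra.
        return [token]
    return [token, stripped]


def tokenize(text: str) -> List[str]:
    """Split on whitespace (an assumption) and expand each token."""
    tokens: List[str] = []
    for raw in text.split():
        tokens.extend(expand_apostrophes(raw))
    return tokens


if __name__ == "__main__":
    print(tokenize("McDonald's fries"))
```

With this sketch, both "McDonald's" and "McDonalds" end up as indexed terms for the same document, which is the effect the issue is after. The WordDelimiterFilterFactory output shown in the comment additionally emits the sub-word parts (Mc, Donald, s), which this sketch does not attempt to reproduce.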