Mailing-List: contact solr-user-help@lucene.apache.org; run by ezmlm
Reply-To: solr-user@lucene.apache.org
From: Erick Erickson
Date: Sun, 8 May 2016 11:43:12 -0700
Subject: Re: understanding phonetic matching
To: solr-user
archived-at: Sun, 08 May 2016
18:43:45 -0000

Jay:

Here's what's currently available:

https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching

Not sure what version of Solr some of them were added in....

Best,
Erick

On Sat, May 7, 2016 at 9:30 PM, Jay Potharaju wrote:
> Thanks, will check it out.
>
>
> On Sat, May 7, 2016 at 7:05 PM, Susheel Kumar wrote:
>
>> Jay,
>>
>> There are mainly three phonetic algorithms available in Solr, i.e.
>> RefinedSoundex, DoubleMetaphone & BeiderMorse. We did an extensive
>> comparison across various test cases and found BeiderMorse to be the
>> best among those for finding sound-alike matches, and it also supports
>> multiple languages. We also customized Beider Morse extensively for our
>> use case.
>>
>> So please take a closer look at Beider Morse and I am sure it will help
>> you out.
>>
>> Thanks,
>> Susheel
>>
>> On Sat, May 7, 2016 at 2:13 PM, Jay Potharaju wrote:
>>
>> > Thanks for the feedback. I was getting correct results when searching
>> > for jon & john. But when I tried other names like 'khloe', it matched
>> > on 'collier' because the phonetic filter generated KL as the token.
>> > Is the phonetic filter the best way to find similar-sounding names?
>> >
>> >
>> > On Wed, Mar 23, 2016 at 12:01 AM, davidphilip cherian <
>> > davidphilipcherian@gmail.com> wrote:
>> >
>> > > The "phonetic_en" analyzer definition available in solr-schema does
>> > > return documents having "Jon", "JN", "John" when the search term is
>> > > "John". Check out the screenshot here: http://imgur.com/0R6SvX2
>> > >
>> > > This wiki page explains how phonetic matching works:
>> > >
>> > > https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching#PhoneticMatching-DoubleMetaphone
>> > >
>> > >
>> > > Hope that helps.
>> > >
>> > >
>> > >
>> > > On Wed, Mar 23, 2016 at 11:18 AM, Alexandre Rafalovitch <
>> > > arafalov@gmail.com> wrote:
>> > >
>> > > > I'd start by putting LowerCaseFF before the PhoneticFilter.
>> > > >
>> > > > But then, you say you were using Analysis screen and what? Do you get
>> > > > the matches when you put your sample text and the query text in the
>> > > > two boxes in the UI? I am not sure what "look at my solr data" means
>> > > > in this particular context.
>> > > >
>> > > > Regards,
>> > > >    Alex.
>> > > > ----
>> > > > Newsletter and resources for Solr beginners and intermediates:
>> > > > http://www.solr-start.com/
>> > > >
>> > > >
>> > > > On 23 March 2016 at 16:27, Jay Potharaju wrote:
>> > > > > Hi,
>> > > > > I am trying to do name matching using the phonetic filter factory.
>> > > > > As part of that I was analyzing the data using the analysis screen
>> > > > > in the solr UI. If I search for john, any documents containing john
>> > > > > or jon should be found.
>> > > > >
>> > > > > Following is my definition of the custom field that I use for
>> > > > > indexing the data. When I look at my solr data I don't see any
>> > > > > similar sounding names in my solr data, even though I have set
>> > > > > inject="true". Is that not how it is supposed to work?
>> > > > > Can someone explain how phonetic matching works?
>> > > > >
>> > > > > positionIncrementGap="100">
>> > > > > encoder="DoubleMetaphone" inject="true" maxCodeLength="5"/>
>> > > > >
>> > > > > --
>> > > > > Thanks
>> > > > > Jay
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks
>> > Jay Potharaju
>> >
>>
>
>
>
> --
> Thanks
> Jay Potharaju
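
For readers following the "how does phonetic matching work?" question in the thread: the general idea is that each name is reduced to a short code, and two names match when their codes are equal. The sketch below illustrates that idea with classic American Soundex, written in plain Python. It is an assumption-laden illustration only: Soundex is a much simpler encoder than the DoubleMetaphone encoder discussed above, this is not Solr's or Commons Codec's implementation, and the function name soundex is made up for the example.

```python
import string

# Consonant groups used by classic American Soundex.
# Vowels and y map to no digit; h and w are skipped entirely.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code for an ASCII name (illustrative only)."""
    letters = [c for c in name.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    out = [letters[0].upper()]           # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")    # its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h/w do not separate equal digits
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code                      # a vowel resets prev to ""
    return "".join(out)[:4].ljust(4, "0")


if __name__ == "__main__":
    for n in ("john", "jon", "khloe", "collier"):
        print(n, soundex(n))
```

With this encoder, john and jon collapse to the same code, which is why a phonetic field can match one when the other is searched. Which names collide depends entirely on the encoder: Soundex keeps the first letter, so khloe and collier stay apart here, whereas the thread reports that the DoubleMetaphone filter produced the same token (KL) for both. Comparing encoders on your own sample names in the Analysis screen is the practical way to choose one.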