Date: Tue, 30 Sep 2014 00:51:11 +0800
Subject: Lucene suggester can't suggest similar phrase
From: Cheng
To: java-user@lucene.apache.org

Hi,

I am using the Lucene 4.10 suggester, which I thought could return similar phrases, but it does not behave that way. My code is as follows:

    public static void main(String[] args) throws IOException {
        String path = "c:/data/suggest/dic.txt";
        Dictionary dic = new FileDictionary(new FileInputStream(path));
        InputIterator it = dic.getEntryIterator();
        Analyzer analyzer = ERAnalyzer.getInstance().getAnalyzer();
        FuzzySuggester suggester = new FuzzySuggester(analyzer);
        suggester.build(it);
        CharSequence cs = "雅诗兰黛";
        List<LookupResult> results = suggester.lookup(cs, false, 1);
        System.out.println(results.get(0).key);
    }

The dictionary contains only one line:

    雅诗兰黛 50

When cs is exactly "雅诗兰黛", I get the result. But when cs is "雅思兰黛", which differs from the target by only one character, I get nothing back.

I tried FuzzySuggester as well as AnalyzingSuggester. The result is the same.

Did I miss something here?

Thanks!
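
To make "only one character different" concrete, here is a small standalone sketch (plain JDK, no Lucene) that compares the indexed phrase 雅诗兰黛 with the query 雅思兰黛, once by Unicode code point and once by UTF-8 byte. The class and method names are made up for illustration, and the strings are written as Unicode escapes so the file compiles under any source encoding. It says nothing about how the suggester itself counts edits; whether that is per character or per byte is an assumption I have not verified.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene: measures how far apart two
// equal-length strings are at the code point level and at the UTF-8 byte level.
final class QueryDiff {

    // Number of positions at which the two code point sequences differ.
    static int differingCodePoints(String a, String b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        if (x.length != y.length) {
            throw new IllegalArgumentException("inputs must have the same number of code points");
        }
        int diff = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i]) {
                diff++;
            }
        }
        return diff;
    }

    // Number of positions at which the two UTF-8 encodings differ.
    static int differingUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        if (x.length != y.length) {
            throw new IllegalArgumentException("inputs must have the same UTF-8 length");
        }
        int diff = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i]) {
                diff++;
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        String indexed = "\u96C5\u8BD7\u5170\u9EDB"; // the single dictionary entry
        String query = "\u96C5\u601D\u5170\u9EDB";   // the query that returns nothing
        System.out.println("code points differing: " + differingCodePoints(indexed, query));
        System.out.println("UTF-8 bytes differing: " + differingUtf8Bytes(indexed, query));
    }
}
```

On these two strings the sketch reports 1 differing code point but 3 differing UTF-8 bytes, because the second character encodes as E8 AF 97 in one string and E6 80 9D in the other. If the suggester happens to measure edits over bytes, the query would be three edits away from the entry rather than one.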