References: <205847485.449706.1506514653255@mail.yahoo.com> <974674c4-192c-f1c4-46e1-4f7df49f852a@rondhuit.com>
From: Chitra
Date: Tue, 24 Oct 2017 17:46:58 +0530
Subject: Re: Accent insensitive search for greek characters
To: Lucene Users

Hi,

ICUTransformFilter works as required for Greek characters in general, but it breaks in one case, involving the two lowercase forms of sigma.

*Example:*

I indexed the terms πελάτης (indexed as πελατης) and πελάτηΣ (indexed as πελατης). I get the expected search results if I search for πελάτηΣ, πελάτης, or any combination of upper- and lowercase Greek characters. But if I search for πελατησ, I get no results.

In Greek, σ and ς are both lowercase forms of Σ (sigma). ICUFoldingFilter handles this case.

Is my ICU Transliterator rule formed correctly? Please look at the code below:

TokenStream tok = new ICUTransformFilter(tok, Transliterator.getInstance("Greek; > Lower; NFD; [:Nonspacing Mark:] Remove; NFC;"));

Please help me resolve this.

Regards,
Chitra
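The sigma mismatch described above can be reproduced with the JDK alone, without Lucene or ICU on the classpath. The sketch below is only an illustration: `GreekFold` and its two methods are hypothetical names, and it assumes that `String.toLowerCase` applies the same final-sigma rule as the `Lower` step in the transliterator (which is what the indexed terms reported above suggest). Lowercasing plus accent stripping turns a word-final Σ into ς, so a query typed with σ no longer matches; mapping ς to σ as a last step makes all three spellings fold to the same term.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Lucene or ICU. */
final class GreekFold {

    /** Lowercase, then strip accents: NFD, drop nonspacing marks, NFC. */
    static String lowerAndStripAccents(String s) {
        // Lowercasing maps a word-final capital sigma to final sigma (U+03C2).
        String lower = s.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{Mn}", "");
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    /** As above, plus folding final sigma (U+03C2) to plain sigma (U+03C3). */
    static String fold(String s) {
        return lowerAndStripAccents(s).replace('\u03C2', '\u03C3');
    }

    public static void main(String[] args) {
        String indexed = "\u03C0\u03B5\u03BB\u03AC\u03C4\u03B7\u03A3"; // πελάτηΣ
        String query = "\u03C0\u03B5\u03BB\u03B1\u03C4\u03B7\u03C3";   // πελατησ

        // Without the sigma fold the two terms differ in their last letter.
        System.out.println(
            lowerAndStripAccents(indexed).equals(lowerAndStripAccents(query)));

        // With the sigma fold they compare equal.
        System.out.println(fold(indexed).equals(fold(query)));
    }
}
```

The same folding has to be applied at both index time and query time for the terms to match. As the message itself notes, ICUFoldingFilter already handles this case.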