From: Chitra
Date: Fri, 20 Oct 2017 14:04:17 +0530
Subject: Re: ClassicAnalyzer Behavior on accent character
To: Lucene Users

Hi Robert,

Yes, StandardTokenizer solves my case. Could you please explain the
difference between ClassicTokenizer and StandardTokenizer? How does
StandardTokenizer solve my case? I searched the web but could not
understand the difference.

Any help is greatly appreciated.

On Fri, Oct 20, 2017 at 12:10 AM, Robert Muir wrote:

> easy, don't use classictokenizer: use standardtokenizer instead.
>
> On Thu, Oct 19, 2017 at 9:37 AM, Chitra wrote:
> > Hi,
> > I indexed a term 'ⒶeŘꝋꝒɫⱯŋɇ' (aeroplane) and the term was
> > indexed as "er l n", some characters were trimmed while indexing.
> >
> > Here is my code
> >
> >> protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader)
> >> {
> >>     final ClassicTokenizer src = new ClassicTokenizer(getVersion(), reader);
> >>     src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
> >>
> >>     TokenStream tok = new ClassicFilter(src);
> >>     tok = new LowerCaseFilter(getVersion(), tok);
> >>     tok = new StopFilter(getVersion(), tok, stopwords);
> >>     tok = new ASCIIFoldingFilter(tok); // to enable AccentInsensitive search
> >>
> >>     return new Analyzer.TokenStreamComponents(src, tok)
> >>     {
> >>         @Override
> >>         protected void setReader(final Reader reader) throws IOException
> >>         {
> >>             src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
> >>             super.setReader(reader);
> >>         }
> >>     };
> >> }
> >
> > Am I missing anything? Is that expected behavior for my input or any
> > reason behind such abnormal behavior?
> >
> > --
> > Regards,
> > Chitra

-- 
Regards,
Chitra
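
To see which characters of the indexed term are at stake, the sketch below lists each code point together with the JDK's own `Character.isLetter` and `Character.isAlphabetic` checks. It uses only `java.lang` and does not call Lucene. The class name `TermCodePoints` and the helper `describe` are hypothetical, and whether either tokenizer relies on exactly these properties is an assumption that nothing in this thread confirms.

```java
// Hypothetical helper, not part of Lucene: prints one line per code point of the term.
class TermCodePoints {
    // The term from the message, written with Unicode escapes so the source stays ASCII.
    public static final String TERM =
            "\u24B6e\u0158\uA74B\uA752\u026B\u2C6F\u014B\u0247";

    // Formats a code point with the JDK's view of its letter and alphabetic status.
    public static String describe(int cp) {
        return String.format("U+%04X letter=%b alphabetic=%b",
                cp, Character.isLetter(cp), Character.isAlphabetic(cp));
    }

    public static void main(String[] args) {
        // Iterate by code point so the loop stays correct for any input string.
        TERM.codePoints().forEach(cp -> System.out.println(describe(cp)));
    }
}
```

Comparing this listing with the surviving output "er l n" shows which code points were dropped; it does not by itself explain why the two tokenizers treat them differently.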