From: suriya prakash
Date: Wed, 21 Dec 2016 17:53:58 +0530
Subject: Re: Email id tokenizer (actual email id & multiple terms)
To: java-user@lucene.apache.org

Hi,

Thanks for your reply.

I might have one or more email IDs in a single record, so I have to index them with a whitespace analyzer after filtering out the email IDs alone (maybe using an email ID tokenizer).

Tokenization will then happen twice (once for normal indexing and once for the special email ID field), which is costly for the content field.

Is there any way to do this efficiently? Will TeeSinkTokenFilter help in my case?

On Tue, Dec 20, 2016 at 7:45 PM, suriya prakash wrote:

> Hi,
>
> I am using the standard analyzer and want to split the token for the
> email ID "lucene@gmail.com" into "lucene", "gmail", "com",
> "lucene@gmail.com" in a single pass.
>
> I have already changed the JFlex grammar to split an email ID into
> separate words (lucene, gmail, com). But then we need to do a phrase
> search to match a whole email ID, which will not be efficient. So I want
> to index both the actual email ID and the split words.
>
> Can you please help me achieve this, or let me know whether phrase
> search is efficient for this case?
>
> Regards,
> Suriya
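
To make the target output concrete, here is a minimal plain-Java sketch of the token list I am after. It does not use any Lucene API and is not an analyzer implementation; the class `EmailTokenSketch` and the method `tokenize` are hypothetical names for illustration only. It also assumes that any whitespace-delimited word containing "@" is an email ID, and it ignores token positions and offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: shows the desired tokens, not a Lucene analyzer.
public class EmailTokenSketch {

    // Splits text on whitespace; for a word that looks like an email ID
    // (assumption: it contains '@' after the first character), emits the
    // sub-parts first and then the full address.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields a single empty string
            }
            if (word.indexOf('@') > 0) {
                for (String part : word.split("[@.]")) {
                    if (!part.isEmpty()) {
                        tokens.add(part);
                    }
                }
                tokens.add(word); // keep the actual email ID as one term
            } else {
                tokens.add(word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [mail, lucene, gmail, com, lucene@gmail.com, now]
        System.out.println(tokenize("mail lucene@gmail.com now"));
    }
}
```

The open question is how to get the same terms out of a single analysis pass over the content field rather than running a second tokenizer.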