From: Xi Shen
To: general@lucene.apache.org
Subject: Re: Which token filter can combine 2 terms into 1?
Date: Fri, 21 Dec 2012 15:46:48 +0800

Hi Steve,

This is a language-dependent case. Basically, I will use a whitespace
tokenizer to process the input. But some of the inputs should be one term
instead of being split into 2 terms. I am thinking of developing a special
filter to fix these terms.


On Fri, Dec 21, 2012 at 3:34 PM, Steve Rowe wrote:

> Hi David,
>
> Not very many people read this mailing list - I suggest you switch to the
> java-user list - see .
>
> ShingleFilter and CommonGramsFilter combine terms, though the conditions
> under which they do so don't appear to be the same as what you want.
>
> Why are only the second two terms combined?
>
> Steve
>
> On Dec 21, 2012, at 2:27 AM, Xi Shen wrote:
>
> > Hi,
> >
> > I am looking for a token filter that can combine 2 terms into 1. E.g.,
> > the input has been tokenized by white space:
> >
> > t1 t2 t2a t3
> >
> > I want a filter that outputs:
> >
> > t1 t2t2a t3
> >
> > I know it is a very special case, and I am thinking about developing a
> > filter of my own. But I cannot figure out which API I should use to look
> > for terms in a token stream.
> >
> >
> > --
> > Regards，
> > David Shen
> >
> > http://about.me/davidshen
> > https://twitter.com/#!/davidshen84
> >

-- 
Regards，
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84
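
A minimal sketch of the merge step discussed in this thread, written in plain Java with no Lucene dependency so it can be run on its own. It is not code from the thread: the class name PairMerger, the method merge, and the idea of driving the merge from a set of "first second" pairs are all assumptions made for illustration. In Lucene the same one-token look-ahead would probably live in a custom TokenFilter subclass, inside incrementToken(), reading each term through CharTermAttribute and holding back one token (for instance with captureState() and restoreState()) until the next one has been inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: models the token stream as a List<String>.
final class PairMerger {

    // Scans left to right with one token of look-ahead. When the current
    // token and the next one form a pair listed in mergePairs (written as
    // "first second"), they are emitted as a single concatenated term.
    static List<String> merge(List<String> tokens, Set<String> mergePairs) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String current = tokens.get(i);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (mergePairs.contains(current + " " + next)) {
                    out.add(current + next); // combine 2 terms into 1
                    i += 2;                  // both tokens are consumed
                    continue;
                }
            }
            out.add(current);                // no match: pass through unchanged
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("t1", "t2", "t2a", "t3");
        // prints [t1, t2t2a, t3]
        System.out.println(merge(tokens, Set.of("t2 t2a")));
    }
}
```

The set of pairs stands in for whatever language-dependent rule decides that two whitespace-separated tokens are really one term; a real filter would also have to decide how to adjust offsets and position increments for the merged term, which this sketch does not attempt.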