Date: Fri, 21 Dec 2012 16:42:38 +0800
Subject: Re: Which token filter can combine 2 terms into 1?
From: Xi Shen
To: java-user@lucene.apache.org

I have to use the whitespace and word delimiter to process the input first. I tried many combinations, and it seems inevitable that the term will be split in two :(

I think developing my own filter is the only solution... but I cannot find a guide that explains what I need to do to implement a TokenFilter.

On Fri, Dec 21, 2012 at 4:03 PM, Danil ŢORIN wrote:

> The easiest way would be to pre-process your input and join those 2 tokens
> before splitting them by white space.
>
> But from the given context I might be missing some details... still worth a shot.
>
> On Fri, Dec 21, 2012 at 9:50 AM, Xi Shen wrote:
>
> > Hi,
> >
> > I am looking for a token filter that can combine 2 terms into 1. E.g.
> >
> > the input has been tokenized by white space:
> >
> > t1 t2 t2a t3
> >
> > I want a filter that outputs:
> >
> > t1 t2t2a t3
> >
> > I know it is a very special case, and I am thinking about developing a
> > filter of my own. But I cannot figure out which API I should use to look
> > for terms in a Token Stream.
> >
> > --
> > Regards，
> > David Shen
> >
> > http://about.me/davidshen
> > https://twitter.com/#!/davidshen84

-- 
Regards，
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84
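
For illustration only, here is a minimal sketch of the joining step discussed in this thread, written in plain Java with no Lucene dependency. It is not a Lucene TokenFilter. It only shows the one-token lookahead that such a filter would need: hold the current token, peek at the next one, and emit either the concatenation or the current token unchanged. The class name TokenJoiner, the method joinPair, and the rule "join when `first` is immediately followed by `second`" are all assumptions made for this example; the thread does not say how the pair to combine is chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper, not part of Lucene: joins one specific adjacent token pair. */
class TokenJoiner {

    /**
     * Walks the tokens in order and, whenever `first` is immediately
     * followed by `second`, emits their concatenation as a single token.
     * The join rule is an assumption made for this sketch.
     */
    static List<String> joinPair(List<String> tokens, String first, String second) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = tokens.iterator();
        String pending = it.hasNext() ? it.next() : null; // current token being held
        while (pending != null) {
            String next = it.hasNext() ? it.next() : null; // one-token lookahead
            if (next != null && pending.equals(first) && next.equals(second)) {
                out.add(pending + next);                    // emit the combined token
                pending = it.hasNext() ? it.next() : null;  // move past the consumed pair
            } else {
                out.add(pending);                           // emit unchanged
                pending = next;                             // lookahead becomes current
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same input as in the original question, split on white space.
        List<String> tokens = Arrays.asList("t1 t2 t2a t3".split("\\s+"));
        System.out.println(String.join(" ", joinPair(tokens, "t2", "t2a"))); // prints "t1 t2t2a t3"
    }
}
```

The same loop run over the raw text before analysis corresponds to the pre-processing suggestion quoted above. Inside Lucene, the equivalent buffering would presumably live in a TokenFilter subclass's incrementToken() method, but that wiring is not shown here.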