Subject: Re: Remove/Filter emails from a TokenStream?
From: Gucko Gucko
To: java-user@lucene.apache.org
Date: Wed, 12 Jun 2013 21:04:11 +0200

Hello,

I figured out how to solve this. I just added the email token type to the
set of stop types:

stopTypes.add("<EMAIL>");

On Wed, Jun 12, 2013 at 8:39 PM, Gucko Gucko wrote:

> Hello all,
>
> is there a filter I can use to remove email addresses from a TokenStream?
>
> So far I'm using this to remove numbers and URLs, and I would like to
> remove email addresses too:
>
> Tokenizer tokenizer = new UAX29URLEmailTokenizer(Version.LUCENE_43,
>         new StringReader(text));
> Set<String> stopTypes = new HashSet<String>();
> stopTypes.add("<NUM>");
> stopTypes.add("<URL>");
> TokenStream stream = new TypeTokenFilter(true, tokenizer, stopTypes);
> stream = new StandardFilter(Version.LUCENE_43, stream);
> stream = new LowerCaseFilter(Version.LUCENE_43, stream);
>
> Thanks a million!
>
> Best
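
For anyone reading this thread later, here is a rough sketch of the idea behind filtering by token type, written in plain Java with no Lucene dependency so it can be run on its own. It is not Lucene's implementation: the class TypeFilterSketch, the helpers typeOf and filter, and the three regular expressions are hypothetical stand-ins, and whitespace splitting is a much cruder rule than the UAX#29 segmentation the real tokenizer applies. The only things taken from Lucene are the type labels "<NUM>", "<URL>", "<EMAIL>" and "<ALPHANUM>".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical stand-in for a tokenizer plus a type-based filter (not Lucene code).
class TypeFilterSketch {
    // Simplified classifiers, assumed for illustration only.
    private static final Pattern EMAIL = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");
    private static final Pattern URL = Pattern.compile("(?i)https?://\\S+");
    private static final Pattern NUM = Pattern.compile("\\d+([.,]\\d+)*");

    // Assign a type label to a whitespace-delimited token.
    static String typeOf(String token) {
        if (EMAIL.matcher(token).matches()) return "<EMAIL>";
        if (URL.matcher(token).matches()) return "<URL>";
        if (NUM.matcher(token).matches()) return "<NUM>";
        return "<ALPHANUM>";
    }

    // Drop every token whose type is in stopTypes, then lowercase the rest.
    static List<String> filter(String text, Set<String> stopTypes) {
        List<String> kept = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // empty input yields one empty token
            if (!stopTypes.contains(typeOf(token))) {
                kept.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopTypes =
                new HashSet<String>(Arrays.asList("<NUM>", "<URL>", "<EMAIL>"));
        String text = "Mail bob@example.com or see http://example.com before 2013";
        System.out.println(filter(text, stopTypes)); // prints [mail, or, see, before]
    }
}
```

The point of the sketch is that the filter never inspects the token text itself, only the type label assigned upstream, which is why adding one more label to the stop set is enough to drop a whole category of tokens.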