lucene-java-user mailing list archives

From Gucko Gucko <gucko.gu...@googlemail.com>
Subject Re: Remove/Filter emails from a TokenStream?
Date Wed, 12 Jun 2013 19:04:11 GMT
Hello,

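I figured out how to solve this: I just added stopTypes.add("<EMAIL>");
next to the existing "<URL>" and "<NUM>" entries, so the TypeTokenFilter
drops email tokens as well.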
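For anyone finding this in the archive, here is a minimal plain-Java sketch of the idea behind filtering by stop type. It does not use Lucene. The class StopTypeSketch, the TypedToken holder and the dropStopTypes method are hypothetical names invented for illustration, and the sample tokens are made up. Only the type labels ("<ALPHANUM>", "<NUM>", "<URL>", "<EMAIL>") match what UAX29URLEmailTokenizer assigns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopTypeSketch {

    /** Hypothetical stand-in for a token: its text plus a type label. */
    static final class TypedToken {
        final String text;
        final String type;

        TypedToken(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /** Keeps the text of every token whose type is NOT in stopTypes, in order. */
    static List<String> dropStopTypes(List<TypedToken> tokens, Set<String> stopTypes) {
        List<String> kept = new ArrayList<String>();
        for (TypedToken token : tokens) {
            if (!stopTypes.contains(token.type)) {
                kept.add(token.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample input, labelled the way the tokenizer would label it.
        List<TypedToken> tokens = Arrays.asList(
                new TypedToken("contact", "<ALPHANUM>"),
                new TypedToken("me@example.com", "<EMAIL>"),
                new TypedToken("http://example.com", "<URL>"),
                new TypedToken("42", "<NUM>"));

        // Same stop types as in the thread, including the added "<EMAIL>".
        Set<String> stopTypes = new HashSet<String>(
                Arrays.asList("<URL>", "<NUM>", "<EMAIL>"));

        System.out.println(dropStopTypes(tokens, stopTypes)); // prints [contact]
    }
}
```

In the real pipeline the tokenizer sets the type on each token and TypeTokenFilter does the set lookup, so adding one more entry to stopTypes is all that is needed.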




On Wed, Jun 12, 2013 at 8:39 PM, Gucko Gucko <gucko.gucko@googlemail.com> wrote:

> Hello all,
>
> Is there a filter I can use to remove email addresses from a TokenStream?
>
> So far I'm using this to remove numbers and URLs, and I would like to
> remove email addresses too:
>
> Tokenizer tokenizer = new UAX29URLEmailTokenizer(Version.LUCENE_43,
>     new StringReader(text));
> Set<String> stopTypes = new HashSet<String>();
> stopTypes.add("<URL>");
> stopTypes.add("<NUM>");
> TokenStream stream = new TypeTokenFilter(true, tokenizer, stopTypes);
> stream = new StandardFilter(Version.LUCENE_43, stream);
> stream = new LowerCaseFilter(Version.LUCENE_43, stream);
>
>
> Thanks a million!
>
>
> Best
>
