lucene-java-user mailing list archives

From Gucko Gucko <gucko.gu...@googlemail.com>
Subject Remove/Filter emails from a TokenStream?
Date Wed, 12 Jun 2013 18:39:20 GMT
Hello all,

Is there a filter I can use to remove email addresses from a TokenStream?

So far I'm using the following to remove numbers and URLs, and I would like
to remove email addresses too:

Tokenizer tokenizer = new UAX29URLEmailTokenizer(Version.LUCENE_43,
    new StringReader(text));
Set<String> stopTypes = new HashSet<String>();
stopTypes.add("<URL>");
stopTypes.add("<NUM>");
TokenStream stream = new TypeTokenFilter(true, tokenizer, stopTypes);
stream = new StandardFilter(Version.LUCENE_43, stream);
stream = new LowerCaseFilter(Version.LUCENE_43, stream);
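
For reference, the snippet above filters by token type: every token whose type label is in stopTypes gets dropped. Here is a minimal plain-Java sketch of that idea, extended with an email type. It is not Lucene code. The Token class and the dropTypes helper are hypothetical stand-ins so that it runs without Lucene on the classpath, and the type labels are assigned by hand rather than by a tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java stand-in for type-based token filtering; not Lucene code.
class TypeFilterSketch {

    // Hypothetical token: the term text plus the type label assigned to it.
    static final class Token {
        final String term;
        final String type;

        Token(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    // Returns the terms of all tokens whose type is NOT in stopTypes, in order.
    static List<String> dropTypes(List<Token> tokens, Set<String> stopTypes) {
        List<String> kept = new ArrayList<String>();
        for (Token t : tokens) {
            if (!stopTypes.contains(t.type)) {
                kept.add(t.term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hand-labelled tokens; a real tokenizer would assign these types itself.
        List<Token> tokens = Arrays.asList(
            new Token("mail", "<ALPHANUM>"),
            new Token("me@example.com", "<EMAIL>"),
            new Token("42", "<NUM>"),
            new Token("http://example.com", "<URL>"),
            new Token("today", "<ALPHANUM>"));

        // Same stop types as the snippet above, plus the assumed email type.
        Set<String> stopTypes = new HashSet<String>(
            Arrays.asList("<URL>", "<NUM>", "<EMAIL>"));

        System.out.println(dropTypes(tokens, stopTypes)); // prints [mail, today]
    }
}
```

If UAX29URLEmailTokenizer labels email tokens as "<EMAIL>", which is what its documented token types indicate, then the equivalent change in the Lucene snippet would presumably be one more line, stopTypes.add("<EMAIL>"), before the TypeTokenFilter is built.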


Thanks a million!


Best
