lucene-java-user mailing list archives

From "Minh Kama Yie" <m...@nuix.com.au>
Subject Parsing email addresses with StandardTokenizer.
Date Mon, 28 Oct 2002 05:32:32 GMT
Hi all,

Please forgive me if this question has been asked elsewhere but I can't seem to find an answer
for this in the documentation. The code for StandardTokenizer is a little too deep to go into
right now :), so I thought I'd post to the list first.

If I'm using StandardAnalyzer, which in turn uses StandardTokenizer, how would the following
email addresses be tokenized?

- tom.jones@abc.com
- sheryl@abc.com

If I search for "abc.com", which entries should turn up?
Right now I'm only getting tom.jones@abc.com. If this is correct, what are the standard
tokenizing rules regarding the "@" sign, and where can I read up on them without digging through
the hexadecimal values in StandardTokenizer?

I've basically been asked why the document for sheryl@abc.com doesn't turn up in the search
results for "abc.com".
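
To make the symptom concrete, here is a minimal plain-Java sketch of exact-term matching. It is
not Lucene code, and the token splits in it are assumptions chosen only to reproduce the result
I'm seeing; I have not confirmed that StandardTokenizer actually produces them. It also assumes
the query "abc.com" is kept as a single term. The class and method names are made up for
illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TokenMatchSketch {

    // ASSUMED token splits (hypothetical, not verified against StandardTokenizer):
    // the first address is split at "@", the second is kept as one token.
    static Map<String, List<String>> assumedTokens() {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("tom.jones@abc.com", Arrays.asList("tom.jones", "abc.com"));
        docs.put("sheryl@abc.com", Arrays.asList("sheryl@abc.com"));
        return docs;
    }

    // Build a toy inverted index: term -> documents containing exactly that term.
    static Map<String, Set<String>> buildIndex(Map<String, List<String>> docs) {
        Map<String, Set<String>> index = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            for (String term : doc.getValue()) {
                index.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    // A term query matches only documents indexed under an identical term.
    static Set<String> search(Map<String, Set<String>> index, String term) {
        return index.getOrDefault(term, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = buildIndex(assumedTokens());
        System.out.println(search(index, "abc.com"));        // prints [tom.jones@abc.com]
        System.out.println(search(index, "sheryl@abc.com")); // prints [sheryl@abc.com]
    }
}
```

If the real tokenizer behaves anything like this, a document whose address is kept as one token
would only be found by searching for the full address, which would explain the missing hit.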

Thanks in advance.

Regards,

Minh Kama Yie

