lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: To Tokenize or Un_Tokenize?
Date Wed, 26 Jul 2006 20:50:06 GMT
You most certainly want to index the whole token, and likely portions of it (didn't you already
ask this a few weeks ago?).
You will want to write your own Analyzer + Tokenizer that's email-address-format-aware and
does things like:
emit the whole token
emit the username portion
email the fully qualified host name
email the fully qualified domain name

Perhaps you'll want to index some of these as separate fields, or perhaps you'll want to also
index user@domain even if an email address looks like user@foo.domain

Otis
 

----- Original Message ----
From: Michael J. Prichard <michael_prichard@mac.com>
To: java-user@lucene.apache.org
Sent: Wednesday, July 26, 2006 4:33:10 PM
Subject: To Tokenize or Un_Tokenize?

If I want to search an email address (i.e. michael@foo.com) do I need to 
Tokenize that field?

doc.add(new Field("from", (String) itemContent.get("from"), 
Field.Store.YES, Field.Index.TOKENIZED));

-OR-

doc.add(new Field("from", (String) itemContent.get("from"), 
Field.Store.YES, Field.Index.UN_TOKENIZED));

Thx.
Michael


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message