lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From karl wettin <ka...@snigel.net>
Subject Re: indexing emails
Date Fri, 16 Jun 2006 20:13:28 GMT
On Fri, 2006-06-16 at 15:20 -0400, Michael J. Prichard wrote:
> I am working on indexing emails and want to have a "to" field.  I am 
> currently putting all the emails on one line seperated w/ spaces...example:
> 
> michael@foo.bar john@boo.com jane@bar.com
> 
> Then i index that with a StandardAnalyzer as follows:
> 
> doc.add(new Field("to", (String) itemContent.get("to"), Field.Store.YES, 
> Field.Index.UN_TOKENIZED));
> 
> Question is...is this the best way to do it?  I want to be able to 
> search for michael@foo.bar and pick out just those Documents, etc.

You can either do it as above (but you want to TOKENIZE the field) or
you could create a new UN_TOKENIZED field for each email address.

The second will require less CPU as it does not involve any lexical
analysis. It will also create larger distance between the addresses in
the index (see span queries and term positions).


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message