lucene-java-user mailing list archives

From Charlie Hubbard
Subject StandardAnalyzer and Email Addresses
Date Thu, 16 Feb 2012 17:18:43 GMT
This is probably a pretty simple question to answer, but I have customers
asking me how this is supposed to work and I'm having trouble explaining it.
I have an app that indexes emails, so there are plenty of email addresses in
there.  The StandardAnalyzer javadoc says it "recognizes" email addresses
when it creates the token list.  What tokens will it produce exactly?  What
I'm seeing when I perform searches is that the email address looks like it's
being tokenized into its parts.  Searching by an email address like:

pulls back more hits that haven't been addressed to  Other messages with in them are
returned.  If I use the following:


in them.  It also finds, and other domains.  And I can search for
strings like


it will pull back only emails addressed to that address.  Further evidence
that it tokenizes the parts of an email address is what happens if I search
for a very specific email address like:


That will pull back only emails addressed to that email, but it's not a
full email address.  Which leads me to think it will parse parts of the
email addresses.  Can someone explain this a little more?
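
To make the question concrete, here is a minimal sketch in plain Java of the two behaviours I am trying to tell apart.  It is a toy model, not Lucene code and not StandardAnalyzer's actual grammar: the class EmailTokenSketch, its methods wholeTokens, splitTokens and matches, and the address jane.doe@example.com are all hypothetical names made up for illustration.  It assumes a search term matches only when it equals an indexed token exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EmailTokenSketch {

    // Toy strategy A (assumption): each whitespace-separated chunk,
    // such as a whole email address, becomes a single token.
    static List<String> wholeTokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String s : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // Toy strategy B (assumption): every run of letters or digits becomes
    // its own token, so '.', '@' and similar characters act as separators.
    static List<String> splitTokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String s : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // A term "hits" only if it is exactly equal to one indexed token.
    static boolean matches(List<String> indexedTokens, String term) {
        return indexedTokens.contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String to = "jane.doe@example.com"; // hypothetical address

        List<String> whole = wholeTokens(to);
        List<String> parts = splitTokens(to);

        System.out.println("whole: " + whole);
        System.out.println("parts: " + parts);

        // Searching for just the domain name or just part of the user name
        // can only hit if the address was split into parts at index time.
        System.out.println("whole matches 'example': " + matches(whole, "example"));
        System.out.println("parts matches 'example': " + matches(parts, "example"));
        System.out.println("whole matches full address: " + matches(whole, to));
        System.out.println("parts matches full address: " + matches(parts, to));
    }
}
```

What I see in my searches looks like strategy B.  Which of the two the real StandardAnalyzer resembles may depend on the Lucene version and on the Version constant passed to it, so printing the tokens from the analyzer's TokenStream for a sample address is probably the reliable way to check.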

I'm having trouble with some emails that can't be pulled back using the
username like searching for to:chubbard where the email was addressed to, but it fails to show up in the search results.  I
can't explain why that's happening, and I can't reproduce it in any of my
tests.  This index was built with Lucene 2.4 and I have since upgraded to
3.1, so I'm worried it might be corrupted and that I may have to reindex
everything.

