lucene-java-user mailing list archives

From: Ian Lea
Subject: Re: StandardAnalyzer and Email Addresses
Date: Mon, 20 Feb 2012 17:23:07 GMT
Are you using StandardAnalyzer in 3.1+?  You may want to use
ClassicAnalyzer instead.  I can't see where in the 3.5 javadocs it
says that email addresses are recognized, but it does sound vaguely
familiar.
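
To make the difference concrete, here is a toy sketch in plain Java. It is not Lucene code and does not reproduce either analyzer's real grammar. It only assumes, as I understand it, that StandardAnalyzer in 3.1+ follows Unicode word-break rules (so '@' and '+' end a token while a '.' between letters does not), whereas the older behaviour kept by ClassicAnalyzer treats a whole address as one token. The class name, method names and the example address are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a rough imitation of two tokenizing styles, NOT Lucene's implementation.
class EmailTokenSketch {

    // Approximates word-boundary splitting: letters and digits build a token,
    // and a '.' stays inside the token only when it sits between two of them.
    static List<String> splitOnWordBoundaries(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean innerDot = c == '.'
                    && current.length() > 0
                    && i + 1 < text.length()
                    && Character.isLetterOrDigit(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c) || innerDot) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any other character ('@', '+', space, ...) closes the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Approximates "recognize the address as one token": split on whitespace only.
    static List<String> keepAddressesWhole(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String address = "user.name+tag@example.com"; // hypothetical address
        System.out.println(splitOnWordBoundaries(address));
        System.out.println(keepAddressesWhole(address));
    }
}
```

Under the first model the address is indexed as three terms, so a search for the local part alone matches, and a quoted phrase of adjacent parts matches only where those parts sit next to each other. Under the second model only the complete address matches. To see what your real index contains, run the analyzer you actually index with over a sample address and print the tokens it emits.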


On Thu, Feb 16, 2012 at 5:18 PM, Charlie Hubbard wrote:
> This is a pretty simple question to answer, but I have customers asking me
> how this is supposed to work and I'm having trouble explaining it.  I have
> an app that indexes emails so there are plenty of email addresses in there.
>  Reading the StandardAnalyzer javadoc it says it "recognizes" email
> addresses when it is creating the token list.  What tokens will it produce
> exactly?  What I'm seeing when I perform searches is the email address
> looks like it's being tokenized into its parts.  Searching by an email
> address like:
> pulls back more hits that haven't been addressed to
>  Other messages with in them are
> returned.  If I use the following:
> to:charlie.hubbard
> in them.  It also finds, and other domains.  And I can search for
> strings like
> to:""
> it will pull back only emails addressed to that address.  Further proof it
> seems to tokenize the parts of an email is if I search for a very specific
> email address like:
> to:"charlie.hubbard+sometag"
> That will pull back only emails addressed to that email, but it's not a
> full email address.  Which leads me to think it will parse parts of the
> email addresses.  Can someone explain this a little more?
> I'm having trouble with some emails that can't be pulled back using the
> username like searching for to:chubbard where the email was addressed to
>, but it fails to show up in the search results.  I
> can't explain why that's happening.  In all of my tests I can't reproduce
> it and I think I might have to reindex everything because this was an index
> built with 2.4 and I upgraded to 3.1 so I'm worried it might be corrupted.
> Thoughts?
