lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: StandardAnalyzer and Email Addresses
Date Mon, 20 Feb 2012 17:23:07 GMT
Are you using StandardAnalyzer from 3.1 or later?  Its tokenization
changed in 3.1, so you may want to use ClassicAnalyzer instead, which
keeps the pre-3.1 StandardAnalyzer behaviour.  I can't see where in the
3.5 javadocs it says that email addresses are recognized, but it does
sound vaguely familiar.
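
To make the symptoms below easier to reason about, here is a toy model
in plain Java. It is not Lucene code and does not reproduce any
analyzer's real grammar. The class EmailTokenSketch and its methods are
made up for illustration, and the splitting rule (break on anything that
is not a letter, digit or '.') is an assumption chosen to match what the
searches below appear to show. It also assumes that an unquoted query
that analyzes to several tokens is ORed together, while a quoted query
must match the tokens consecutively.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: NOT Lucene, and not StandardAnalyzer's actual grammar.
final class EmailTokenSketch {

    // Assumed rule: split on anything that is not a letter, digit or '.',
    // so '@' and '+' separate tokens. Tokens are lowercased.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}.]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Unquoted query (assumed OR semantics): a hit if ANY query token
    // occurs anywhere in the field.
    static boolean matchesAnyTerm(String field, String query) {
        List<String> fieldTokens = tokenize(field);
        for (String q : tokenize(query)) {
            if (fieldTokens.contains(q)) {
                return true;
            }
        }
        return false;
    }

    // Quoted query: a hit only if all query tokens occur consecutively
    // and in order in the field.
    static boolean matchesPhrase(String field, String query) {
        List<String> queryTokens = tokenize(query);
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(tokenize(field), queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String other = "someone.else@gmail.com";
        String query = "charlie.hubbard@gmail.com";
        System.out.println(tokenize(query));
        System.out.println("unquoted hits other address: " + matchesAnyTerm(other, query));
        System.out.println("quoted hits other address:   " + matchesPhrase(other, query));
    }
}
```

In this model the unquoted query matches someone.else@gmail.com because
the two addresses share the gmail.com token, while the quoted query does
not. To see what your real analyzer emits, iterate over its TokenStream
for a sample address and print each term.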


--
Ian.


On Thu, Feb 16, 2012 at 5:18 PM, Charlie Hubbard
<charlie.hubbard@gmail.com> wrote:
> This is a pretty simple question to answer, but I have customers asking me
> how this is supposed to work and I'm having trouble explaining it.  I have
> an app that indexes emails so there are plenty of email addresses in there.
> Reading the StandardAnalyzer javadoc, it says it "recognizes" email
> addresses when it is creating the token list.  What tokens will it produce
> exactly?  What I'm seeing when I perform searches is that the email address
> looks like it's being tokenized into its parts.  Searching by an email
> address like:
>
> to:charlie.hubbard@gmail.com
>
> pulls back hits that were not addressed to charlie.hubbard@gmail.com:
> other messages with gmail.com in them are returned too.  If I use the
> following:
>
> to:charlie.hubbard
>
> it returns messages with charlie.hubbard in them, at gmail.com and at
> other domains.  And if I search for a quoted string like
>
> to:"charlie.hubbard@gmail.com"
>
> it will pull back only emails addressed to that address.  Further evidence
> that it tokenizes the parts of an email address: I can search for a very
> specific partial address like:
>
> to:"charlie.hubbard+sometag"
>
> That will pull back only emails addressed to that address, even though it's
> not a full email address, which leads me to think it indexes the parts of
> each address separately.  Can someone explain this a little more?
>
> I'm also having trouble with some emails that can't be pulled back by
> username: searching for to:chubbard should find an email addressed to
> chubbard@somedomain.com, but it fails to show up in the search results.  I
> can't explain why that's happening, and I can't reproduce it in any of my
> tests.  The index was built with 2.4 and I then upgraded to 3.1, so I'm
> worried it might be corrupted and that I may have to reindex everything.
>
> Thoughts?


