lucene-java-user mailing list archives

From kiwi clive <kiwi_cl...@yahoo.com>
Subject StandardAnalyzer functionality change
Date Wed, 24 Oct 2012 10:42:11 GMT
Hi all,

Sorry if I'm asking an age-old question, but we have migrated to Lucene 3.6.0 and I see that StandardAnalyzer
has changed its behaviour, particularly when tokenizing email addresses. From reading the
forums, I understand that the old StandardAnalyzer was renamed to ClassicAnalyzer. Is this the case?


If I pass the string 'user@domain.com' through these analyzers, I get the following tokens:

Using StandardAnalyzer(Version.LUCENE_23):  -->  user@domain.com   (one token)
Using StandardAnalyzer(Version.LUCENE_36):  -->  user domain.com   (two tokens)
Using ClassicAnalyzer(Version.LUCENE_36):   -->  user@domain.com   (one token)
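
For reference, the comparison above can be reproduced with something like the following minimal sketch. It assumes the Lucene 3.6.0 core jar is on the classpath; the class name AnalyzerCompare, the helper tokens() and the field name "field" are only illustrative placeholders. It feeds the same string through each analyzer and prints the terms each one emits, using the usual reset / incrementToken / end / close sequence on the TokenStream.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class: prints the terms each analyzer produces for one input.
public class AnalyzerCompare {

    // Runs text through the analyzer and collects the emitted terms in order.
    // "field" is an arbitrary placeholder field name.
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> result = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream("field", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();                     // prepare the stream before consuming it
            while (ts.incrementToken()) {   // advance to the next token
                result.add(term.toString());
            }
            ts.end();                       // finalize end-of-stream state
        } finally {
            ts.close();                     // release the underlying reader
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String email = "user@domain.com";
        System.out.println("StandardAnalyzer(LUCENE_23): "
                + tokens(new StandardAnalyzer(Version.LUCENE_23), email));
        System.out.println("StandardAnalyzer(LUCENE_36): "
                + tokens(new StandardAnalyzer(Version.LUCENE_36), email));
        System.out.println("ClassicAnalyzer(LUCENE_36):  "
                + tokens(new ClassicAnalyzer(Version.LUCENE_36), email));
    }
}
```

Swapping other Version constants or input strings into main() is a quick way to check how a given analyzer and version tokenize other inputs.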

StandardAnalyzer is normally a good compromise as a default analyzer, but its failure to keep
an email address intact makes it less fit for purpose than it used to be. Is this a bug, or
is it by design? If by design, what is the reason for the change, and is ClassicAnalyzer
now the de facto analyzer to use?

Thanks,
Clive
