lucene-dev mailing list archives

From Chris Hostetter <>
Subject Re: Standard Analyzer - Host and Acronym
Date Sat, 21 Apr 2007 06:55:43 GMT
: StandardAnalyzer matches '' as a HOST and leaves the whole
: token intact. However, if at the end of a sentence, StandardAnalyzer matches
: '' as an ACRONYM which creates a token of 'wwwgooglecom'. A
: search for '' will of course not match now.

: Or is this a known and accepted compromise?

StandardAnalyzer is black voodoo that i've never delved into ... but if
you are asking for opinions on how it *should* work i would think that
"" should not be considered an acronym for obvious reasons
-- if acronym is going to be a special token type where periods are
stripped out, then i think requiring single letters between the periods
is wise.

that said, i don't think "" should be treated as a HOSTname
either ... because it's not.  DNS hostnames can't end in a "." ...
regardless of how grammarians might tell you to write a sentence, when you
put a period at the end, it stops being a hostname, and becomes a word
with funky punctuation in the middle.
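
As a sketch of the distinction i mean (again a made-up pattern and made-up
names, not what StandardAnalyzer does, and the sample hostname is only an
example), a host check that refuses a trailing period could look like this:

```python
import re

# Hypothetical host rule: dot-separated labels, and no period at the end.
HOST = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def classify(token: str) -> str:
    """Label a token HOST only when the whole token fits the host rule."""
    if HOST.fullmatch(token):
        return "HOST"
    return "WORD"

print(classify("www.example.com"))   # no trailing period
print(classify("www.example.com."))  # trailing period from end of sentence
```

Each label after a period needs at least one character, so the
sentence-final form fails the host rule and is treated as an ordinary word.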

