lucene-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Standard Analyzer - Host and Acronym
Date Sat, 21 Apr 2007 06:55:43 GMT
:
: StandardAnalyzer matches 'www.google.com' as a HOST and leaves the whole
: token intact. However, if at the end of a sentence, StandardAnalyzer matches
: 'www.google.com.' as an ACRONYM which creates a token of 'wwwgooglecom'. A
: search for 'www.google.com' will of course not match now.

: Or is this a known and accepted compromise?

StandardAnalyzer is black voodoo that I've never delved into ... but if
you are asking for opinions on how it *should* work, I would think that
"www.google.com." should not be considered an acronym, for obvious reasons
-- if ACRONYM is going to be a special token type where periods are
stripped out, then I think requiring single letters between the periods
is wise.
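
To make the "single letters" suggestion concrete, here is a minimal sketch in Python. It is not StandardAnalyzer's actual grammar; the pattern and the helper names (ACRONYM, is_acronym, strip_periods) are hypothetical and exist only to illustrate the rule.

```python
import re

# Hypothetical rule, not StandardAnalyzer's real grammar: an acronym is
# two or more single letters, each immediately followed by a period.
ACRONYM = re.compile(r"(?:[A-Za-z]\.){2,}")


def is_acronym(token: str) -> bool:
    """Return True only if the whole token fits the single-letter rule."""
    return ACRONYM.fullmatch(token) is not None


def strip_periods(token: str) -> str:
    """Drop the periods from acronyms; leave every other token untouched."""
    return token.replace(".", "") if is_acronym(token) else token
```

Under this rule "I.B.M." would collapse to "IBM", while "www.google.com." is left alone because its parts are longer than one letter.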

That said, I don't think "www.google.com." should be treated as a HOST name
either ... because it's not.  DNS hostnames can't end in a "." ...
regardless of how grammarians might tell you to write a sentence, when you
put a period at the end, it stops being a hostname, and becomes a word
with funky punctuation in the middle.
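
The position above (a trailing period disqualifies a token from being a HOST) could be sketched like this in Python. The LABEL and HOST patterns and the token_type helper are assumptions made for illustration; they are not taken from StandardAnalyzer and are looser than a real hostname check.

```python
import re

# Hypothetical host pattern: dot-separated labels made of letters, digits
# and inner hyphens, with at least one dot and no trailing period.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOST = re.compile(rf"{LABEL}(?:\.{LABEL})+")


def token_type(token: str) -> str:
    """Classify a token as HOST only when the entire token matches."""
    return "HOST" if HOST.fullmatch(token) else "WORD"
```

Because the match must cover the whole token, "www.google.com" comes out as HOST and "www.google.com." falls through to WORD.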



-Hoss



