lucene-dev mailing list archives

From "Mark Miller" <markrmil...@gmail.com>
Subject Standard Analyzer - Host and Acronym
Date Fri, 20 Apr 2007 20:40:01 GMT
StandardAnalyzer matches 'www.google.com' as a HOST and leaves the whole
token intact. However, when the host name comes at the end of a sentence,
StandardAnalyzer matches 'www.google.com.' (with the trailing period) as an
ACRONYM, which produces the token 'wwwgooglecom'. A search for
'www.google.com' will of course no longer match.

Is this a known compromise? It seems kind of scary that you will lose the
ability to find a URL in a search if it comes at the end of a sentence.

Would it be too restrictive to only recognize ACRONYMs that have a single
letter between periods? Other ideas? Trying HOST before ACRONYM is out,
because then we would never get to ACRONYM.
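
To make the problem and the single-letter idea concrete, here is a rough
sketch in plain Java. It does not use Lucene at all: the three regular
expressions are my simplified approximations of the HOST and ACRONYM rules,
and the class name, method names, and the "longest match wins, ACRONYM drops
its periods" logic are assumptions for illustration, not the actual tokenizer
grammar or filter code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: regex approximations, not the real StandardTokenizer grammar.
final class HostAcronymSketch {
    // Assumed HOST shape: alphanumeric runs joined by periods, e.g. "www.google.com".
    static final Pattern HOST =
        Pattern.compile("[A-Za-z0-9]+(\\.[A-Za-z0-9]+)+");
    // Assumed current ACRONYM shape: letter runs, each followed by a period.
    static final Pattern ACRONYM =
        Pattern.compile("[A-Za-z]+\\.([A-Za-z]+\\.)+");
    // Proposed restriction: exactly one letter between periods, e.g. "U.S.A.".
    static final Pattern SINGLE_LETTER_ACRONYM =
        Pattern.compile("[A-Za-z]\\.([A-Za-z]\\.)+");

    // Length of the match anchored at the start of s, or 0 if the rule does not match.
    static int prefixLength(Pattern rule, String s) {
        Matcher m = rule.matcher(s);
        return m.lookingAt() ? m.end() : 0;
    }

    // Returns the first token of text, choosing whichever rule matches more
    // characters. An ACRONYM match has its periods removed; a HOST match is
    // kept intact. Returns "" if neither rule matches.
    static String firstToken(String text, Pattern acronymRule) {
        int host = prefixLength(HOST, text);
        int acronym = prefixLength(acronymRule, text);
        if (acronym > host) {
            return text.substring(0, acronym).replace(".", "");
        }
        return text.substring(0, host);
    }
}
```

With the assumed current rule, firstToken("www.google.com.", ACRONYM) takes
the ACRONYM branch because that match is one character longer than the HOST
match. With SINGLE_LETTER_ACRONYM the same input falls through to HOST, while
"U.S.A." is still treated as an acronym. The cost is that multi-letter
abbreviations written with periods would no longer be recognized as ACRONYMs.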

Or is this a known and accepted compromise?

- Mark
