lucene-java-user mailing list archives

From "Terry Steichen" <>
Subject Tags Screwing up Searches
Date Fri, 18 Oct 2002 15:39:50 GMT
Some content I'm indexing contains HTML tags such as <p>, <b>, and <i>.
What I find is that when a term I'm searching for touches one of these tags (which is
fairly typical), the term isn't recognized and the search fails.  For example, <b>College
Soccer</b> doesn't match on either "college" or "soccer".  I seem to recall someone
else bringing up a similar problem with a word that ends a sentence (and is thus treated as if
the period were part of the word), but I don't recall what the response was and I can't find
that thread.

Does anyone have ideas on the best way to handle this?  Filter out the tags in
the process of creating the Document for indexing? Or modify the Analyzer
(I'm using the StandardAnalyzer)? Or something else?
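
To make the first option concrete, here is a rough sketch of stripping the tags from the
text before it is handed to the indexer.  It uses only java.util.regex and no Lucene
classes; the TagStripper class and its stripTags method are hypothetical names made up
for this illustration, and the regex is a simplification, not a real HTML parser (it
ignores comments, scripts, and character entities).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: removes simple HTML tags from text
// so that words adjacent to a tag become ordinary whitespace-separated terms.
final class TagStripper {
    // Matches '<', an optional '/', a letter, then everything up to the next '>'.
    // Assumption: tags are well formed and contain no '>' inside attribute values.
    private static final Pattern TAG = Pattern.compile("</?[A-Za-z][^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String stripTags(String html) {
        // Replace each tag with a space so that words on either side of a tag
        // (for example "one</p><p>two") are not fused into a single term.
        String noTags = TAG.matcher(html).replaceAll(" ");
        // Collapse the runs of whitespace left behind and trim the ends.
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<b>College Soccer</b> results"));
    }
}
```

The cleaned string would then go into the Document in place of the raw content.  One
trade-off of replacing tags with a space: a tag in the middle of a word (col<b>lege</b>)
splits that word into two terms.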


