lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <>
Subject Re: Tags Screwing up Searches
Date Sat, 19 Oct 2002 01:32:23 GMT
Is it possible that the Analyzer is stripping <, >, and / characters
and leaving you with terms like: bCollege and Soccerb ?


--- Terry Steichen <> wrote:
> Some content I'm indexing contains certain HTML tags, like <p>, <b>,
> <i>, etc.  What I find is that when a term I'm searching for touches
> one of these tags (which is fairly typical), the term isn't
> recognized and the search fails.  For example, <b>College Soccer</b>
> doesn't match on either "college" or "soccer".  I seem to recall
> someone else bring up a similar problem with a word that ends a
> sentence (and is thus treated as if the period was part of the word),
> but don't recall what the response was and I can't find that thread.
> Does anyone have some ideas on what's the best way to handle this? 
> Filter out the tags in the process of creating the Document for
> indexing? Or through a modification to the Analyzer (I'm using the
> StandardAnalyzer)? Or something else?
> TIA,
> Terry

Do you Yahoo!?
Y! Web Hosting - Let the expert host your web site

To unsubscribe, e-mail:   <>
For additional commands, e-mail: <>

View raw message