lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Terry Steichen" <te...@net-frame.com>
Subject Re: Tags Screwing up Searches
Date Sun, 20 Oct 2002 16:22:41 GMT
Otis,

Doing some more testing, it turns out that it is the *trailing* tag that
screws things up.  Assume the text contains the phrase '<b>college
soccer</b>'.  This will match on 'college' or on 'soccer*', but not on
'soccer' or 'college soccer'.

I need to fix this quite soon.  In the absence of any better suggestions,
I'm just going to have to go in and either insert spaces or delete brackets
(ugh!).

Regards,

Terry

PS: Am I the only one that's having this problem?  If so, I must have
screwed up something.  If not, it could be a potentially serious bug.

----- Original Message -----
From: "Terry Steichen" <terry@net-frame.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Friday, October 18, 2002 11:56 PM
Subject: Re: Tags Screwing up Searches


> I tested against the phrase in my text, '<b>men's college soccer</b>',
> matching successfully on 'college AND soccer*'.  However, I found no match
> for 'college AND soccer', 'college AND soccer<*', 'college AND soccer<',
> 'college AND soccerb', 'college AND soccerb*', or 'college AND soccer/'.
>
> Regards,
>
> Terry
>
> ---- Original Message -----
> From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Friday, October 18, 2002 9:32 PM
> Subject: Re: Tags Screwing up Searches
>
>
> > Is it possible that the Analyzer is stripping <, >, and / characters
> > and leaving you with terms like: bCollege and Soccerb ?
> >
> > Otis
> >
> > --- Terry Steichen <terry@net-frame.com> wrote:
> > > Some content I'm indexing contains certain HTML tags, like <p>, <b>,
> > > <i>, etc.  What I find is that when a term I'm searching for touches
> > > one of these tags (which is fairly typical), the term isn't
> > > recognized and the search fails.  For example, <b>College Soccer</b>
> > > doesn't match on either "college" or "soccer".  I seem to recall
> > > someone else bring up a similar problem with a word that ends a
> > > sentence (and is thus treated as if the period was part of the word),
> > > but don't recall what the response was and I can't find that thread.
> > >
> > > Does anyone have some ideas on what's the best way to handle this?
> > > Filter out the tags in the process of creating the Document for
> > > indexing? Or through a modification to the Analyzer (I'm using the
> > > StandardAnalyzer)? Or something else?
> > >
> > > TIA,
> > >
> > > Terry
> > >
> > >
> >
> >
> > __________________________________________________
> > Do you Yahoo!?
> > Y! Web Hosting - Let the expert host your web site
> > http://webhosting.yahoo.com/
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> >
> >
>
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message