lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Tags Screwing up Searches
Date Mon, 21 Oct 2002 22:55:52 GMT
Si, with StandardAnalyzer, I believe, since neither < nor > are
alphabetical characters.

Otis

--- Terry Steichen <terry@net-frame.com> wrote:
> How should this be done (the translation, that is)?  If it were left
> as '<'
> and '>', would Lucene parse it properly?
> 
> Terry
> 
> ----- Original Message -----
> From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Monday, October 21, 2002 5:40 PM
> Subject: Re: Tags Screwing up Searches
> 
> 
> > Thanks for the update.
> > This all sounds right (no bugs).  The problem is the code that you
> have
> > that translates those < and > characters.
> >
> > Otis
> >
> > --- Terry Steichen <terry@net-frame.com> wrote:
> > > Otis,
> > >
> > > I discovered that the actual text that I was dealing with already
> > > converted
> > > the '<' converted to '&lt;', and so forth.  So the problem is
> that
> > > with
> > > something like '&lt;b&gt;College Soccer&lt;/b&gt;', Lucene
> recognizes
> > > the
> > > trailing semi-colon ';' as a word separator, so it can find the
> term
> > > 'college', but it does not see the ending of 'soccer'.  I did
> confirm
> > > that
> > > it *will* match on 'soccer&lt;' just fine.
> > >
> > > I've proceeded to add a string substitution method which replaces
> > > '&lt;'
> > > with '    ' (four spaces, in order to hopefully keep the offsets
> > > straight).
> > > It appears to work, though I believe it slows down the indexing.
> > >
> > > I don't know enough about the inner design of Lucene to figure
> this
> > > out, but
> > > it seems logical that there would be a much more efficient way to
> > > handle
> > > this than string operations.
> > >
> > > Anyway, thought I'd bring you up to date.
> > >
> > > Regards,
> > >
> > > Terry
> > >
> > > PS: I've had no responses from the list, so perhaps this is a
> unique
> > > problem
> > > and doesn't justify a formal fix effort.
> > >
> > > ----- Original Message -----
> > > From: "Terry Steichen" <terry@net-frame.com>
> > > To: "Lucene Users Group" <lucene-user@jakarta.apache.org>
> > > Sent: Friday, October 18, 2002 11:39 AM
> > > Subject: Tags Screwing up Searches
> > >
> > >
> > > Some content I'm indexing contains certain HTML tags, like <p>,
> <b>,
> > > <i>,
> > > etc.  What I find is that when a term I'm searching for touches
> one
> > > of these
> > > tags (which is fairly typical), the term isn't recognized and the
> > > search
> > > fails.  For example, <b>College Soccer</b> doesn't match on
> either
> > > "college"
> > > or "soccer".  I seem to recall someone else bring up a similar
> > > problem with
> > > a word that ends a sentence (and is thus treated as if the period
> was
> > > part
> > > of the word), but don't recall what the response was and I can't
> find
> > > that
> > > thread.
> > >
> > > Does anyone have some ideas on what's the best way to handle
> this?
> > > Filter
> > > out the tags in the process of creating the Document for
> indexing? Or
> > > through a modification to the Analyzer (I'm using the
> > > StandardAnalyzer)? Or
> > > something else?
> > >
> > > TIA,
> > >
> > > Terry
> > >
> > >
> > >
> > >
> > > --
> > > To unsubscribe, e-mail:
> > > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > > For additional commands, e-mail:
> > > <mailto:lucene-user-help@jakarta.apache.org>
> > >
> >
> >
> > __________________________________________________
> > Do you Yahoo!?
> > Y! Web Hosting - Let the expert host your web site
> > http://webhosting.yahoo.com/
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> >
> >
> 
> 
> --
> To unsubscribe, e-mail:  
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> 
>

__________________________________________________
Do you Yahoo!?
Y! Web Hosting - Let the expert host your web site
http://webhosting.yahoo.com/

--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message