lucene-java-user mailing list archives

From "Brian Rook" <brian.r...@xor.com>
Subject excluding files / refining search
Date Thu, 14 Feb 2002 17:49:27 GMT
Hello,

I've been working with Lucene for about a month now.  I've got my indexes
created, but I'm having a problem with the results my searches return.

The site I'm working on has a lot of small HTML files that are used for page
construction (nav bars, footers, etc.).  They're being returned high in the
results because they contain the search term(s) I'm looking for and, being
small, they rank higher than larger documents.

I want to exclude them from the index and I've come up with two ideas:

1) move them to a directory, which I will exclude from the index, but I'll
have to change a bunch of links

2) detect them with some sort of flag and exclude them from the index.  We
were thinking that we could put a fake tag in those pages that the indexer
would detect, so that it skips them.

Has anyone run into this problem before?  How difficult would it be to
implement option 2?  Is there a way to detect a fake tag?  I'm assuming that
I can add a new boolean to the HTMLDocument class (true if the page contains
the exclude tag) and then skip IndexWriter.addDocument() when it is set.
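
To make option 2 concrete, here is a minimal sketch of the detection step
only.  Everything in it is an assumption for illustration: the marker
<meta name="lucene-exclude">, the class ExcludeTagDetector and the method
hasExcludeTag are hypothetical names, not part of Lucene or its HTMLDocument
demo class.  It scans the raw HTML with a regular expression rather than
hooking into the HTML parser.

```java
import java.util.regex.Pattern;

/**
 * Sketch only: detects a hypothetical marker tag in raw HTML so that the
 * caller can skip indexing that file. The tag name "lucene-exclude" is an
 * assumption made for this example, not a Lucene convention.
 */
class ExcludeTagDetector {

    // Matches <meta ... name="lucene-exclude" ...>, ignoring case and
    // accepting double quotes, single quotes, or no quotes around the name.
    private static final Pattern EXCLUDE_TAG = Pattern.compile(
        "<meta\\s+[^>]*name\\s*=\\s*[\"']?lucene-exclude(?=[\"'\\s/>])[^>]*>",
        Pattern.CASE_INSENSITIVE);

    /** Returns true if the page carries the (hypothetical) exclude marker. */
    static boolean hasExcludeTag(String html) {
        return html != null && EXCLUDE_TAG.matcher(html).find();
    }
}
```

In the indexing loop, the check would then guard the existing call, roughly
"if (!ExcludeTagDetector.hasExcludeTag(html)) writer.addDocument(doc);",
where html is the file's content read into a String.  A regex scan is crude
(it would also match the marker inside an HTML comment), so setting the
boolean from within the HTML parser may be the more robust choice.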

b

