lucene-dev mailing list archives

From Doug Cutting <cutt...@lucene.com>
Subject Re: setBoost Q.
Date Thu, 01 Aug 2002 21:20:27 GMT
Mike Tinnes wrote:
> I've been working on tying in a PageRank algo to
> my web crawler using lucene and have a few problems. If I don't know the
> boost factor until AFTER the crawl is it possible to still set the boost?

Why not: (1) crawl, saving pages to disk; (2) analyze links and compute 
boosts; then, finally, (3) build the Lucene index?
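
As a rough illustration of step (2), here is a minimal Java sketch that turns a link graph saved during the crawl into one boost per page. It is not Mike's algorithm and not part of Lucene: the class name LinkBoosts, the method computeBoosts, the damping value 0.85, and the in-memory map of page to outgoing links are all assumptions made for the example, and the iteration is a simplified PageRank that simply drops the share of pages with no outgoing links.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for step (2): link analysis done after the crawl,
// before any document is added to the index.
class LinkBoosts {

    // outLinks maps each crawled page to the pages it links to.
    // Returns one boost per crawled page; each rank is multiplied by the
    // page count, so a page holding an average share ends up near 1.0.
    static Map<String, Double> computeBoosts(Map<String, List<String>> outLinks,
                                             int iterations) {
        final double damping = 0.85; // assumed value, conventional for PageRank
        final int n = outLinks.size();

        Map<String, Double> rank = new HashMap<>();
        for (String page : outLinks.keySet()) {
            rank.put(page, 1.0 / n);
        }

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String page : outLinks.keySet()) {
                next.put(page, (1.0 - damping) / n);
            }
            for (Map.Entry<String, List<String>> e : outLinks.entrySet()) {
                List<String> targets = e.getValue();
                if (targets.isEmpty()) {
                    continue; // dangling page: its share is dropped in this sketch
                }
                double share = damping * rank.get(e.getKey()) / targets.size();
                for (String target : targets) {
                    // Links to pages that were never crawled are ignored.
                    if (next.containsKey(target)) {
                        next.put(target, next.get(target) + share);
                    }
                }
            }
            rank = next;
        }

        Map<String, Double> boosts = new HashMap<>();
        for (Map.Entry<String, Double> e : rank.entrySet()) {
            boosts.put(e.getKey(), e.getValue() * n);
        }
        return boosts;
    }
}
```

The resulting map would then be consulted in step (3), when each saved page is turned into a document and its boost is set before the document is added to the index.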

The API does not currently let you change a field's boost after a 
document is indexed.  It is in theory possible, but would require 
overwriting .fXX files, which further complicates inter-process 
synchronization of index access.  Perhaps this can be added as a caveat 
emptor API, but, in the meantime, I suggest the above approach.

> Also what does setBoost() actually do to the rank?

The rank is the position of a document in a hit list: the first hit has 
rank one, and so on.  Hits are sorted by score.  The boost is multiplied 
into the score of hits.  So a boost which is greater than 1.0 will tend 
to move hits on that field up the list (toward rank one), while a boost 
which is less than 1.0 will tend to move them down.
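
To make the effect concrete, here is a small Java sketch of how a multiplicative boost can reorder a hit list. It only models the description above (score times boost, then sort by score) and is not Lucene's actual scoring code; the class BoostRanking, the Hit type, and the raw scores are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "the boost is multiplied into the score of hits".
class BoostRanking {

    static final class Hit {
        final String id;
        final double rawScore; // score before any boost is applied
        final double boost;

        Hit(String id, double rawScore, double boost) {
            this.id = id;
            this.rawScore = rawScore;
            this.boost = boost;
        }

        double score() {
            return rawScore * boost;
        }
    }

    // Returns document ids ordered by boosted score, highest first,
    // so index 0 corresponds to rank one.
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.score(), a.score()));
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

With raw scores of 0.5 for document A and 0.4 for document B, equal boosts leave A at rank one. Giving B a boost of 2.0 raises its score to 0.8 and moves it ahead of A, while a boost of 0.5 lowers it to 0.2 and keeps it behind.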

Doug



