I've always wondered if it would be useful to try to fit the PageRank
(heuristic?) into Lucene.
As an experiment, I ran PageRank on two trees of Javadoc (the Lucene
javadoc and the JDK1.4 javadoc) and produced a report that shows the
PageRank value for every page.
The Lucene javadoc report is here:
http://www.searchmorph.com/static/lucene-report.html
The weblog entry has a bit more detail and links to the much larger
JDK1.4 report:
http://searchmorph.com/weblog/index.php?id=29
My feeling is that, in the context of machine-generated pages,
PageRank doesn't help that much.
Also, it's not clear how to use it: e.g., make it the Document boost, or
put it into a separate field for use by a custom scoring function? I
think the Google scoring function is a secret.
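
To make the Document-boost option a bit more concrete, here is a rough sketch of one way a PageRank value could be turned into a boost factor. The class name, the toBoost helper, and the square-root damping are my own assumptions for illustration, not anything Lucene or Google prescribes. It also assumes a Lucene version whose Document exposes setBoost(float).

```java
final class PageRankBoostSketch {
    /**
     * Hypothetical mapping from a PageRank value to an index-time boost.
     * A page with the uniform rank (1 / pageCount) gets a neutral boost of 1.0;
     * the square root damps the spread so a few hub pages do not swamp text relevance.
     */
    static float toBoost(double rank, int pageCount) {
        double relativeToAverage = rank * pageCount;
        return (float) Math.sqrt(relativeToAverage);
    }

    public static void main(String[] args) {
        // With Lucene on the classpath, the value would be applied roughly as:
        //   doc.setBoost(toBoost(rank, pageCount));   // assumption, not run here
        System.out.println(toBoost(0.50, 4));
        System.out.println(toBoost(0.25, 4));
    }
}
```

The separate-field route would store the raw value instead and leave the weighting to query-time scoring, which avoids re-indexing whenever the damping choice changes.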
And... I'm pretty sure it can't easily be used with incremental index
additions, since it needs the entire link graph.
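
To show why the whole graph is needed, here is a minimal sketch of the usual power-iteration form of PageRank. This is not the code I used for the reports; the class name, the array-based graph layout, the 0.85 damping factor, and the tiny four-page graph are all assumptions for illustration.

```java
import java.util.Arrays;

final class PageRankSketch {
    /**
     * links[i] lists the pages that page i links to.
     * Every iteration reads every page's out-links, which is why a single
     * added document cannot be ranked in isolation.
     */
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no out-links

            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];
                } else {
                    double share = rank[i] / links[i].length;
                    for (int target : links[i]) {
                        next[target] += share;
                    }
                }
            }
            // Spread dangling rank evenly, then apply the damping factor.
            for (int i = 0; i < n; i++) {
                next[i] = (1.0 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 -> 1,2   1 -> 2   2 -> 0   3 -> 2
        int[][] links = { {1, 2}, {2}, {0}, {2} };
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

Adding one page with a few links changes the share every linked page passes on, so in principle all the values shift and the iteration has to be rerun over the full graph.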
Hope this isn't too far off topic, sorry if so, but thought it was
relevant enough to mention...
- Dave