lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: reoot site query results
Date Mon, 06 Dec 2004 10:12:27 GMT

On Dec 6, 2004, at 4:53 AM, Chris Fraschetti wrote:
> My lucene implementation works great, its basically an index of many
> web crawls. The main thing my users complain about is say a search for
> "slashdot" will return the
> http://www.slashdot.org/soem_dir/somepage.asp as the top result
> because the factors i have scoring it determine it as so... but
> obviously in true search engine fashion.. i would like
> http://www.slashdot.org/ to be the very top result... i've added a
> boost to queries that match the hostname field, which helped a little,
> but obviously not a proper solution. Does anyone out there in the
> search engine world have a good schema for determining root websites
> and applying a huge boost to them in one fashion or another? mainly so
> it appears before any sub pages? (assuming the query is in reference
> to that site) ...

Consider applying the boost to the Document, rather than the field, at 
index time.  I assume each document in your index represents one page.  
At indexing time you know whether it is a root page or not, right?

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message