nutch-dev mailing list archives

From Dawid Weiss
Subject Re: Search quality evaluation
Date Wed, 05 Apr 2006 19:01:04 GMT

> In any case, it includes a system to scrape search results from other 
> engines, based on Apple's Sherlock search-engine descriptors.  These 
> descriptors are also used by Mozilla:

Just a note: we used to have exactly the same mechanism in Carrot2. 
Unfortunately, this format does not clearly distinguish between the 
title, URL, and snippet parts and stays at snippet granularity, so we 
additionally had to parse each snippet with regular expressions. The 
underlying problem is the terms of use, which forbid automatic 
scraping of search results using these plugins. That's the main reason 
why we switched to public APIs, actually.
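
To illustrate the extra regular-expression step mentioned above, here is a 
minimal sketch in Python. It is not the Carrot2 code. The item markup 
(a link followed by free text), the pattern, and the names Result, ITEM_RE 
and parse_item are all assumptions made for illustration; every engine 
would need its own pattern.

```python
import re
from typing import NamedTuple, Optional


class Result(NamedTuple):
    """One search hit split into its three parts (hypothetical type)."""
    title: str
    url: str
    snippet: str


# Assumed item layout: <a href="URL">TITLE</a> followed by the snippet text.
# Real result pages differ per engine, so this pattern is illustrative only.
ITEM_RE = re.compile(
    r'<a\s+href="(?P<url>[^"]+)"[^>]*>(?P<title>.*?)</a>\s*(?P<snippet>.*)',
    re.DOTALL | re.IGNORECASE,
)
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Remove markup tags and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub("", text).split())


def parse_item(item_html: str) -> Optional[Result]:
    """Split one result item into title, URL and snippet, or return None."""
    match = ITEM_RE.search(item_html)
    if match is None:
        return None
    return Result(
        title=strip_tags(match.group("title")),
        url=match.group("url"),
        snippet=strip_tags(match.group("snippet")),
    )
```

A pattern like this breaks whenever the engine changes its page layout, which 
is one more practical argument for a public API over scraping.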

