nutch-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Why doesn't hostdb support byDomain mode?
Date Mon, 05 Mar 2018 10:21:31 GMT
Hi,

The reason is simple: we (as a company) needed this information per hostname, so we made a
hostdb. I don't see any downside to supporting a domain mode. Adding support for it through
a new hostdb.url.mode property seems like a good idea.
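
To illustrate what such a mode would change, here is a minimal, standalone Java sketch. It is not Nutch code: the class HostDbKeySketch, the Mode enum, and the methods keyFor and countByKey are hypothetical names, and the hostnames are made up for illustration. The domain is derived naively from the last two labels of the hostname, which is wrong for suffixes such as co.uk; a real patch would presumably rely on Nutch's own URL utilities instead.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class HostDbKeySketch {

    // Hypothetical values for the proposed hostdb.url.mode property.
    enum Mode { BY_HOST, BY_DOMAIN }

    // Returns the key under which a URL's statistics would be aggregated.
    static String keyFor(String url, Mode mode) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URL: " + url);
        }
        host = host.toLowerCase(Locale.ROOT);
        if (mode == Mode.BY_HOST) {
            return host;
        }
        // Assumption: naive domain = last two labels of the hostname.
        // This ignores public suffixes such as co.uk.
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    // Counts URLs per aggregation key, preserving first-seen order.
    static Map<String, Integer> countByKey(List<String> urls, Mode mode) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String url : urls) {
            counts.merge(keyFor(url, mode), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up hostnames mimicking a site that redirects to per-node hosts.
        List<String> urls = Arrays.asList(
            "http://i1.photobucket.com/a.jpg",
            "http://i2.photobucket.com/b.jpg",
            "http://photobucket.com/");
        System.out.println(countByKey(urls, Mode.BY_HOST));
        System.out.println(countByKey(urls, Mode.BY_DOMAIN));
    }
}
```

In host mode the three URLs produce three separate entries; in domain mode they collapse into a single photobucket.com entry, which is the behaviour the question asks for.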

Regards,
Markus

-----Original message-----
> From: Yossi Tamari <yossi.tamari@pipl.com>
> Sent: Sunday 4th March 2018 12:01
> To: user@nutch.apache.org
> Subject: Why doesn't hostdb support byDomain mode?
> 
> Hi,
> 
> Is there a reason that hostdb provides per-host data even when
> generate/fetch work by domain? This produces misleading statistics
> for servers that load-balance by redirecting to nodes (e.g. photobucket).
> 
> If this is just an oversight, I can contribute a patch, but I'm not sure whether
> I should use partition.url.mode, generate.count.mode, one of the other
> similar properties, or create one more such property, hostdb.url.mode.
> 
> Yossi.
