hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Lock manager
Date Thu, 16 Feb 2006 21:46:44 GMT
Andrzej Bialecki wrote:
> Currently there is no mechanism to coordinate concurrent access to 
> specific resources across the cluster. This is apparent in Nutch in the 
> protocol plugins, which need this "global lock" mechanism to control 
> simultaneous access to remote hosts (at the moment the workaround is to 
> execute just one task simultaneously to handle the locking inside a 
> single JVM).

Central locking sounds like it could quickly become a bottleneck.

Currently, Nutch's fetcher observes politeness by partitioning its 
input by host and disabling splitting of fetcher input files, so that 
all URLs for a given host are always processed in a single task.  Since, 
for politeness, URLs from a host must be processed serially, there's no 
advantage to doing it otherwise: even if we permitted multiple nodes to 
synchronize their access to a host through a locking mechanism, 
fetching could go no faster.
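
To make the partition-by-host idea concrete, here is a minimal sketch in plain Java. It is not Nutch's actual partitioner; the class name HostPartitionSketch and both method names are hypothetical, and it assumes the partition is picked by hashing the lower-cased host name. The only point it illustrates is that every URL for a given host lands in the same partition, so a single task can fetch that host serially without any cross-node lock.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: illustrates host-based partitioning only,
// not the real Nutch or Hadoop classes.
final class HostPartitionSketch {

    // Map a URL to one of numPartitions buckets using only its host,
    // so all URLs for a given host share a bucket.
    static int partitionFor(String url, int numPartitions) {
        String host = URI.create(url).getHost();
        // Assumption: fall back to the whole URL if no host can be parsed.
        String key = (host == null) ? url : host.toLowerCase(Locale.ROOT);
        // floorMod keeps the index non-negative for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Group URLs into per-partition lists; each list stands in for the
    // unsplit input of one task, which fetches its URLs serially.
    static List<List<String>> partition(List<String> urls, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String url : urls) {
            parts.get(partitionFor(url, numPartitions)).add(url);
        }
        return parts;
    }
}
```

Because the partition depends only on the host, politeness delays can be enforced with state local to one task; different hosts still proceed in parallel across partitions.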

So how is the current mechanism inadequate?  What are some use cases for 
centralized locking that cannot be solved by partitioning task input?

