hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: Block reports: memory vs. file system, and Dividing offerService into 2 threads
Date Wed, 30 Apr 2008 20:40:06 GMT

Whether the NameNode needs block reports at all is a different issue
from whether block report generation at the DataNode should scan the local
FS. I think this thread is concerned only with the latter.
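To make the DataNode-side question concrete, here is a minimal sketch of the disk-vs-memory comparison in dhruba's quoted proposal. It is a hypothetical illustration, not Hadoop code. It assumes block files are named "blk_<id>" in one flat directory, and that the DataNode's in-memory view is available as a set of block IDs. The class and method names (BlockScanDiff, scanDirectory, diff) are invented for the example.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: compare block IDs found by scanning a DataNode data
 * directory with the block IDs held in the DataNode's in-memory map.
 */
public class BlockScanDiff {

    /** Result of the comparison; both sets empty means disk and memory agree. */
    public record Diff(Set<Long> onlyOnDisk, Set<Long> onlyInMemory) {
        public boolean isEmpty() {
            return onlyOnDisk.isEmpty() && onlyInMemory.isEmpty();
        }
    }

    /** Returns the ID from a "blk_<id>" file name, or null for any other file. */
    public static Long parseBlockId(String fileName) {
        if (!fileName.startsWith("blk_")) {
            return null;
        }
        try {
            return Long.parseLong(fileName.substring("blk_".length()));
        } catch (NumberFormatException e) {
            return null; // e.g. "blk_42.meta"
        }
    }

    /** Collects block IDs from a list of file names. */
    public static Set<Long> blockIdsFromNames(List<String> fileNames) {
        Set<Long> ids = new TreeSet<>();
        for (String name : fileNames) {
            Long id = parseBlockId(name);
            if (id != null) {
                ids.add(id);
            }
        }
        return ids;
    }

    /** Scans one flat directory (assumption: no subdirectories). */
    public static Set<Long> scanDirectory(File dir) {
        String[] names = dir.list(); // null if dir is missing or unreadable
        return blockIdsFromNames(names == null ? List.<String>of() : Arrays.asList(names));
    }

    /** Set differences in both directions. */
    public static Diff diff(Set<Long> onDisk, Set<Long> inMemory) {
        Set<Long> onlyOnDisk = new TreeSet<>(onDisk);
        onlyOnDisk.removeAll(inMemory);
        Set<Long> onlyInMemory = new TreeSet<>(inMemory);
        onlyInMemory.removeAll(onDisk);
        return new Diff(onlyOnDisk, onlyInMemory);
    }

    public static void main(String[] args) {
        Set<Long> disk = blockIdsFromNames(List.of("blk_1", "blk_1.meta", "blk_2", "blk_-7"));
        Set<Long> memory = Set.of(1L, 2L, 3L);
        Diff d = diff(disk, memory);
        // Under the proposal, a report goes to the NameNode only if d is non-empty.
        System.out.println(d.isEmpty() ? "in sync" : "send report: " + d);
    }
}
```

Only the local comparison is sketched here. Whether a clean diff is enough reason to skip the report is exactly what Doug questions in the quoted reply.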


Doug Cutting wrote:
> dhruba Borthakur wrote:
>> My current thinking is that "block report processing" should compare the
>> blkxxx files on disk with the data structure in the Datanode memory. If
>> and only if there is some discrepancy between these two, then a block
>> report should be sent to the Namenode. If we do this, then we will practically
>> get rid of 99% of block reports.
> Doesn't this assume that the namenode and datanode are 100% in sync? 
> Another purpose of block reports is to make sure that the namenode and 
> datanode agree, since failed RPCs, etc. might have permitted them to 
> slip out of sync.  Or are we now confident that these are never out of 
> sync?  Perhaps we should start logging whenever a block report surprises us?
> Long ago we talked of implementing partial, incremental block reports. 
> We'd divide blockid space into 64 sections.  The datanode would ask the 
> namenode for the hash of its block ids in a section.  Full block lists 
> would then only be sent when the hash differs.  Both sides would 
> maintain hashes of all sections in memory.  Then, instead of making a 
> block report every hour, we'd make a 1/64 block id check every minute.
> Doug
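Doug's sectioned scheme can be sketched as follows. This is a minimal illustration under assumptions that are not in the thread. A block's section is taken to be its ID modulo 64. A section's hash is the XOR of a mixed (SplitMix64) hash of each block ID in it, so either side can update it in O(1) as blocks come and go. The class and method names are hypothetical, not Hadoop APIs, and hashOf stands in for whatever RPC the datanode would use to ask the namenode.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of sectioned block-report checking: block IDs are split
 * into 64 sections and each side keeps one order-independent hash per section.
 */
public class SectionedBlockHashes {
    public static final int SECTIONS = 64;

    private final long[] sectionHash = new long[SECTIONS];

    /** Assumption: a block's section is its ID modulo 64 (non-negative for negative IDs). */
    public static int sectionOf(long blockId) {
        return (int) Math.floorMod(blockId, (long) SECTIONS);
    }

    /** SplitMix64 finalizer; spreads IDs so that XOR-ing them makes a useful hash. */
    static long mix(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    /**
     * XOR makes adding and removing the same O(1) update, so the per-section hash
     * is always current. Callers must toggle only on real changes: adding an ID
     * that is already present would cancel it out.
     */
    private void toggle(long blockId) {
        sectionHash[sectionOf(blockId)] ^= mix(blockId);
    }

    public void addBlock(long blockId)    { toggle(blockId); }
    public void removeBlock(long blockId) { toggle(blockId); }

    /** The value a datanode would ask the namenode for (hypothetical RPC). */
    public long hashOf(int section) { return sectionHash[section]; }

    /** One section per minute covers all 64 sections every 64 minutes. */
    public static int sectionForMinute(long minute) {
        return (int) Math.floorMod(minute, (long) SECTIONS);
    }

    /** The full block list for one section, sent only when the hashes differ. */
    public static Set<Long> blocksInSection(Set<Long> blocks, int section) {
        Set<Long> out = new TreeSet<>();
        for (long id : blocks) {
            if (sectionOf(id) == section) {
                out.add(id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SectionedBlockHashes namenode = new SectionedBlockHashes();
        SectionedBlockHashes datanode = new SectionedBlockHashes();
        Set<Long> dnBlocks = Set.of(3L, 5L, 67L, 131L);
        for (long id : dnBlocks) {
            datanode.addBlock(id);
        }
        // Suppose the namenode never learned about block 67 (e.g. a failed RPC).
        for (long id : Set.of(3L, 5L, 131L)) {
            namenode.addBlock(id);
        }
        int section = sectionForMinute(3); // section 3 holds blocks 3, 67 and 131
        if (datanode.hashOf(section) != namenode.hashOf(section)) {
            System.out.println("section " + section + " differs; send " + blocksInSection(dnBlocks, section));
        }
    }
}
```

XOR is used here only because it is order-independent and cheap to maintain incrementally; any per-section digest that both sides can keep in memory would fit the description. Equal hashes make a disagreement unlikely to go unnoticed but do not prove the sections match.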
