hadoop-hdfs-dev mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: DataBlockScanner scan period
Date Thu, 14 Oct 2010 00:37:17 GMT

On Oct 13, 2010, at 7:29 PM, Thanh Do wrote:

> Hi Brian,
> 
> If this is the case, then is there any chance that
> somehow the DataBlockScanner cannot finish
> the verification of all the blocks in three weeks
> (e.g., a node has a very large number of blocks)?
> 

Yes.  At some point, I'd really like to figure out what percentage of our blocks actually
get scanned at our site; I suspect some go a very long time without a scan.

Brian
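
One rough way to estimate that percentage is to compare each block's last verification time against the scan period. The Java sketch below assumes a hypothetical input, a map from block ID to last-scan time in epoch milliseconds; it does not read real DataNode state or use any Hadoop API, and `fractionScannedWithin` is an illustrative name only.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class ScanCoverage {
    // Fraction of blocks whose last verification is no older than periodHours.
    // Input format (block ID -> last-scan epoch millis) is a hypothetical assumption.
    static double fractionScannedWithin(Map<Long, Long> lastScanMillis, long nowMillis, long periodHours) {
        if (lastScanMillis.isEmpty()) {
            return 1.0; // no blocks: nothing is overdue
        }
        long periodMillis = TimeUnit.HOURS.toMillis(periodHours);
        long recent = lastScanMillis.values().stream()
                .filter(t -> nowMillis - t <= periodMillis)
                .count();
        return (double) recent / lastScanMillis.size();
    }

    public static void main(String[] args) {
        long now = TimeUnit.DAYS.toMillis(100);
        Map<Long, Long> scans = Map.of(
                1L, now - TimeUnit.DAYS.toMillis(2),   // scanned 2 days ago
                2L, now - TimeUnit.DAYS.toMillis(20),  // 20 days ago
                3L, now - TimeUnit.DAYS.toMillis(30),  // 30 days ago: overdue
                4L, now - TimeUnit.DAYS.toMillis(60)); // 60 days ago: overdue
        // Three-week period, matching DEFAULT_SCAN_PERIOD_HOURS = 21*24
        System.out.println(fractionScannedWithin(scans, now, 21 * 24L)); // prints 0.5
    }
}
```

Blocks that fall outside the window are exactly the ones that "go very long without a scan".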

> Thanh
> 
> On Wed, Oct 13, 2010 at 7:18 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
> 
>> Hi Thanh,
>> 
>> That is correct.  Last time I read the code, Hadoop scheduled the block
>> verifications randomly throughout the period in order to avoid periodic
>> effects (i.e., high load every N minutes).
>> 
>> Brian
>> 
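
The randomized scheduling described above can be illustrated with a minimal sketch: give each block a uniformly random time within the period and scan in that order, so there is no load spike every N minutes. This is an assumed simplification, not the actual DataBlockScanner logic; `Slot` and `schedule` are hypothetical names (the record needs Java 16+).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class RandomScanSchedule {
    // A planned verification: which block, and when (ms offset from the period start).
    record Slot(long blockId, long offsetMillis) {}

    // Hypothetical helper: one verification per block, placed uniformly at
    // random in [0, periodMillis), returned in scan (time) order.
    static List<Slot> schedule(List<Long> blockIds, long periodMillis, Random rng) {
        List<Slot> slots = new ArrayList<>();
        for (long id : blockIds) {
            long offset = (long) (rng.nextDouble() * periodMillis); // in [0, periodMillis)
            slots.add(new Slot(id, offset));
        }
        slots.sort(Comparator.comparingLong(Slot::offsetMillis));
        return slots;
    }

    public static void main(String[] args) {
        long period = 21L * 24 * 60 * 60 * 1000; // three weeks in ms
        List<Slot> plan = schedule(List.of(101L, 102L, 103L, 104L), period, new Random(42));
        plan.forEach(s -> System.out.println("block " + s.blockId() + " at +" + s.offsetMillis() + " ms"));
    }
}
```

Random placement keeps the average rate at one scan per block per period while avoiding synchronized bursts.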
>> On Oct 13, 2010, at 7:14 PM, Thanh Do wrote:
>> 
>>> Brian,
>>> 
>>> When you say *attempt* to complete an *entire* node scan,
>>> do you mean that, for example, if a node has 100 block files, it will
>>> try to verify all 100 blocks every 3 weeks?
>>> That is, on average, a block is scanned every (3 weeks / 100)?
>>> 
>>> Thanks
>>> Thanh
>>> 
>>> 
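
The arithmetic behind this question can be sketched directly. Assuming scans are spread evenly on average, each block is still verified about once per period, while the node as a whole performs one verification roughly every period / N. `averageGapMillis` is a hypothetical helper; only the constant comes from the Hadoop source quoted in this thread.

```java
import java.util.concurrent.TimeUnit;

public class ScanInterval {
    // Same value as the constant quoted from the Hadoop source in this thread.
    static final long DEFAULT_SCAN_PERIOD_HOURS = 21 * 24L; // three weeks = 504 hours

    // Hypothetical helper: average gap between consecutive verifications on one
    // node, if every one of numBlocks blocks is scanned once per period.
    static long averageGapMillis(long periodHours, long numBlocks) {
        return TimeUnit.HOURS.toMillis(periodHours) / numBlocks;
    }

    public static void main(String[] args) {
        long gap = averageGapMillis(DEFAULT_SCAN_PERIOD_HOURS, 100);
        // 504 h / 100 = 5.04 h = 302.4 min; toMinutes truncates
        System.out.println(TimeUnit.MILLISECONDS.toMinutes(gap) + " minutes"); // prints "302 minutes"
    }
}
```

So with 100 blocks, the node verifies some block about every 5 hours, and any given block about once every 3 weeks.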
>>> On Wed, Oct 13, 2010 at 7:07 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
>>> 
>>>> Hi Thanh,
>>>> 
>>>> The scan period is the period in which Hadoop *attempts* to complete an
>>>> entire node scan.  That is, if it's set to 3 weeks, HDFS will try to scan
>>>> each block once every 3 weeks.
>>>> 
>>>> Obviously, depending on the bandwidth you have made available to the
>>>> scanning thread, you can specify impossibly small periods.
>>>> 
>>>> Brian
>>>> 
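
The point about impossibly small periods is simple arithmetic: a full scan must read every byte of block data, so the period can be no shorter than total bytes / scanner bandwidth. The sketch below uses made-up example figures (10 TiB of block data, a 1 MiB/s throttle) and a hypothetical helper name; it ignores seeks, contention, and how Hadoop actually throttles the scanner.

```java
public class ScanFeasibility {
    // Hypothetical helper: minimum period (hours) needed to read every byte once
    // at the given scanner bandwidth. Ignores seek overhead and competing I/O.
    static double minPeriodHours(long totalBlockBytes, long bytesPerSecond) {
        return (double) totalBlockBytes / bytesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long totalBytes = 10L * 1024 * 1024 * 1024 * 1024; // 10 TiB of block data (example)
        long throttle = 1024L * 1024;                      // 1 MiB/s for scanning (example)
        double needed = minPeriodHours(totalBytes, throttle);
        long configured = 21 * 24L;                        // three weeks
        // Prints: need >= 2912.7 h, configured 504 h, feasible: false
        System.out.printf("need >= %.1f h, configured %d h, feasible: %b%n",
                needed, configured, needed <= configured);
    }
}
```

With these example numbers, a three-week period cannot be met, which is also how a node with a very large number of blocks could fail to finish a scan within the period.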
>>>> On Oct 13, 2010, at 7:01 PM, Thanh Do wrote:
>>>> 
>>>>> Hi again,
>>>>> 
>>>>> Could anybody explain the scanning-period
>>>>> policy of DataBlockScanner to me? That is, how often does it wake up
>>>>> and scan a block file?
>>>>> When looking at the code, I found
>>>>> 
>>>>> static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L; // three weeks
>>>>> 
>>>>> 
>>>>> but surely it does not just wake up and pick a single random block
>>>>> to verify every three weeks, right?
>>>>> 
>>>>> Thanks a lot,
>>>>> Thanh
>>>> 
>>>> 
>> 
>> 

