hadoop-hdfs-dev mailing list archives

From Thanh Do <than...@cs.wisc.edu>
Subject Re: DataBlockScanner scan period
Date Wed, 24 Nov 2010 01:41:47 GMT
Sorry for digging up this old thread.

Brian, is this the reason you want to add a "data-level" scan
to HDFS, as proposed in HDFS-221?

It seems to me that a rarely read block could stay
silently corrupted for a long time, because the DataBlockScanner
might never finish its scanning pass within 3 weeks...


On Wed, Oct 13, 2010 at 7:37 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:

>
> On Oct 13, 2010, at 7:29 PM, Thanh Do wrote:
>
> > Hi Brian,
> >
> > If this is the case, is there any chance that
> > the DataBlockScanner cannot finish
> > verifying all the blocks within three weeks
> > (e.g., when a node has a very large number of blocks)?
> >
>
> Yes.  At some point, I'd really like to figure out what percentage of our
> blocks actually get scanned at our site; I suspect some go a very long time
> without a scan.
>
> Brian
>
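One way to estimate the percentage Brian mentions, as a minimal Java sketch: assume you can collect a last-verification timestamp for every block on a node (how to collect them is left open here), then count the blocks verified within the last scan period. The class `ScanAudit` and its method are hypothetical, not part of HDFS.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: what share of blocks were verified within the last scan period? */
final class ScanAudit {

    /**
     * @param lastVerifiedMillis last verification time of each block (epoch millis),
     *                           or a negative value if the block was never verified
     * @param nowMillis          current time (epoch millis)
     * @param periodHours        scan period to compare against
     */
    static double fractionVerifiedWithin(List<Long> lastVerifiedMillis, long nowMillis, long periodHours) {
        if (lastVerifiedMillis.isEmpty()) {
            return 1.0; // no blocks, so nothing is overdue
        }
        long cutoff = nowMillis - TimeUnit.HOURS.toMillis(periodHours);
        long recent = lastVerifiedMillis.stream()
                .filter(t -> t >= 0 && t >= cutoff) // skip never-verified and stale blocks
                .count();
        return (double) recent / lastVerifiedMillis.size();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long day = TimeUnit.DAYS.toMillis(1);
        // hypothetical data: verified 1 day ago, 30 days ago, never, 10 days ago
        List<Long> samples = List.of(now - day, now - 30 * day, -1L, now - 10 * day);
        System.out.println(fractionVerifiedWithin(samples, now, 21 * 24L)); // 2 of 4 within 21 days: prints 0.5
    }
}
```

Blocks that were never verified are counted as not covered, which is the conservative choice when looking for silently corrupted data.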
> > Thanh
> >
> > On Wed, Oct 13, 2010 at 7:18 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
> >
> >> Hi Thanh,
> >>
> >> That is correct.  Last time I read the code, Hadoop scheduled the block
> >> verifications randomly throughout the period in order to avoid periodic
> >> effects (i.e., high load every N minutes).
> >>
> >> Brian
> >>
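The random scheduling Brian describes can be illustrated as follows: each block gets a uniformly random offset within the period, so verifications are spread out rather than bunched at fixed times. This is a minimal sketch of the technique under that assumption, not the actual DataBlockScanner logic; `RandomScanSchedule`, `Slot`, and the block IDs are hypothetical. The record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

/** Hypothetical scheduler: one verification per block, at a random point in the period. */
final class RandomScanSchedule {

    record Slot(String blockId, long offsetMillis) {}

    static List<Slot> schedule(List<String> blockIds, long periodHours, Random rng) {
        long periodMillis = TimeUnit.HOURS.toMillis(periodHours);
        List<Slot> slots = new ArrayList<>();
        for (String id : blockIds) {
            // Uniform offset in [0, periodMillis): on average the load is spread
            // evenly, with no burst every N minutes.
            long offset = (long) (rng.nextDouble() * periodMillis);
            slots.add(new Slot(id, offset));
        }
        // Order by due time so a scanning loop could simply walk the list.
        slots.sort((a, b) -> Long.compare(a.offsetMillis(), b.offsetMillis()));
        return slots;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("blk_1", "blk_2", "blk_3", "blk_4");
        for (Slot s : schedule(ids, 21 * 24L, new Random(42))) {
            System.out.printf("%s at +%.1f h%n", s.blockId(), s.offsetMillis() / 3_600_000.0);
        }
    }
}
```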
> >> On Oct 13, 2010, at 7:14 PM, Thanh Do wrote:
> >>
> >>> Brian,
> >>>
> >>> When you say it *attempts* to complete an *entire* node scan,
> >>> do you mean, for example, that if a node has 100 block files, it will
> >>> try to verify all 100 blocks every 3 weeks?
> >>> That is, on average, one block is scanned every (3 weeks / 100)?
> >>>
> >>> Thanks
> >>> Thanh
> >>>
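Thanh's arithmetic, checked for the default period: 21 × 24 = 504 hours, and 504 h / 100 blocks = 5.04 h, so with 100 blocks a full pass means one verification roughly every five hours on average, assuming every block is scanned exactly once per period. A tiny Java check (the class and method names are made up for illustration):

```java
import java.time.Duration;

/** Average gap between verifications if N blocks are each scanned once per period. */
final class ScanInterval {

    static Duration averageGap(long periodHours, long blockCount) {
        if (blockCount <= 0) {
            throw new IllegalArgumentException("blockCount must be positive");
        }
        return Duration.ofHours(periodHours).dividedBy(blockCount);
    }

    public static void main(String[] args) {
        long defaultPeriodHours = 21 * 24L; // 504 h, the DEFAULT_SCAN_PERIOD_HOURS value quoted in this thread
        System.out.println(averageGap(defaultPeriodHours, 100)); // PT5H2M24S, i.e. 5.04 h
    }
}
```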
> >>>
> >>> On Wed, Oct 13, 2010 at 7:07 PM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
> >>>
> >>>> Hi Thanh,
> >>>>
> >>>> The scan period is the time within which Hadoop *attempts* to complete an
> >>>> entire node scan.  That is, if it's set to 3 weeks, HDFS will try to scan each
> >>>> block once every 3 weeks.
> >>>>
> >>>> Obviously, depending on the bandwidth you have made available to the
> >>>> scanning thread, you can specify periods that are impossibly small to meet.
> >>>>
> >>>> Brian
> >>>>
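Brian's bandwidth point can be stated as a lower bound: one full pass cannot take less than (bytes stored) / (scan rate). A minimal sketch, assuming a constant read rate and ignoring seek and scheduling overhead; `ScanFeasibility` and the numbers in `main` are hypothetical, not Hadoop defaults.

```java
import java.time.Duration;

/** Lower bound on how long one full scan pass can take at a given read rate. */
final class ScanFeasibility {

    /** Minimum time to read totalBytes once at bytesPerSecond (assumed constant). */
    static Duration minimumPassTime(long totalBytes, long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        long seconds = (totalBytes + bytesPerSecond - 1) / bytesPerSecond; // round up
        return Duration.ofSeconds(seconds);
    }

    /** True if a scan period of periodHours can be met at all under that rate. */
    static boolean isAchievable(long periodHours, long totalBytes, long bytesPerSecond) {
        return minimumPassTime(totalBytes, bytesPerSecond).compareTo(Duration.ofHours(periodHours)) <= 0;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        long tib = mib * 1024 * 1024;
        long rate = mib;       // hypothetical 1 MiB/s made available to the scanning thread
        long stored = 4 * tib; // hypothetical 4 TiB of block data on the node
        System.out.println(minimumPassTime(stored, rate).toHours() + " h minimum"); // 1165 h minimum
        System.out.println("3 weeks achievable? " + isAchievable(21 * 24L, stored, rate)); // false
    }
}
```

Under those made-up numbers a full pass needs at least 1165 hours, nearly seven weeks, so a three-week period could not be met; that is exactly the situation Thanh worries about at the top of the thread.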
> >>>> On Oct 13, 2010, at 7:01 PM, Thanh Do wrote:
> >>>>
> >>>>> Hi again,
> >>>>>
> >>>>> Could anybody explain the scanning period
> >>>>> policy of DataBlockScanner to me? That is, how often does it wake up
> >>>>> and scan a block file?
> >>>>> Looking at the code, I found:
> >>>>>
> >>>>> static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L; // three weeks
> >>>>>
> >>>>>
> >>>>> but surely it does not just wake up and pick a random block
> >>>>> to verify once every three weeks, right?
> >>>>>
> >>>>> Thanks a lot,
> >>>>> Thanh
> >>>>
> >>>>
> >>
> >>
>
>
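Following Brian's explanation, the `DEFAULT_SCAN_PERIOD_HOURS` constant Thanh quotes is a target for one full pass over the node, in hours, not a wake-up interval. A sketch of how such a period might be resolved and converted to milliseconds, assuming it can be overridden by a `dfs.datanode.scan.period.hours` setting and falls back to the default when that setting is missing or non-positive; treat the key name and the fallback rule as assumptions to check against your Hadoop version. A plain `Map` stands in for Hadoop's `Configuration` so the example is self-contained.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Sketch: resolve the scan period, falling back to the three-week default. */
final class ScanPeriod {

    static final long DEFAULT_SCAN_PERIOD_HOURS = 21 * 24L; // three weeks, as quoted in the thread

    /** Assumed configuration key; verify against your Hadoop version. */
    static final String PERIOD_KEY = "dfs.datanode.scan.period.hours";

    /** Returns the scan period in milliseconds; a missing or non-positive value uses the default. */
    static long scanPeriodMillis(Map<String, String> conf) {
        long hours = Long.parseLong(conf.getOrDefault(PERIOD_KEY, "0"));
        if (hours <= 0) {
            hours = DEFAULT_SCAN_PERIOD_HOURS;
        }
        return TimeUnit.HOURS.toMillis(hours);
    }

    public static void main(String[] args) {
        System.out.println(scanPeriodMillis(Map.of()));                   // 1814400000 (504 h)
        System.out.println(scanPeriodMillis(Map.of(PERIOD_KEY, "168"))); // 604800000 (1 week)
    }
}
```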
