hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Sync and Data Replication
Date Sun, 10 Jun 2012 16:39:26 GMT

On Sat, Jun 9, 2012 at 11:11 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> Thanks Harsh for the detailed info. It clears things up. The only thing on
> those pages that concerns me is what happens when the client crashes. It
> says you could lose up to a block's worth of information. Is this still
> true given that the NN would auto-close the file?

Where does it say this exactly? It is true that immediate readers will
not get the last block (as it remains open and uncommitted), but once
the lease recovery kicks in the file is closed successfully and the
last block is indeed made available, so there's no 'data loss'.

> Is it good practice to reduce the NN default value so that it auto-closes
> before 1 hr?

I've not seen people do this or need to do this. Most don't run into
such a situation, and it is vital to properly close() files or sync()
on file streams before making them available to readers. HBase manages
open files during WAL recovery using lightweight recoverLease APIs that
were added for its benefit, so it doesn't need to wait an hour for
WALs to close before it can recover data.
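
To make the behaviour above concrete, here is a minimal toy model in
plain Java. It is not HDFS code: ToyFile, finishBlock, checkHardLimit
and the 1-hour HARD_LIMIT_MS constant are hypothetical names chosen for
illustration, and the toy recoverLease() completes immediately, whereas
real lease recovery may take some time to finish. It only sketches the
idea that the open last block is hidden from new readers until the file
is closed, either by the hard-limit check or by an explicit recovery
request.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these classes are HDFS classes.
class LeaseRecoverySketch {
    // Assumed hard lease limit of 1 hour, matching the "1 hr" in the thread.
    static final long HARD_LIMIT_MS = 60L * 60L * 1000L;

    static class ToyFile {
        private final List<String> committedBlocks = new ArrayList<>();
        private final StringBuilder openBlock = new StringBuilder();
        private boolean closed = false;
        private long lastRenewalMs;

        ToyFile(long nowMs) { this.lastRenewalMs = nowMs; }

        // The writer appends to the open block; each write renews the lease.
        void write(String data, long nowMs) {
            if (closed) throw new IllegalStateException("file is closed");
            openBlock.append(data);
            lastRenewalMs = nowMs;
        }

        // The writer finishes a block; that block becomes readable.
        void finishBlock() {
            if (closed) throw new IllegalStateException("file is closed");
            committedBlocks.add(openBlock.toString());
            openBlock.setLength(0);
        }

        // What a new reader sees: committed blocks, not the open last block.
        String visibleData() { return String.join("", committedBlocks); }

        boolean isClosed() { return closed; }

        // Stand-in for an explicit recovery request: commit the last block
        // and close the file. Returns true once the file is closed.
        boolean recoverLease() {
            if (!closed) {
                if (openBlock.length() > 0) {
                    committedBlocks.add(openBlock.toString());
                    openBlock.setLength(0);
                }
                closed = true;
            }
            return closed;
        }

        // Stand-in for the periodic check that auto-closes a file whose
        // writer has not renewed its lease within the hard limit.
        boolean checkHardLimit(long nowMs) {
            if (!closed && nowMs - lastRenewalMs >= HARD_LIMIT_MS) {
                recoverLease();
            }
            return closed;
        }
    }

    public static void main(String[] args) {
        ToyFile wal = new ToyFile(0L);
        wal.write("block-1;", 0L);
        wal.finishBlock();
        wal.write("partial-block-2", 1_000L);
        // The writer crashes here: no close() and no further renewals.
        System.out.println(wal.visibleData());           // last block hidden
        System.out.println(wal.checkHardLimit(2_000L));  // lease still valid
        System.out.println(wal.checkHardLimit(1_000L + HARD_LIMIT_MS));
        System.out.println(wal.visibleData());           // last block visible
    }
}
```

A recovering process that cannot wait for the hard limit would call the
toy recoverLease() right after the crash instead of polling
checkHardLimit(); in both cases the last block ends up readable and no
written data is dropped.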

Harsh J
