hadoop-common-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: data loss after cluster wide power loss
Date Mon, 01 Jul 2013 23:52:48 GMT
How do you enable "sync on block close" in HDFS?


--Send from my Sony mobile.
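
(For reference, both behaviors discussed below are controlled by datanode properties in hdfs-site.xml. The property names here are taken from the 2.x hdfs-default.xml and HDFS-1539; verify them against your own release before relying on them:)

```xml
<!-- hdfs-site.xml: datanode durability settings (names per 2.x hdfs-default.xml;
     check your release, as availability differs between 1.x and 2.x). -->
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
  <!-- fsync block files (and their metadata) to disk when a block is closed -->
</property>
<property>
  <name>dfs.datanode.sync.behind.writes</name>
  <value>true</value>
  <!-- ask the OS to sync data to disk behind writes, smoothing the later fsync cost -->
</property>
```

Both require a datanode restart to take effect.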
On Jul 2, 2013 6:47 AM, "Lars Hofhansl" <lhofhansl@yahoo.com> wrote:

> HBase is interesting here, because it rewrites old data into new files. So
> a power outage by default would not just lose new data but potentially old
> data as well.
> You can enable "sync on block close" in HDFS, and then at least be sure
> that closed blocks (and thus files) are synced to disk physically.
> I found that if that is paired with the "sync behind write" fadvise hint,
> the performance impact is minimal.
>
> -- Lars
>
> Dave Latham <latham@davelink.net> wrote:
>
> >Thanks for the response, Suresh.
> >
> >I'm not sure that I understand the details properly.  From my reading of
> >HDFS-744, the hsync API would allow a client to make sure that at any point
> >in time its writes so far have hit the disk.  For example, HBase could
> >apply an hsync after adding some edits to its WAL to ensure those edits are
> >fully durable for a file which is still open.
> >
> >However, in this case the dfs file was closed and even renamed.  Is it the
> >case that even after a dfs file is closed and renamed, the data blocks
> >would still not be synced and would still be stored by the datanode in
> >"blocksBeingWritten" rather than in "current"?  If that is the case, would
> >it be better for the NameNode not to reject replicas that are in
> >blocksBeingWritten, especially if it doesn't have any other replicas
> >available?
> >
> >Dave
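
(As an aside on the hsync API mentioned above: a minimal client-side sketch might look like the following. This assumes a 2.x hadoop-client on the classpath and a running cluster; the path and payload are made up for illustration.)

```java
// Hypothetical sketch: forcing WAL-style edits to disk with hsync (HDFS 2.x).
// FSDataOutputStream implements Syncable; hsync() flushes the client buffer
// to the pipeline AND asks each datanode to fsync the block file to disk.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HsyncSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // reads core-site.xml/hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-example"))) {
            out.write("edit-1".getBytes("UTF-8"));
            out.hsync();  // edits written so far are durable even if power is lost now
        }
    }
}
```

Note that hsync() makes the open file's data durable, which is a different guarantee from "sync on block close" (durability at close time) - the failure described in this thread involved a file that was already closed.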
> >
> >
> >On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas <suresh@hortonworks.com>
> >wrote:
> >
> >> Yes this is a known issue.
> >>
> >> The HDFS part of this was addressed in
> >> https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is
> >> not available in 1.x releases.  I think HBase does not use this API yet.
> >>
> >>
> >> On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham <latham@davelink.net> wrote:
> >>
> >> > We're running HBase over HDFS 1.0.2 on about 1000 nodes.  On Saturday
> >> > the data center we were in had a total power failure and the cluster
> >> > went down hard.  When we brought it back up, HDFS reported 4 files as
> >> > CORRUPT.  We recovered the data in question from our secondary
> >> > datacenter, but I'm trying to understand what happened and whether
> >> > this is a bug in HDFS that should be fixed.
> >> >
> >> > From what I can tell the file was created and closed by the dfs client
> >> > (hbase).  Then HBase renamed it into a new directory and deleted some
> >> > other files containing the same data.  Then the cluster lost power.
> >> > After the cluster was restarted, the datanodes reported in to the
> >> > namenode, but the blocks for this file appeared as "blocks being
> >> > written" - the namenode rejected them and the datanodes deleted the
> >> > blocks.  At this point there were no replicas for the blocks and the
> >> > files were marked CORRUPT.  The underlying file systems are ext3.
> >> > Some questions that I would love to get answers for, if anyone with a
> >> > deeper understanding of HDFS can chime in:
> >> >
> >> >  - Is this a known scenario where data loss is expected?  (I found
> >> > HDFS-1539 but that seems different)
> >> >  - When are blocks moved from blocksBeingWritten to current?  Does
> >> > that happen before a file close operation is acknowledged to an hdfs
> >> > client?
> >> >  - Could it be that the DataNodes actually moved the blocks to
> >> > current, but after the restart ext3 rewound state somehow (forgive my
> >> > ignorance of underlying file system behavior)?
> >> >  - Is there any other explanation for how this can happen?
> >> >
> >> > Here is a sequence of selected relevant log lines from the RS (HBase
> >> > Region Server), NN (NameNode), and DN (DataNode - 1 example of the 3
> >> > in question).  It includes everything that mentions the block in
> >> > question in the NameNode and one DataNode log.  Please let me know if
> >> > there is more information that would be helpful.
> >> >
> >> > RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c with permission=rwxrwxrwx
> >> > NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c. blk_1395839728632046111_357084589
> >> > DN 2013-06-29 11:16:06,832 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: /10.0.5.237:50010
> >> > NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to blk_1395839728632046111_357084589 size 25418340
> >> > NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to blk_1395839728632046111_357084589 size 25418340
> >> > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to blk_1395839728632046111_357084589 size 25418340
> >> > DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
> >> > DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_1395839728632046111_357084589 terminating
> >> > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c from client DFSClient_hb_rs_hs745,60020,1372470111932
> >> > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c is closed by DFSClient_hb_rs_hs745,60020,1372470111932
> >> > RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c to hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
> >> > RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 7 file(s) in n of users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
> >> >
> >> > -------  CRASH, RESTART ---------
> >> >
> >> > NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was rejected: Reported as block being written but is a block of closed file.
> >> > NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.6.1:50010
> >> > NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.5.237:50010 size 16971264 but was rejected: Reported as block being written but is a block of closed file.
> >> > NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.5.237:50010
> >> > NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.6.24:50010 size 21913088 but was rejected: Reported as block being written but is a block of closed file.
> >> > NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.6.24:50010
> >> > (note: the clock on the server running the DN is wrong after the
> >> > restart.  I believe its timestamps are off by 6 hours:)
> >> > DN 2013-06-29 06:07:22,877 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block blk_1395839728632046111_357084589 file /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111 for deletion
> >> > DN 2013-06-29 06:07:24,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block blk_1395839728632046111_357084589 at file /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111
> >> >
> >> >
> >> > Thanks,
> >> > Dave
> >> >
> >>
> >>
> >>
> >> --
> >> http://hortonworks.com/download/
> >>
>
