hbase-user mailing list archives

From schubert zhang <zson...@gmail.com>
Subject Re: Data lost during intensive writes
Date Wed, 25 Mar 2009 14:40:56 GMT
Following is what I had sent to J-D in another email thread; I will check
more of the logs from March 24-25.


2009-03-23 10:07:57,465 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000, call addBlock(/hbase/log_10.24.1.18_1237686636736_60020/hlog.dat.1237774027436, DFSClient_629567488) from 10.24.1.18:59685: error: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/hbase/log_10.24.1.18_1237686636736_60020/hlog.dat.1237774027436
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/hbase/log_10.24.1.18_1237686636736_60020/hlog.dat.1237774027436
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown Source)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
        at org.apache.hadoop.ipc.Server$Handler.run(Unknown Source)
2009-03-23 10:07:57,552 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated:10.24.1.12:50010 is added to blk_8246919716767617786_109126 size 1048576
2009-03-23 10:07:57,552 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated:10.24.1.12:50010 is added to blk_8246919716767617786_109126 size 1048576
2009-03-23 10:07:57,554 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/log_10.24.1.16_1237686658208_60020/hlog.dat.1237774044443. blk_45871727940505900_109126
2009-03-23 10:07:57,688 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated:10.24.1.12:50010 is added to blk_2378060095065607252_109126 size 1048576
2009-03-23 10:07:57,688 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated:10.24.1.14:50010 is added to blk_2378060095065607252_109126 size 1048576
2009-03-23 10:07:57,689 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/log_10.24.1.14_1237686648061_60020/hlog.dat.1237774036841. blk_8448212226292209521_109126
2009-03-23 10:07:57,869 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call addBlock(/hbase/log_10.24.1.18_1237686636736_60020/hlog.dat.1237774027436, DFSClient_629567488) from 10.24.1.18:59685: error: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/hbase/log_10.24.1.18_1237686636736_60020/hlog.dat.1237774027436
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/hbase/log_10.24.1.18_1237686636736_60020/hlog.dat.1237774027436
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown Source)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
        at org.apache.hadoop.ipc.Server$Handler.run(Unknown Source)
2009-03-23 10:07:57,944 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated:10.24.1.18:50010 is added to blk_1270075611008480481_109121 size 1048576
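For context on the NotReplicatedYetException above: the NameNode throws it from addBlock() when the previous block of the file has not yet reached its minimum replication, and the DFS client is expected to retry the call with backoff before surfacing a failure. The sketch below is only an illustration of that retry pattern, not the actual DFSClient code; all names (BlockAllocator, locateFollowingBlock) are hypothetical, and IllegalStateException stands in for NotReplicatedYetException.

```java
// Illustrative retry-with-backoff loop, modeled on how a DFS client can
// react to "Not replicated yet" errors from NameNode.addBlock().
// Hypothetical names throughout; see DFSClient in the Hadoop sources
// for the real logic.
public class AddBlockRetry {
    interface BlockAllocator {
        // Throws IllegalStateException as a stand-in for NotReplicatedYetException.
        String addBlock(String path);
    }

    static String locateFollowingBlock(BlockAllocator nn, String path,
                                       int maxRetries) throws InterruptedException {
        long sleepMs = 400;                       // initial backoff
        for (int attempt = 0; ; attempt++) {
            try {
                return nn.addBlock(path);         // ask the NameNode for the next block
            } catch (IllegalStateException notReplicatedYet) {
                if (attempt >= maxRetries) throw notReplicatedYet;
                Thread.sleep(sleepMs);            // wait for replication to catch up
                sleepMs *= 2;                     // exponential backoff
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] calls = {0};
        // Simulated NameNode: succeeds on the third call, as if replication caught up.
        BlockAllocator nn = path -> {
            if (++calls[0] < 3) throw new IllegalStateException("Not replicated yet: " + path);
            return "blk_12345";
        };
        String blk = locateFollowingBlock(nn, "/hbase/hlog.dat", 5);
        System.out.println(blk + " after " + calls[0] + " attempts");  // blk_12345 after 3 attempts
    }
}
```

If the retries are exhausted (e.g. because a datanode in the pipeline is wedged), the exception reaches the writer, which would match the repeated handler errors in the log above.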

I cannot find useful info in the datanodes' logs at that point in time, but I
did find something else. For example:

2009-03-23 10:08:09,321 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.24.1.20:50010, storageID=DS-2136798339-10.24.1.20-50010-1237686444430, infoPort=50075, ipcPort=50020):Failed to transfer blk_-4099352067684877111_109151 to 10.24.1.18:50010 got java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:418)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:519)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Connection reset by peer
        ... 8 more

and:

2009-03-23 10:10:17,313 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.24.1.20:50010, storageID=DS-2136798339-10.24.1.20-50010-1237686444430, infoPort=50075, ipcPort=50020):DataXceiver
org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-6347382571494739349_109326 is valid, and cannot be written to.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:619)



On Wed, Mar 25, 2009 at 9:36 PM, stack <stack@duboce.net> wrote:

> On Wed, Mar 25, 2009 at 2:01 AM, schubert zhang <zsongbo@gmail.com> wrote:
>
>
> > But the two exceptions start to happen earlier.
> >
>
> Which two exceptions Schubert?
>
>
> > hadoop-0.19
> > hbase-0.19.1 (with patch https://issues.apache.org/jira/browse/HBASE-1008).
> >
> > I want to try to set dfs.datanode.socket.write.timeout=0 and watch it
> > later.
>
>
> Later you ask, 'if set "dfs.datanode.socket.write.timeout=0", hadoop will
> always create new socket, is it ok?'  I traced write.timeout and it looks
> like it simply becomes the socket timeout -- no other special handling seems
> to be done.  Perhaps I am missing something?  To what are you referring?
>
> Thanks,
> St.Ack
>
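For anyone following along, the setting under discussion is a client/datanode-side Hadoop property set in hadoop-site.xml (the 0.19-era config file). A minimal fragment might look like the following; a value of 0 disables the write-side socket timeout rather than setting it to zero milliseconds:

```xml
<!-- hadoop-site.xml: disable the datanode write-side socket timeout.
     With the timeout in effect, long-idle write pipelines can be torn
     down and re-opened; 0 means "never time out". -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
```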
