hbase-user mailing list archives

From Jack Levin <magn...@gmail.com>
Subject Re: hdfs /DN errors
Date Mon, 28 Mar 2011 23:38:07 GMT
more data:

before datanode restart -


Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    17.00   71.00   15.00 11648.00   448.00   140.65     7.08  133.13  11.62  99.90
sdb               0.00     4.00   79.00    4.00 13224.00    64.00   160.10     2.90   40.51   9.13  75.80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.44    0.00    3.69   54.05    0.00   24.82

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               2.00     8.00   70.00    5.00 10584.00   104.00   142.51     9.37  153.17  13.33 100.00
sdb               0.00     0.00   47.00    0.00  7104.00     0.00   151.15     0.73   14.96   9.53  44.80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.22    0.00    5.62   59.66    0.00   22.49

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               3.00   239.00   78.00    3.00  9352.00  1936.00   139.36     9.01   89.38  12.31  99.70
sdb               0.00     0.00   70.00    0.00 11744.00     0.00   167.77     2.39   34.56  10.77  75.40

16:36:16 10.101.6.4 root@rdaf4:/usr/java/latest/bin $ ps uax | grep datano
root     24358  0.0  0.0 103152   812 pts/0    S+   16:36   0:00 grep datano
hadoop   31249 11.6  3.6 4503764 596992 ?      Sl   11:49  33:25 /usr/java/latest/bin/java -Xmx2048m -server



After restart:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    2.00    0.00   272.00     0.00   136.00     0.03   15.50  15.50   3.10
sdb               0.00     0.00   12.00    0.00  1176.00     0.00    98.00     0.08    6.83   6.83   8.20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.64    0.00    1.73    1.98    0.00   85.64

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    18.00    8.00   49.00  1848.00   536.00    41.82     0.46    8.04   1.07   6.10
sdb               0.00     0.00    8.00    0.00   720.00     0.00    90.00     0.06    7.75   6.25   5.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.23    0.00    0.75    0.50    0.00   94.53

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    2.00    0.00   272.00     0.00   136.00     0.03   13.00  13.00   2.60
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
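
FWIW, the snapshots above were grabbed with roughly the following (exact
flags depend on the sysstat version, so treat this as a sketch):

  # extended per-device stats plus a CPU breakdown, 1-second samples
  iostat -x -c 1

  # confirm the datanode JVM, its uptime, CPU and RSS
  ps aux | grep -i datanode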






On Mon, Mar 28, 2011 at 4:28 PM, Jack Levin <magnito@gmail.com> wrote:
> Also, I can't even jstack the datanode; its CPU is low, and it's not eating RAM:
>
> 16:21:29 10.103.7.3 root@mtag3:/usr/java/latest/bin $ ./jstack 31771
> 31771: Unable to open socket file: target process not responding or HotSpot VM not loaded
> The -F option can be used when the target process is not responding
> You have new mail in /var/spool/mail/root
> 16:21:54 10.103.7.3 root@mtag3:/usr/java/latest/bin $
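>
> Next step, I guess, is forcing a dump with something like the following
> (the -F path goes through the SA agent, so the output can be partial; the
> output file name here is just an example):
>
>   ./jstack -F 31771 > /tmp/dn-31771.jstack
>   # or send SIGQUIT, which makes the JVM dump threads to the datanode's .out log
>   kill -QUIT 31771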
>
>
> When I restart the process, iowait goes back to normal.  Right now
> iowait is insanely high compared to a server that had high iowait but
> which I restarted; please see the attached graph.
>
> The graph with the iowait drop is the datanode I restarted; the other
> is the one I can't get a jstack from.
>
>
> -Jack
>
> On Mon, Mar 28, 2011 at 4:19 PM, Jack Levin <magnito@gmail.com> wrote:
>> Hello guys, we are getting these errors:
>>
>>
>> 2011-03-28 15:08:33,485 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51365, bytes: 66564, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 4191232, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3087497822408705276_723501, duration: 14409579
>> 2011-03-28 15:08:33,492 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51366, bytes: 14964, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 67094016, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3224146686136187733_731011, duration: 8855000
>> 2011-03-28 15:08:33,495 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51368, bytes: 51600, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 0, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-6384334583345199846_731014, duration: 2053969
>> 2011-03-28 15:08:33,503 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:42553, bytes: 462336, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 327680, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-4751283294726600221_724785, duration: 480254862706
>> 2011-03-28 15:08:33,504 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.101.6.5:50010, storageID=DS-1528941561-10.101.6.5-50010-1299713950021, infoPort=50075, ipcPort=50020):Got exception while serving blk_-4751283294726600221_724785 to /10.101.6.5:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.101.6.5:50010 remote=/10.101.6.5:42553]
>>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
>>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
>>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
>>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:110)
>>
>> 2011-03-28 15:08:33,504 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.101.6.5:50010, storageID=DS-1528941561-10.101.6.5-50010-1299713950021, infoPort=50075, ipcPort=50020):DataXceiver
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.101.6.5:50010 remote=/10.101.6.5:42553]
>>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
>>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
>>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
>>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:110)
>> 2011-03-28 15:08:33,504 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51369, bytes: 66564, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 4781568, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3087497822408705276_723501, duration: 11478016
>> 2011-03-28 15:08:33,506 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51370, bytes: 66564, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 66962944, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3224146686136187733_731011, duration: 7643688
>>
>>
>> This is the RS talking to the DN, and we are getting timeouts.  There
>> are no ulimit issues AFAIK, as we start the daemons with a 32k open-file
>> limit.  Any ideas what the deal is?
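>>
>> That 480000 ms in the exception looks like the default 8-minute datanode
>> write timeout (dfs.datanode.socket.write.timeout, if I'm reading it
>> right).  Roughly how I'm double-checking the limits and that setting on a
>> node (the pid and config path below are placeholders, adjust for your layout):
>>
>>   # open-file limit actually in effect for the running datanode
>>   grep -i 'open files' /proc/<datanode-pid>/limits
>>
>>   # see whether the write timeout is overridden anywhere
>>   grep -A2 'dfs.datanode.socket.write.timeout' /path/to/hdfs-site.xml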
>>
>> -Jack
>>
>
