hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Neverending DataNode.clienttrace logging after starting Hbase
Date Thu, 04 Mar 2010 06:30:41 GMT
This looks like the product of the addition below:

------------------------------------------------------------------------
r687868 | cdouglas | 2008-08-21 14:27:31 -0700 (Thu, 21 Aug 2008) | 3 lines

HADOOP-3062. Add metrics to DataNode and TaskTracker to record network
traffic for HDFS reads/writes and MR shuffling.


Is your problem that the quantity of the traces is overwhelming?  If so, we
should make a patch for 0.20 Hadoop so the above tracing can be turned off
when it's not wanted.
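
In the meantime, a likely workaround (untested sketch; the logger name below
is taken straight from your log output) is to bump that logger's level in the
DataNode's conf/log4j.properties and restart the datanodes:

# Quiet the per-read/per-write client trace lines on the DataNode.
# WARN (or OFF) suppresses the INFO-level clienttrace messages shown above.
log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=WARN

That only changes logging verbosity; nothing else in the DataNode's behavior
is affected.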

St.Ack



On Wed, Mar 3, 2010 at 2:22 PM, Rod Cope <rod.cope@openlogic.com> wrote:
> We have a problem on our Hadoop cluster where a random subset of our
> datanode logs are filling up with GB's of clienttrace messages after
> starting HBase.  The logs are fine, fsck shows a perfect report, and all is
> well until HBase is started.  Without running any MR jobs or using
> HBase/HDFS, we've let it go for 15+ hours without seeing any change in
> logging behavior.  The datanode logs look like this:
>
> ...
> 2010-03-03 09:50:16,738 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.60.106:50010, dest: /192.168.60.106:47445, bytes: 23710, op:
> HDFS_READ, cliID: DFSClient_1676127095, offset: 82432, srvID:
> DS-35570150-192.168.60.106-50010-1267000204488, blockid:
> blk_3353532406500894348_402571, duration: 93000
> 2010-03-03 09:50:16,739 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.60.106:50010, dest: /192.168.60.106:47446, bytes: 226458, op:
> HDFS_READ, cliID: DFSClient_1676127095, offset: 1326592, srvID:
> DS-35570150-192.168.60.106-50010-1267000204488, blockid:
> blk_1719430407401031560_402554, duration: 256000
> 2010-03-03 09:50:16,743 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.60.106:50010, dest: /192.168.60.106:47449, bytes: 5073, op:
> HDFS_READ, cliID: DFSClient_1676127095, offset: 16384, srvID:
> DS-35570150-192.168.60.106-50010-1267000204488, blockid:
> blk_-4376123158716665125_402579, duration: 106000
> 2010-03-03 09:50:16,743 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.60.106:50010, dest: /192.168.60.106:47450, bytes: 7974, op:
> HDFS_READ, cliID: DFSClient_1676127095, offset: 24576, srvID:
> DS-35570150-192.168.60.106-50010-1267000204488, blockid:
> blk_-4354660273604489760_402576, duration: 118000
> 2010-03-03 09:50:16,744 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.60.106:50010, dest: /192.168.60.106:47451, bytes: 23710, op:
> HDFS_READ, cliID: DFSClient_1676127095, offset: 82432, srvID:
> DS-35570150-192.168.60.106-50010-1267000204488, blockid:
> blk_3353532406500894348_402571, duration: 88000
> 2010-03-03 09:50:16,745 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.60.106:50010, dest: /192.168.60.106:47452, bytes: 226458, op:
> HDFS_READ, cliID: DFSClient_1676127095, offset: 1326592, srvID:
> DS-35570150-192.168.60.106-50010-1267000204488, blockid:
> blk_1719430407401031560_402554, duration: 253000
> 2010-03-03 09:50:16,749 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.60.106:50010, dest: /192.168.60.106:47455, bytes: 5073, op:
> HDFS_READ, cliID: DFSClient_1676127095, offset: 16384, srvID:
> DS-35570150-192.168.60.106-50010-1267000204488, blockid:
> blk_-4376123158716665125_402579, duration: 81000
> 2010-03-03 09:50:16,749 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.60.106:50010, dest: /192.168.60.106:47456, bytes: 7974, op:
> HDFS_READ, cliID: DFSClient_1676127095, offset: 24576, srvID:
> DS-35570150-192.168.60.106-50010-1267000204488, blockid:
> blk_-4354660273604489760_402576, duration: 76000
> 2010-03-03 09:50:16,750 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.60.106:50010, dest: /192.168.60.106:47457, bytes: 23710, op:
> HDFS_READ, cliID: DFSClient_1676127095, offset: 82432, srvID:
> DS-35570150-192.168.60.106-50010-1267000204488, blockid:
> blk_3353532406500894348_402571, duration: 91000
> 2010-03-03 09:50:16,750 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
> /192.168.60.106:50010, dest: /192.168.60.106:47458, bytes: 226458, op:
> HDFS_READ, cliID: DFSClient_1676127095, offset: 1326592, srvID:
> DS-35570150-192.168.60.106-50010-1267000204488, blockid:
> blk_1719430407401031560_402554, duration: 237000
> ...
>
> The logs (and top and tcptrack) show bursts of activity for perhaps 15
> seconds at a time, then it stops for 30 seconds or so and starts all over
> again.  Frequently, there will be a line or two like this after each burst:
> 2010-03-03 09:52:35,255 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_8884930029593424313_289042
>
> When we restart the cluster, the problem usually moves to a different random
> subset of nodes.  This has happened at least 5 times and there's no way
> we've been able to predict which machine(s) will have this weird logging
> behavior.
>
> I've noticed that the cliID in the logs is identical across the machines
> exhibiting this behavior.  Is there a way to tell which client that is?  As
> I said, we don't have any MR jobs running.
>
> We're on the latest CDH test release (February 9, 0.20.1+169.56), which is
> pretty close to Hadoop 0.20.2.  The data in HDFS is almost entirely HBase
> (0.20.3), but there are a few other blocks of data.  The problems just
> started in the last few days.  We've recently gone through some very heavy
> data loading into HBase, but hadn't seen this problem in the past when we've
> done the same thing.
>
> The data nodes and regionservers stay up without any errors and I can run
> counts from the shell and Hadoop jobs that read every row of my largest
> table without issues, but the burst of logging activity is accompanied by a
> spike of CPU, network, and disk that impacts performance and I'm afraid it
> will cause timeouts or other problems.  We're also getting datanode log
> files that run into several GB's a day.
>
> Any ideas?
>
> Thanks,
> Rod
>
