Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3C677C856 for ; Fri, 11 May 2012 16:56:25 +0000 (UTC) Received: (qmail 10165 invoked by uid 500); 11 May 2012 16:56:19 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 10046 invoked by uid 500); 11 May 2012 16:56:19 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 10032 invoked by uid 500); 11 May 2012 16:56:19 -0000 Delivered-To: apmail-hadoop-hbase-user@hadoop.apache.org Received: (qmail 10025 invoked by uid 99); 11 May 2012 16:56:19 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 11 May 2012 16:56:19 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_PASS,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of sulabhc@gmail.com designates 209.85.213.48 as permitted sender) Received: from [209.85.213.48] (HELO mail-yw0-f48.google.com) (209.85.213.48) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 11 May 2012 16:56:13 +0000 Received: by yhfq46 with SMTP id q46so3742879yhf.35 for ; Fri, 11 May 2012 09:55:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type; bh=ISltCmjk7qvgceXLRgA2SOoUqd2w6kP7KS0Fm/qs/eg=; b=OYf+SUQ28LaiO7SeDUyjfUJMlEzykfAFdicLlzTV5GZ+5bgIgGvOBsAg8pGHEnbDLX ZG6ws1UGHMu1CgafVRVaeQub1XavIIz6E+ihsva6jzxLIa+uFXK27vm2KyO0KpfoUGJi plivEr/+wt+BM+BG1itY09A0zAhefKvMQ7dwnhsswRFNEq2WV9u9WvFcjVOTWbr/ss+F rRFj4TXcIf1z/s6v63535Zu/+KrzLmsXzCQ+HcagE/MlROyUjeQk5uvHk9q8gghxv5Rq dnoQXrGDUI0tN2w8uKYjI5h8/pJ4EQWOawVsfwAVNHGjOfASW/9infF0szoeqie48xH+ ve/A== Received: by 10.50.222.233 with SMTP id qp9mr2269719igc.58.1336755351794; Fri, 11 May 2012 09:55:51 -0700 (PDT) MIME-Version: 1.0 Received: by 10.43.48.72 with HTTP; Fri, 11 May 2012 09:55:31 -0700 (PDT) From: sulabh choudhury Date: Fri, 11 May 2012 09:55:31 -0700 Message-ID: Subject: DatanodeRegistration, socketTImeOutException To: common-user@hadoop.apache.org, hbase-user@hadoop.apache.org Content-Type: multipart/alternative; boundary=14dae93406d7e531f404bfc59bd9 X-Virus-Checked: Checked by ClamAV on apache.org --14dae93406d7e531f404bfc59bd9 Content-Type: text/plain; charset=ISO-8859-1 I have spent a lot of time trying to find a solution to this issue, but have had no luck. I think this is because of the Hbase's read/write pattern, but I do not see any related errors in hbase logs. Does not look like it is because of a GC pause, but seeing several 480000 ms timeOut certainly suggests something is really slowing down the *writes *( I do see this only in the write ch.) In my dataNode logs I see tonnes of 2012-05-11 09:34:30,953 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.10.2.102:50010, storageID=DS-1494937024-10.10.2.102-50010-1305755343443, infoPort=50075, ipcPort=50020):Got exception while serving blk_-5331817573170456741_12784653 to /10.10.2.102: java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for *write*. ch : java.nio.channels.SocketChannel[connected local=/10.10.2.102:50010 remote=/ 10.10.2.102:46752] at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163) 2012-05-11 09:34:30,953 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 10.10.2.102:50010, storageID=DS-1494937024-10.10.2.102-50010-1305755343443, infoPort=50075, ipcPort=50020):DataXceiver java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.10.2.102:50010 remote=/ 10.10.2.102:46752] at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163) This block is mapped to a Hbase region, from NN logs :- 2012-05-10 15:46:35,117 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/table1/5a84f3844b7fd049c73a78b78ba6c2cf/.tmp/1639371300072460962. blk_4283960240517860151_12781124 2012-05-10 15:47:18,000 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.10.2.103:50010 is added to blk_4283960240517860151_12781124 size 134217728 2012-05-10 15:47:18,000 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.10.2.102:50010 is added to blk_4283960240517860151_12781124 size 134217728 I am running hbase-0.90.4-cdh3u3 on hadoop-0.20.2-cdh3u3 --14dae93406d7e531f404bfc59bd9--