From: Harsh J <harsh@cloudera.com>
Date: Wed, 1 May 2013 11:37:31 +0530
Subject: Re: High IO Usage in Datanodes due to Replication
To: user@hadoop.apache.org

The block scanner is a simple, independent operation of the DN that runs
periodically and does its work in small phases, to ensure that no blocks
exist that don't match their checksums (it's an automatic data validator) -
such that it can report corrupt/rotting blocks and keep the cluster healthy.

Its runtime shouldn't cause any issues unless your DN has a lot of blocks
(more than normal, due to an overload of small, inefficient files) but too
little heap size to handle block retention plus block scanning.

> 1. Will the datanode refuse to accept writes during the DataBlockScanner
> process?

No such thing. As I said, it is independent and mostly lock free. Neither
writes nor reads are hampered.

> 2. Will the datanode come back to normal only when "Not yet verified"
> reaches zero in the datanode blockScannerReport?

Yes, but note that this runs over and over again (once every three weeks,
IIRC).
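A minimal sketch of one way to keep an eye on that report from a script is
below. It is not from this thread: the hostname is a placeholder, the port
(15075) and the counter names are taken from the report Selva pasted further
down, and the plain-text "Name : value" format of the servlet output is an
assumption about this Hadoop version.

# Sketch: poll a datanode's blockScannerReport and print a few counters.
# Assumptions: placeholder hostname, port 15075 as seen in this thread, and a
# plain-text "Name : value" report format.
import re
import urllib.request

REPORT_URL = "http://datanode-host.example.com:15075/blockScannerReport"

def fetch_counters(url):
    text = urllib.request.urlopen(url, timeout=10).read().decode("utf-8")
    counters = {}
    for line in text.splitlines():
        # Lines look like "Not yet verified : 447943" or "Progress this period : 101%"
        m = re.match(r"\s*([A-Za-z_][A-Za-z_ ]*?)\s*:\s*([\d.]+)%?\s*$", line)
        if m:
            counters[m.group(1).strip()] = m.group(2)
    return counters

if __name__ == "__main__":
    counters = fetch_counters(REPORT_URL)
    total = int(counters.get("Total Blocks", "0"))
    pending = int(counters.get("Not yet verified", "0"))
    print("Total blocks     :", total)
    print("Not yet verified :", pending)
    if total:
        print("Scanned so far   : %.1f%%" % (100.0 * (total - pending) / total))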
On Wed, May 1, 2013 at 11:33 AM, selva wrote:
> Thanks Harsh & Manoj for the inputs.
>
> Now I have found that the datanodes are busy with block scanning. I have
> TBs of data attached to each datanode, so it is taking days to complete
> the block scan. I have two questions.
>
> 1. Will the datanode refuse to accept writes during the DataBlockScanner
> process?
>
> 2. Will the datanode come back to normal only when "Not yet verified"
> reaches zero in the datanode blockScannerReport?
>
> # Datanode logs
>
> 2013-05-01 05:53:50,639 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_-7605405041820244736_20626608
> 2013-05-01 05:53:50,664 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_-1425088964531225881_20391711
> 2013-05-01 05:53:50,692 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_2259194263704433881_10277076
> 2013-05-01 05:53:50,740 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_2653195657740262633_18315696
> 2013-05-01 05:53:50,818 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_-5124560783595402637_20821252
> 2013-05-01 05:53:50,866 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_6596021414426970798_19649117
> 2013-05-01 05:53:50,931 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_7026400040099637841_20741138
> 2013-05-01 05:53:50,992 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_8535358360851622516_20694185
> 2013-05-01 05:53:51,057 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_7959856580255809601_20559830
>
> # One of my datanode block scanning reports
>
> http://<datanode>:15075/blockScannerReport
>
> Total Blocks                 : 2037907
> Verified in last hour        : 4819
> Verified in last day         : 107355
> Verified in last week        : 686873
> Verified in last four weeks  : 1589964
> Verified in SCAN_PERIOD      : 1474221
> Not yet verified             : 447943
> Verified since restart       : 318433
> Scans since restart          : 318058
> Scan errors since restart    : 0
> Transient scan errors        : 0
> Current scan rate limit KBps : 3205
> Progress this period         : 101%
> Time left in cur period      : 86.02%
>
> Thanks
> Selva
>
>
> -----Original Message-----
> From: "S, Manoj"
> Subject: RE: High IO Usage in Datanodes due to Replication
> Date: Mon, 29 Apr 2013 06:41:31 GMT
>
> Adding to Harsh's comments:
>
> You can also tweak a few OS-level parameters to improve the I/O
> performance.
> 1) Mount the filesystem with the "noatime" option.
> 2) Check whether changing the I/O scheduling algorithm improves the
> cluster's performance. (Check the file /sys/block/<device>/queue/scheduler.)
> 3) If there are lots of I/O requests and your cluster hangs because of
> that, you can increase the queue length by raising the value in
> /sys/block/<device>/queue/nr_requests.
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Sunday, April 28, 2013 12:03 AM
> To: user@hadoop.apache.org
> Subject: Re: High IO Usage in Datanodes due to Replication
>
> They seem to be transferring blocks between one another. This is most
> likely due to under-replication, and the NN UI will have numbers on the
> work left to perform.
> The inter-DN transfer is controlled by the balancing bandwidth though, so
> you can lower that if you want to slow it down - but you'll lose out on
> the time it takes to get back to a perfectly replicated state.
>
> On Sat, Apr 27, 2013 at 11:33 PM, selva wrote:
>> Hi All,
>>
>> I have lost the Amazon instances of my Hadoop cluster, but I had all the
>> data on AWS EBS volumes, so I launched new instances and attached the
>> volumes.
>>
>> But all of the datanode logs keep printing the lines below, and that has
>> caused a high IO rate. Due to the IO usage I am not able to run any jobs.
>>
>> Can anyone help me understand what it is doing? Thanks in advance.
>>
>> 2013-04-27 17:51:40,197 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(10.157.10.242:10013,
>> storageID=DS-407656544-10.28.217.27-10013-1353165843727,
>> infoPort=15075, ipcPort=10014) Starting thread to transfer block
>> blk_2440813767266473910_11564425 to 10.168.18.178:10013
>> 2013-04-27 17:51:40,230 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(10.157.10.242:10013,
>> storageID=DS-407656544-10.28.217.27-10013-1353165843727,
>> infoPort=15075, ipcPort=10014):Transmitted block
>> blk_2440813767266473910_11564425 to /10.168.18.178:10013
>> 2013-04-27 17:51:40,433 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> blk_2442656050740605335_10906493 src: /10.171.11.11:60744 dest:
>> /10.157.10.242:10013
>> 2013-04-27 17:51:40,450 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Received block
>> blk_2442656050740605335_10906493 src: /10.171.11.11:60744 dest:
>> /10.157.10.242:10013 of size 25431
>>
>> Thanks
>> Selva
>
>
> --
> Harsh J

--
Harsh J
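Manoj's points 2) and 3) above refer to the per-device queue settings under
/sys/block. A minimal sketch (assuming Python on the datanode host; which
device names appear, e.g. sda or xvdf, depends on your instances) to print
the current I/O scheduler and queue depth:

# Sketch: show the I/O scheduler and request queue depth for each block
# device, per the /sys/block/<device>/queue files mentioned above.
import glob
import os

def read_first_line(path):
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return "n/a"

for queue_dir in sorted(glob.glob("/sys/block/*/queue")):
    device = queue_dir.split(os.sep)[3]  # /sys/block/<device>/queue
    scheduler = read_first_line(os.path.join(queue_dir, "scheduler"))
    nr_requests = read_first_line(os.path.join(queue_dir, "nr_requests"))
    # The active scheduler is shown in brackets, e.g. "noop deadline [cfq]".
    print("%-12s scheduler=%-28s nr_requests=%s" % (device, scheduler, nr_requests))

Changing either value is then a root-only write into those same files, which
is what point 3) suggests for nr_requests.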
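And for gauging how much of the datanode I/O is replication traffic, one
rough approach is to count the transfer messages in the datanode log, as in
the sketch below. The log path is a placeholder, and the message substrings
are the ones visible in the log excerpts Selva pasted above.

# Sketch: tally block-transfer messages in a datanode log. The path is a
# placeholder; the substrings match the log lines quoted earlier in the thread.
from collections import Counter

DATANODE_LOG = "/var/log/hadoop/hadoop-datanode.log"  # placeholder path

MARKERS = {
    "Starting thread to transfer block": "outgoing transfers started",
    "Transmitted block": "outgoing transfers completed",
    "Receiving block": "incoming transfers started",
    "Received block": "incoming transfers completed",
}

counts = Counter()
with open(DATANODE_LOG, errors="replace") as log:
    for line in log:
        for marker, label in MARKERS.items():
            if marker in line:
                counts[label] += 1

for label, count in counts.most_common():
    print("%-30s %d" % (label, count))

If the "started" counters keep climbing long after the cluster has settled,
the NN is still scheduling re-replication work, which lines up with the
under-replication numbers the NN UI shows.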