hadoop-mapreduce-user mailing list archives

From selva <selvai...@gmail.com>
Subject Re: High IO Usage in Datanodes due to Replication
Date Wed, 01 May 2013 06:32:26 GMT
Thanks a lot, Harsh. Your input is really valuable to me.

As you mentioned above, our cluster is overloaded with many small files.

Also, when I load huge amounts of data into Hive tables, it throws an
exception like "replicated to 0 nodes instead of 1". When I googled it, I
found that one of the listed reasons matches my case: "Data Node is Busy with
block report and block scanning" @ http://bit.ly/ZToyNi

Will increasing the block scanning, so that all the inefficient small files
get scanned, fix my problem?

Thanks
Selva


On Wed, May 1, 2013 at 11:37 AM, Harsh J <harsh@cloudera.com> wrote:

> The block scanner is a simple, independent operation of the DN that
> runs periodically and does work in small phases, to ensure that no
> blocks exist that don't match their checksums (it's an automatic
> data validator) - such that it may report corrupt/rotting blocks and
> keep the cluster healthy.
>
> Its runtime shouldn't cause any issues, unless your DN has a lot of
> blocks (more than normal due to overload of small, inefficient files)
> but too little heap size to perform retention plus block scanning.
>
> > 1. Will the data node block writes while the DataBlockScanning process
> > is running?
>
> No such thing. As I said, it's independent and mostly lock-free. Writes
> or reads are not hampered.
>
> > 2. Will the data node return to normal only when "Not yet verified" drops
> > to zero in the data node's blockScannerReport?
>
> Yes, but note that this runs over and over again (once every 3 weeks IIRC).
>
> On Wed, May 1, 2013 at 11:33 AM, selva <selvait90@gmail.com> wrote:
> > Thanks Harsh & Manoj for the inputs.
> >
> > I have now found that the data node is busy with block scanning. I have TBs
> > of data attached to each data node, so it is taking days to complete the
> > block scanning. I have two questions.
> >
> > 1. Will the data node block writes while the DataBlockScanning process
> > is running?
> >
> > 2. Will the data node return to normal only when "Not yet verified" drops
> > to zero in the data node's blockScannerReport?
> >
> > # Data node logs
> >
> > 2013-05-01 05:53:50,639 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> > succeeded for blk_-7605405041820244736_20626608
> > 2013-05-01 05:53:50,664 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> > succeeded for blk_-1425088964531225881_20391711
> > 2013-05-01 05:53:50,692 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> > succeeded for blk_2259194263704433881_10277076
> > 2013-05-01 05:53:50,740 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> > succeeded for blk_2653195657740262633_18315696
> > 2013-05-01 05:53:50,818 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> > succeeded for blk_-5124560783595402637_20821252
> > 2013-05-01 05:53:50,866 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> > succeeded for blk_6596021414426970798_19649117
> > 2013-05-01 05:53:50,931 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> > succeeded for blk_7026400040099637841_20741138
> > 2013-05-01 05:53:50,992 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> > succeeded for blk_8535358360851622516_20694185
> > 2013-05-01 05:53:51,057 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> > succeeded for blk_7959856580255809601_20559830
> >
> > # Block scanner report from one of my data nodes
> >
> > http://<datanode-host>:15075/blockScannerReport
> >
> > Total Blocks                 : 2037907
> > Verified in last hour        :   4819
> > Verified in last day         : 107355
> > Verified in last week        : 686873
> > Verified in last four weeks  : 1589964
> > Verified in SCAN_PERIOD      : 1474221
> > Not yet verified             : 447943
> > Verified since restart       : 318433
> > Scans since restart          : 318058
> > Scan errors since restart    :      0
> > Transient scan errors        :      0
> > Current scan rate limit KBps :   3205
> > Progress this period         :    101%
> > Time left in cur period      :  86.02%
> >
> > Thanks
> > Selva
> >
> >
> > -----Original Message-----
> > From "S, Manoj" <mano...@intel.com>
> > Subject RE: High IO Usage in Datanodes due to Replication
> > Date Mon, 29 Apr 2013 06:41:31 GMT
> > Adding to Harsh's comments:
> >
> > You can also tweak a few OS-level parameters to improve I/O performance.
> > 1) Mount the filesystem with the "noatime" option.
> > 2) Check whether changing the I/O scheduling algorithm improves the
> > cluster's performance.
> > (Check this file: /sys/block/<device_name>/queue/scheduler)
> > 3) If there are lots of I/O requests and your cluster hangs because of that,
> > you can increase the queue length by increasing the value in
> > /sys/block/<device_name>/queue/nr_requests.
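
Before changing either file, it can help to see what is currently set. The
sketch below only reads the two sysfs files named above and changes nothing.
It assumes a Linux host where the scheduler file lists the available
schedulers with the active one in square brackets; the helper names
active_scheduler and queue_settings are hypothetical.

```python
import re
from pathlib import Path


def active_scheduler(contents):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", contents)
    return match.group(1) if match else None


def queue_settings(device, sys_block=Path("/sys/block")):
    """Read the current scheduler and nr_requests for one block device."""
    queue = Path(sys_block) / device / "queue"
    return {
        "scheduler": active_scheduler((queue / "scheduler").read_text()),
        "nr_requests": int((queue / "nr_requests").read_text()),
    }


if __name__ == "__main__":
    root = Path("/sys/block")
    devices = sorted(p.name for p in root.iterdir()) if root.is_dir() else []
    for device in devices:
        try:
            print(device, queue_settings(device))
        except (OSError, ValueError):
            # Some virtual devices expose no queue settings; skip them.
            continue
```

Comparing the output before and after a change makes it easier to tell
whether a new scheduler or queue length actually took effect on every disk.
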
> >
> > -----Original Message-----
> > From: Harsh J [mailto:harsh@cloudera.com]
> > Sent: Sunday, April 28, 2013 12:03 AM
> > To: <user@hadoop.apache.org>
> > Subject: Re: High IO Usage in Datanodes due to Replication
> >
> > They seem to be transferring blocks between one another. This is most
> > likely due to under-replication, and the NN UI will have numbers on the
> > work left to perform. The inter-DN transfer is controlled by the balancing
> > bandwidth though, so you can lower that if you want to, to cripple it,
> > but you'll lose out on time to reach a perfectly replicated state again.
> >
> > On Sat, Apr 27, 2013 at 11:33 PM, selva <selvait90@gmail.com> wrote:
> >> Hi All,
> >>
> >> I have lost the Amazon instances of my Hadoop cluster, but I had all the
> >> data in AWS EBS volumes. So I launched new instances and attached the
> >> volumes.
> >>
> >> But all of the datanode logs keep printing the lines below, and this has
> >> caused a high IO rate. Due to the IO usage I am not able to run any jobs.
> >>
> >> Can anyone help me understand what it is doing? Thanks in advance.
> >>
> >> 2013-04-27 17:51:40,197 INFO
> >> org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> DatanodeRegistration(10.157.10.242:10013,
> >> storageID=DS-407656544-10.28.217.27-10013-1353165843727,
> >> infoPort=15075,
> >> ipcPort=10014) Starting thread to transfer block
> >> blk_2440813767266473910_11564425 to 10.168.18.178:10013
> >> 2013-04-27 17:51:40,230 INFO
> >> org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> DatanodeRegistration(10.157.10.242:10013,
> >> storageID=DS-407656544-10.28.217.27-10013-1353165843727,
> >> infoPort=15075, ipcPort=10014):Transmitted block
> >> blk_2440813767266473910_11564425 to
> >> /10.168.18.178:10013
> >> 2013-04-27 17:51:40,433 INFO
> >> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> >> blk_2442656050740605335_10906493 src: /10.171.11.11:60744 dest:
> >> /10.157.10.242:10013
> >> 2013-04-27 17:51:40,450 INFO
> >> org.apache.hadoop.hdfs.server.datanode.DataNode: Received block
> >> blk_2442656050740605335_10906493 src: /10.171.11.11:60744 dest:
> >> /10.157.10.242:10013 of size 25431
> >>
> >> Thanks
> >> Selva
> >>
> >
> >
> >
> > --
> > Harsh J
> >
>
>
>
> --
> Harsh J
>



-- 
-- selva
