From: Nicolas Liochon <nkeywal@gmail.com>
Date: Mon, 25 Feb 2013 11:07:27 +0100
Subject: Re: Datanodes shutdown and HBase's regionservers not working
To: user@hadoop.apache.org

I agree. Then for HDFS, ... the first thing to check is the network, I would say.

On Mon, Feb 25, 2013 at 10:46 AM, Davey Yan wrote:
> Thanks for the reply, Nicolas.
>
> My question: What can lead to a shutdown of all of the datanodes?
> I believe that the regionservers will be OK if the HDFS is OK.
>
>
> On Mon, Feb 25, 2013 at 5:31 PM, Nicolas Liochon <nkeywal@gmail.com> wrote:
> > Ok, what's your question?
> > When you say the datanode went down, was it the datanode processes, or the
> > machines, with both the datanodes and the regionservers?
> >
> > The NameNode pings its datanodes every 3 seconds. However, it will
> > internally mark the datanodes as dead only after 10:30 minutes (even if in
> > the GUI you have "no answer for x minutes").
> > HBase monitoring is done by ZooKeeper. By default, a regionserver is
> > considered dead after 180s with no answer; before that, it is considered
> > live.
> > When you stop a regionserver, it tries to flush its data to the disk (i.e.
> > HDFS, i.e. the datanodes). That's why, if you have no datanodes, or if a
> > high ratio of your datanodes are dead, it can't shut down. Connection
> > refused & socket timeouts come from the fact that before the 10:30 minutes
> > HDFS does not declare the nodes as dead, so HBase tries to use them (and,
> > obviously, fails).
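The 10:30 figure quoted above falls out of the NameNode's dead-node expiry formula, 2 x the recheck interval plus 10 x the heartbeat interval, with the Hadoop 1.x defaults of 5 minutes (`heartbeat.recheck.interval`) and 3 seconds (`dfs.heartbeat.interval`). A minimal sketch of the arithmetic:

```python
# Dead-node detection with Hadoop 1.x defaults (sketch; the two
# properties are heartbeat.recheck.interval and dfs.heartbeat.interval).
heartbeat_interval_s = 3       # datanode heartbeats every 3 seconds
recheck_interval_s = 5 * 60    # namenode recheck interval, 300 s by default

# The namenode declares a datanode dead once it has been silent for
# 2 * recheck + 10 * heartbeat.
expire_s = 2 * recheck_interval_s + 10 * heartbeat_interval_s
print(expire_s)                                          # 630 seconds
print(f"{expire_s // 60}:{expire_s % 60:02d} minutes")   # 10:30 minutes
```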
> > Note that there is now an intermediate state for HDFS datanodes, called
> > "stale": the datanode is used only if you have to (i.e. it's the only
> > datanode with a block replica you need). It will be documented in HBase
> > for the 0.96 release. But if all your datanodes are down, it won't
> > change much.
> >
> > Cheers,
> >
> > Nicolas
> >
> >
> > On Mon, Feb 25, 2013 at 10:10 AM, Davey Yan <davey.yan@gmail.com> wrote:
> >>
> >> Hey guys,
> >>
> >> We have a cluster with 5 nodes (1 NN and 4 DNs) that has been running for
> >> more than 1 year, and it works fine.
> >> But the datanodes got shut down twice in the last month.
> >>
> >> When the datanodes got shut down, all of them became "Dead Nodes" in
> >> the NN web admin UI (http://ip:50070/dfshealth.jsp),
> >> but the regionservers of HBase were still live in the HBase web
> >> admin (http://ip:60010/master-status); of course, they were zombies.
> >> All of the JVM processes were still running, including
> >> hmaster/namenode/regionserver/datanode.
> >>
> >> When the datanodes got shut down, the load (from the "top" command) on
> >> the slaves became very high, more than 10, higher than during normal
> >> running.
> >> From the "top" command, we saw that the datanode and regionserver
> >> processes were consuming CPU.
> >>
> >> We could not stop the HBase or Hadoop cluster through the normal
> >> commands (stop-*.sh / *-daemon.sh stop *).
> >> So we stopped the datanodes and regionservers with kill -9 PID; then the
> >> load on the slaves returned to the normal level, and we started the
> >> cluster again.
> >>
> >>
> >> Log of the NN at the shutdown point (all of the DNs were removed):
> >> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.net.NetworkTopology:
> >> Removing a node: /default-rack/192.168.1.152:50010
> >> 2013-02-22 11:10:02,278 INFO org.apache.hadoop.hdfs.StateChange:
> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.1.149:50010
> >> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.net.NetworkTopology:
> >> Removing a node: /default-rack/192.168.1.149:50010
> >> 2013-02-22 11:10:02,693 INFO org.apache.hadoop.hdfs.StateChange:
> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.1.150:50010
> >> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.net.NetworkTopology:
> >> Removing a node: /default-rack/192.168.1.150:50010
> >> 2013-02-22 11:10:03,004 INFO org.apache.hadoop.hdfs.StateChange:
> >> BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.1.148:50010
> >> 2013-02-22 11:10:03,339 INFO org.apache.hadoop.net.NetworkTopology:
> >> Removing a node: /default-rack/192.168.1.148:50010
> >>
> >>
> >> Logs in the DNs indicated there were many IOException and
> >> SocketTimeoutException:
> >> 2013-02-22 11:02:52,354 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> DatanodeRegistration(192.168.1.148:50010,
> >> storageID=DS-970284113-117.25.149.160-50010-1328074119937,
> >> infoPort=50075, ipcPort=50020):DataXceiver
> >> java.io.IOException: Interrupted receiveBlock
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
> >>         at java.lang.Thread.run(Thread.java:662)
> >> 2013-02-22 11:03:44,823 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> DatanodeRegistration(192.168.1.148:50010,
> >> storageID=DS-970284113-117.25.149.160-50010-1328074119937,
> >> infoPort=50075, ipcPort=50020):Got exception while serving
> >> blk_-1985405101514576650_247001 to /192.168.1.148:
> >> java.net.SocketTimeoutException: 480000 millis timeout while waiting
> >> for channel to be ready for write. ch :
> >> java.nio.channels.SocketChannel[connected local=/192.168.1.148:50010
> >> remote=/192.168.1.148:48654]
> >>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> >>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> >>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> >>         at java.lang.Thread.run(Thread.java:662)
> >> 2013-02-22 11:09:42,294 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> DatanodeRegistration(192.168.1.148:50010,
> >> storageID=DS-970284113-117.25.149.160-50010-1328074119937,
> >> infoPort=50075, ipcPort=50020):DataXceiver
> >> java.net.SocketTimeoutException: 480000 millis timeout while waiting
> >> for channel to be ready for write.
> >> ch :
> >> java.nio.channels.SocketChannel[connected local=/192.168.1.148:50010
> >> remote=/192.168.1.148:37188]
> >>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> >>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> >>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> >>         at java.lang.Thread.run(Thread.java:662)
> >> 2013-02-22 11:12:41,892 INFO
> >> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> >> succeeded for blk_-2674357249542194287_43419
> >>
> >>
> >> Here is our env:
> >> hadoop 1.0.3
> >> hbase 0.94.1 (snappy enabled)
> >>
> >> java version "1.6.0_31"
> >> Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
> >> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
> >>
> >> # ulimit -a
> >> core file size          (blocks, -c) 0
> >> data seg size           (kbytes, -d) unlimited
> >> scheduling priority             (-e) 20
> >> file size               (blocks, -f) unlimited
> >> pending signals                 (-i) 16382
> >> max locked memory       (kbytes, -l) 64
> >> max memory size         (kbytes, -m) unlimited
> >> open files                      (-n) 32768
> >> pipe size            (512 bytes, -p) 8
> >> POSIX message queues     (bytes, -q) 819200
> >> real-time priority              (-r) 0
> >> stack size              (kbytes, -s) 8192
> >> cpu time               (seconds, -t) unlimited
> >> max user processes              (-u) 32768
> >> virtual memory          (kbytes, -v) unlimited
> >> file locks                      (-x) unlimited
> >>
> >> # uname -a
> >> Linux ubuntu6401 2.6.32-33-server #70-Ubuntu SMP Thu Jul 7 22:28:30
> >> UTC 2011 x86_64 GNU/Linux
> >>
> >>
> >> # free(master)
> >>              total       used       free     shared    buffers     cached
> >> Mem:      24732936    8383708   16349228          0     490584    2580356
> >> -/+ buffers/cache:    5312768   19420168
> >> Swap:     72458232          0   72458232
> >>
> >>
> >> # free(slaves)
> >>              total       used       free     shared    buffers     cached
> >> Mem:      24733000   22824276    1908724          0     862556   15303304
> >> -/+ buffers/cache:    6658416   18074584
> >> Swap:     72458232        264   72457968
> >>
> >>
> >> Some important conf:
> >> core-site.xml
> >>         <property>
> >>                 <name>io.file.buffer.size</name>
> >>                 <value>65536</value>
> >>         </property>
> >>
> >> hdfs-site.xml
> >>         <property>
> >>                 <name>dfs.block.size</name>
> >>                 <value>134217728</value>
> >>         </property>
> >>         <property>
> >>                 <name>dfs.datanode.max.xcievers</name>
> >>                 <value>4096</value>
> >>         </property>
> >>         <property>
> >>                 <name>dfs.support.append</name>
> >>                 <value>true</value>
> >>         </property>
> >>         <property>
> >>                 <name>dfs.replication</name>
> >>                 <value>2</value>
> >>         </property>
> >>
> >>
> >> Hope you can help us.
> >> Thanks in advance.
> >>
> >>
> >>
> >> --
> >> Davey Yan
> >
> >
>
> --
> Davey Yan
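The "stale" datanode state mentioned above is, to my understanding, opt-in and configured in hdfs-site.xml on releases that include HDFS-3703 (i.e. newer than the Hadoop 1.0.3 used here); a sketch of the relevant properties, hedged as an assumption about the exact release line:

```
<!-- hdfs-site.xml: make the NameNode deprioritize "stale" datanodes
     for reads (assumes a release that includes HDFS-3703) -->
<property>
        <name>dfs.namenode.avoid.read.stale.datanode</name>
        <value>true</value>
</property>
<property>
        <name>dfs.namenode.stale.datanode.interval</name>
        <!-- mark a datanode stale after 30 s without a heartbeat -->
        <value>30000</value>
</property>
```

With this set, a datanode that has missed heartbeats for 30 seconds is avoided for reads long before the 10:30-minute dead mark, which is exactly the window where the "connection refused" and socket-timeout errors above occur.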