From: Todd Lipcon
Date: Wed, 23 Nov 2011 00:38:42 -0800
Subject: Re: Blocks are getting corrupted under very high load
To: hdfs-dev@hadoop.apache.org

I noticed that the block reported after the restart is also much smaller
than the block reported earlier. Any chance that when your DN restarted it
actually lost power? i.e. do you mean that just the DN JVM restarted, or
that the whole machine crashed and restarted?

On Tue, Nov 22, 2011 at 9:17 PM, Uma Maheswara Rao G wrote:
> Hi Todd,
> Thanks a lot for taking a look.
>
> Yes, I also suspected the same initially. But after analysing the logs, we found the below client-trace logs in the DN, which mean that block finalization completed.
> ./hadoop-root-datanode-xx-xx-132-22.log.1:2011-11-20 18:18:45,498 INFO  DataNode.clienttrace (BlockReceiver.java:run(1130)) - src: /xx.xx.132.26:55882, dest: /xx.xx.132.22:10010, bytes: 255954944, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_827638122_13, srvID: DS-1518903564-158.1.132.22-10010-1321492867433, blockid: blk_1321803251510_85379
> ./hadoop-root-datanode-xx-xx-132-22.log.1:2011-11-20 18:18:45,498 INFO  datanode.DataNode (BlockReceiver.java:run(1185)) - PacketResponder 0 for block blk_1321803251510_85379 terminating
>
> blk_1321803251510_85379 carries the recent generation timestamp. The NN logs also clearly show that addStoredBlock was called for this block id.
>
> Below are the logs:
> ./hadoop-root-HANameNode-xx-xx-132-27.log.9:2011-11-20 18:18:15,836 INFO  namenode.FSNamesystem (FSNamesystem.java:commitBlockSynchronization(2415)) - commitBlockSynchronization(lastblock=blk_1321803251510_83627, newgenerationstamp=85379, newlength=246505984, newtargets=[xx.xx.132.22:10010], closeFile=false, deleteBlock=false)
> ./hadoop-root-HANameNode-xx-xx-132-27.log.9:2011-11-20 18:18:15,869 INFO  namenode.FSNamesystem (FSNamesystem.java:commitBlockSynchronization(2488)) - commitBlockSynchronization(blk_1321803251510_85379) successful
> ./hadoop-root-HANameNode-xx-xx-132-27.log.9:2011-11-20 18:18:45,496 WARN  namenode.FSNamesystem (FSNamesystem.java:addStoredBlock(3708)) - Inconsistent size for block blk_1321803251510_85379 reported from xx.xx.132.22:10010 current size is 246505984 reported size is 255954944
> ./hadoop-root-HANameNode-xx-xx-132-27.log.9:2011-11-20 18:18:45,496 WARN  hdfs.StateChange (FSNamesystem.java:addStoredBlock(3800)) - BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request received for blk_1321803251510_85379 on xx.xx.132.22:10010 size 255954944
>
> After the DN restart:
> ./hadoop-root-HANameNode-xx-xx-132-27.log.7:2011-11-20 18:55:54,844 INFO  hdfs.StateChange (FSNamesystem.java:rejectAddStoredBlock(3520)) - BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1321803251510_83627 on xx.xx.132.22:10010 size 104428544 but was rejected: Reported as block being written but is a block of closed file.
>
> Regards,
> Uma
>
> ________________________________________
> From: Todd Lipcon [todd@cloudera.com]
> Sent: Wednesday, November 23, 2011 6:27 AM
> To: common-dev@hadoop.apache.org
> Cc: hdfs-dev@hadoop.apache.org
> Subject: Re: Blocks are getting corrupted under very high load
>
> Can you look on the DN in question and see whether the block was successfully
> finalized when the write finished? It doesn't sound like a successful
> write -- it should have been moved out of the bbw directory into current/.
>
> -Todd
>
> On Tue, Nov 22, 2011 at 3:16 AM, Uma Maheswara Rao G wrote:
>> Hi All,
>>
>> I have backported HDFS-1779 to our Hadoop version, which is based on the 0.20-append branch.
>>
>> We are running a load test, as usual. (We want to ensure the reliability of the system under heavy load.)
>> My cluster has 8 Datanodes and a Namenode.
>> Each machine has 16 CPUs and 12 hard disks, each with 2 TB capacity.
>> Clients run alongside the Datanodes.
>> Clients upload tar files containing 3-4 blocks, from 50 threads.
>> Each block is 256 MB, and the replication factor is 3.
>>
>> Everything looks fine under normal load.
>> When the load is increased, a lot of errors happen.
>> Many pipeline failures happen as well.
>> All of this is fine, except for the strange case of a few blocks.
>>
>> Some blocks (around 30) are missing (the fsck report shows this).
>> When I try to read those files, the read fails saying that there are no Datanodes for the block.
>> Analysing the logs, we found that for these blocks pipeline recovery happened and the write was successful to a single Datanode.
>> Also, the Datanode reported the block to the Namenode in a blockReceived command.
>> After some time (say, 30 minutes), the Datanode was restarted.
>> The BBW (BlocksBeingWritten) report sent by the DN immediately after restart also includes these finalized blocks.
>> (This shows that these blocks are in the blocksBeingWritten folder.)
>> In many of the cases, the generation timestamp reported in the BBW report is the old timestamp.
>>
>> The Namenode rejects that block in the BBW report, saying the file is already closed.
>> The Namenode also asks the Datanode to invalidate the blocks, and the Datanode does so.
>> When deleting the blocks, it again prints the path from the BlocksBeingWritten directory (and the previous generation timestamp).
>>
>> This looks very strange to me.
>> Does this mean that the finalized block file & meta file (which are written in the current folder) are getting lost after the DN restart,
>> due to which the Namenode does not receive these blocks' information in the BLOCK REPORT sent from the Datanodes?
>>
>> Regards,
>>
>> Uma
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
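
Todd's question above hinges on durability: if the whole machine lost power (rather than just the DN JVM restarting), a block file the DN had already finalized may never have reached the disk, because writing and renaming a file only guarantee that the data and the directory update reach the OS page cache. The sketch below is a minimal illustration of that distinction under those assumptions; it is not Hadoop code, the paths and contents are hypothetical, and it only shows why an explicit force (fsync) is needed for the bytes to survive a power failure.

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.nio.file.StandardOpenOption;

    // Minimal sketch (not Hadoop code): why a "finalized" file can vanish
    // after a power failure if its bytes were never forced to stable storage.
    public class DurabilitySketch {

        // Hypothetical "finalize": flush the block bytes to disk, then
        // atomically move the file from the bbw directory into current/.
        static void finalizeBlock(Path bbwFile, Path currentFile) throws IOException {
            try (FileChannel ch = FileChannel.open(bbwFile, StandardOpenOption.WRITE)) {
                ch.force(true); // flush data and metadata; skip this and a crash may lose the bytes
            }
            Files.move(bbwFile, currentFile, StandardCopyOption.ATOMIC_MOVE);
            // Note: fully surviving power loss also requires syncing the parent
            // directories, which plain java.nio does not expose directly.
        }

        public static void main(String[] args) throws IOException {
            Path bbw = Paths.get("blocksBeingWritten", "blk_0001"); // hypothetical paths
            Path cur = Paths.get("current", "blk_0001");
            Files.createDirectories(bbw.getParent());
            Files.createDirectories(cur.getParent());
            Files.write(bbw, "block bytes".getBytes(StandardCharsets.UTF_8));
            finalizeBlock(bbw, cur);
            System.out.println("finalized: " + Files.exists(cur));
        }
    }

If only the DN process restarts, the page cache survives and the finalized file is still visible in current/; if the machine loses power before the force (and the directory syncs) complete, the file system can roll back to a state where the block still appears only under blocksBeingWritten with its old generation stamp, which matches the symptoms described in the thread.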