From: Namikaze Minato
Date: Wed, 23 Mar 2016 19:33:49 +0100
Subject: Re: distcp failures and errors
To: Colin Kincaid Williams
Cc: "common-user@hadoop.apache.org"

Hello.

This does not copy the file onto your local filesystem, but onto the
local HDFS filesystem. That was to be sure hdfs://cdh5dest-cluster/ was
the correctly configured destination, but you confirmed that already.
Unfortunately I have no other pointers for now. When I stumbled on the
same error as you, I switched from hftp:// to hdfs:// (because I was
copying from cdh5 to cdh5, I was able to do that).

Sorry for not being able to help more; I'll tell you if I find
something else that could fix your issue.

Regards,
LLoyd

On 23 March 2016 at 19:30, Colin Kincaid Williams wrote:
>> Which cluster are you issuing the command on?
>
> The destination cluster (CDH5).
>
> The command I tried to run is documented above:
>
> hadoop distcp -D mapreduce.job.queuename=search -D
> mapreduce.job.maxtaskfailures.per.tracker=1 -pb
> hftp://cdh4source-cluster:50070/backups/HbaseTableCopy
> hdfs://cdh5dest-cluster/user/colin.williams/hbase/
>
> which includes the -pb flag.
>
>> Can you try this command please?
>
> hadoop distcp -pb -D mapreduce.job.queuename=search -D
> mapreduce.job.maxtaskfailures.per.tracker=1
> hftp://cdh4source-cluster:50070/backups/HbaseTableCopy
> /user/colin.williams/hbase/
>
> I see no difference in your command except for the hdfs:// path
> missing. I don't want to copy massive files into my local filesystem.
> The other suggestions you mailed me privately were irrelevant.
>
>> Uhm, are you sure you should specify port 50070 in the source? I may
>> be talking for nothing here, but that seems strange to me.
>
> Regards,
> LLoyd
>
>> Otherwise, you could try and stop the hbase service while you're
>> doing your copy. This would avoid having the source modified by it
>> while you're copying the file. It may not change anything, but trying
>> should help understand whether that's the issue or not.
>
> On Wed, Mar 23, 2016 at 3:04 PM, Namikaze Minato wrote:
>> Which cluster are you issuing the command on?
>> This command:
>> hadoop distcp -D mapreduce.job.queuename=search -D
>> mapreduce.job.maxtaskfailures.per.tracker=1 -pb
>> hftp://cdh4source-cluster:50070/backups/HbaseTableCopy
>> hdfs://cdh5dest-cluster/user/colin.williams/hbase/
>>
>> The checksum issue is clearly linked to the "-pb" missing in that run.
>> For the EOF error, I don't know yet.
>>
>> Can you try this command please?
>>
>> hadoop distcp -pb -D mapreduce.job.queuename=search -D
>> mapreduce.job.maxtaskfailures.per.tracker=1
>> hftp://cdh4source-cluster:50070/backups/HbaseTableCopy
>> /user/colin.williams/hbase/
>>
>> Regards,
>> LLoyd
>>
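To make the two suggestions above concrete, here is a minimal sketch (cluster names, queue, and paths reused from the commands quoted above; the hdfs:// form assumes the CDH5 client can still reach the CDH4 NameNode over RPC on the default port 8020, which may not hold across major versions, so treat it as an option to test rather than a known fix):

    # -pb preserves the source block size; without it the destination file is
    # written with the CDH5 default block size, and the post-copy HDFS file
    # checksum (an MD5-of-CRC digest that depends on block size) can no longer
    # match even when the bytes are identical.
    hadoop distcp -pb -D mapreduce.job.queuename=search \
        hftp://cdh4source-cluster:50070/backups/HbaseTableCopy \
        hdfs://cdh5dest-cluster/user/colin.williams/hbase/

    # The hftp-to-hdfs switch mentioned above, expressed against the same paths;
    # only usable if RPC between the CDH5 client and the CDH4 NameNode is compatible.
    hadoop distcp -pb -D mapreduce.job.queuename=search \
        hdfs://cdh4source-cluster:8020/backups/HbaseTableCopy \
        hdfs://cdh5dest-cluster/user/colin.williams/hbase/
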
>> On 22 March 2016 at 21:38, Colin Kincaid Williams wrote:
>>>
>>> So far I'm on another Hadoop wild goose chase. I made another attempt,
>>> this time with the -Ddfs.checksum.type=CRC32 option set. I had a look
>>> at the hdfs datanode logs on both the cdh5 receiving cluster datanode
>>> and the cdh4 source cluster datanode.
>>>
>>> Here are the logs from the cdh5 datanode:
>>>
>>> 2016-03-21 01:40:21,719 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: us3sm2hb027r09.comp.prod.local:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.51.28.155:40297 dst: /10.51.28.172:50010
>>> java.io.IOException: Premature EOF from inputStream
>>>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:724)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> getNumBytes() = 218078234
>>> getBytesOnDisk() = 218078234
>>> getVisibleLength()= 218078234
>>> getVolume() = /data8/dfs/current
>>> getBlockFile() = /data8/dfs/current/BP-1256332750-10.51.28.140-1408661299811/current/rbw/blk_1123289423
>>> bytesAcked=218078234
>>> bytesOnDisk=218078234
>>> getNumBytes() = 218078234
>>> getBytesOnDisk() = 218078234
>>> getVisibleLength()= 218078234
>>> getVolume() = /data8/dfs/current
>>> getBlockFile() = /data8/dfs/current/BP-1256332750-10.51.28.140-1408661299811/current/rbw/blk_1123289423
>>> recoveryId=49653218
>>> original=ReplicaBeingWritten, blk_1123289423_49566579, RBW
>>> getNumBytes() = 218078234
>>> getBytesOnDisk() = 218078234
>>> getVisibleLength()= 218078234
>>> getVolume() = /data8/dfs/current
>>> getBlockFile() = /data8/dfs/current/BP-1256332750-10.51.28.140-1408661299811/current/rbw/blk_1123289423
>>> bytesAcked=218078234
>>> bytesOnDisk=218078234
>>>
>>> Then I connected to the cdh4 datanode and looked at its corresponding logs:
>>>
>>> 2016-03-21 01:40:20,194 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.51.28.171:39911, dest: /10.51.28.155:50010, bytes: 6546, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1383949982_1, offset: 0, srvID: 25cc228e-1f4f-4eae-9c70-caad9b24b95b, blockid: BP-1256332750-10.51.28.140-1408661299811:blk_1123360423_49637579, duration: 2283093
>>> 2016-03-21 01:40:20,194 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1256332750-10.51.28.140-1408661299811:blk_1123360423_49637579, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>> 2016-03-21 01:40:21,718 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1256332750-10.51.28.140-1408661299811:blk_1123359613_49636769
>>> java.io.IOException: Premature EOF from inputStream
>>>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:724)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> 2016-03-21 01:40:21,718 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1256332750-10.51.28.140-1408661299811:blk_1123289423_49566579
>>> java.io.IOException: Premature EOF from inputStream
>>>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:724)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> 2016-03-21 01:40:21,718 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1256332750-10.51.28.140-1408661299811:blk_1123359613_49636769, type=HAS_DOWNSTREAM_IN_PIPELINE: Thread is interrupted.
>>> 2016-03-21 01:40:21,718 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1256332750-10.51.28.140-1408661299811:blk_1123359613_49636769, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>> 2016-03-21 01:40:21,718 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1256332750-10.51.28.140-1408661299811:blk_1123289423_49566579, type=HAS_DOWNSTREAM_IN_PIPELINE: Thread is interrupted.
>>> 2016-03-21 01:40:21,718 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1256332750-10.51.28.140-1408661299811:blk_1123289423_49566579, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
>>> 2016-03-21 01:40:21,718 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1256332750-10.51.28.140-1408661299811:blk_1123359613_49636769 received exception java.io.IOException: Premature EOF from inputStream
>>> 2016-03-21 01:40:21,718 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1256332750-10.51.28.140-1408661299811:blk_1123289423_49566579 received exception java.io.IOException: Premature EOF from inputStream
>>> 2016-03-21 01:40:21,718 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: us3sm2hb010r07.comp.prod.local:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.51.28.155:40392 dst: /10.51.28.155:50010
>>> java.io.IOException: Premature EOF from inputStream
>>>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:724)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> 2016-03-21 01:40:21,718 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: us3sm2hb010r07.comp.prod.local:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.51.28.155:49016 dst: /10.51.28.155:50010
>>> java.io.IOException: Premature EOF from inputStream
>>>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:468)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:772)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:724)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> 2016-03-21 01:40:28,272 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1256332750-10.51.28.140-1408661299811:blk_1123360428_49637584 src: /10.51.28.155:40588 dest: /10.51.28.155:50010
>>>
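One way to check whether the hftp read path itself truncates a large file, independent of distcp, is sketched below (the part file is one of the ones that fails with an EOF error further down in the thread; the byte count from -cat should equal the length reported by -stat):

    # Reported length of the source file over hftp (%b = size in bytes).
    hadoop fs -stat "%b %n" hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00004

    # Stream the same file through the hftp client and count what actually arrives;
    # a short count here would reproduce the "Got EOF but currentPos < filelength"
    # symptom without involving the distcp job at all.
    hadoop fs -cat hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00004 | wc -c
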
>>> On Tue, Mar 22, 2016 at 1:11 AM, Colin Kincaid Williams wrote:
>>> > Almost forgot to include the final failure:
>>> >
>>> > 16/03/21 18:50:44 INFO mapreduce.Job: Job job_1453754997414_337405 failed with state FAILED due to: Task failed task_1453754997414_337405_m_000007
>>> > Job failed as tasks failed. failedMaps:1 failedReduces:0
>>> >
>>> > 16/03/21 18:50:44 INFO mapreduce.Job: Counters: 9
>>> >         Job Counters
>>> >                 Failed map tasks=22
>>> >                 Killed map tasks=26
>>> >                 Launched map tasks=48
>>> >                 Other local map tasks=48
>>> >                 Total time spent by all maps in occupied slots (ms)=182578858
>>> >                 Total time spent by all reduces in occupied slots (ms)=0
>>> >                 Total time spent by all map tasks (ms)=182578858
>>> >                 Total vcore-seconds taken by all map tasks=182578858
>>> >                 Total megabyte-seconds taken by all map tasks=186960750592
>>> > 16/03/21 18:50:44 ERROR tools.DistCp: Exception encountered
>>> > java.io.IOException: DistCp failure: Job job_1453754997414_337405 has failed: Task failed task_1453754997414_337405_m_000007
>>> > Job failed as tasks failed. failedMaps:1 failedReduces:0
>>> >     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:175)
>>> >     at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> >     at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>>> >
>>> > On Tue, Mar 22, 2016 at 12:58 AM, Colin Kincaid Williams wrote:
>>> >> I'm trying to copy data between two clusters with
>>> >>
>>> >> hadoop version
>>> >> Hadoop 2.0.0-cdh4.1.3
>>> >> Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.1.3/src/hadoop-common-project/hadoop-common -r dbc7a60f9a798ef63afb7f5b723dc9c02d5321e1
>>> >> Compiled by jenkins on Sat Jan 26 16:46:14 PST 2013
>>> >> From source with checksum ad1ed6a3ede2e0e9c39b052bbc76c189
>>> >>
>>> >> and
>>> >>
>>> >> hadoop version
>>> >> Hadoop 2.5.0-cdh5.3.0
>>> >> Subversion http://github.com/cloudera/hadoop -r f19097cda2536da1df41ff6713556c8f7284174d
>>> >> Compiled by jenkins on 2014-12-17T03:05Z
>>> >> Compiled with protoc 2.5.0
>>> >> From source with checksum 9c4267e6915cf5bbd4c6e08be54d54e0
>>> >> This command was run using /usr/lib/hadoop/hadoop-common-2.5.0-cdh5.3.0.jar
>>> >>
>>> >> The command I'm using to do so is:
>>> >>
>>> >> hadoop distcp -D mapreduce.job.queuename=search -D
>>> >> mapreduce.job.maxtaskfailures.per.tracker=1 -pb
>>> >> hftp://cdh4source-cluster:50070/backups/HbaseTableCopy
>>> >> hdfs://cdh5dest-cluster/user/colin.williams/hbase/
>>> >>
>>> >> I've also tried it without the -pb and -D
>>> >> mapreduce.job.maxtaskfailures.per.tracker=1 options.
>>> >> All my attempts fail, and the command prints out various errors during the attempts:
>>> >>
>>> >> Error: java.io.IOException: File copy failed: hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00018 --> hdfs://cdh5dest-cluster/user/colin.williams/hbase/HbaseTableCopy/part-m-00018
>>> >>     at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
>>> >>     at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
>>> >>     at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
>>> >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>>> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>>> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>> >>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>> >>     at java.security.AccessController.doPrivileged(Native Method)
>>> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>>> >>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>> >> Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00018 to hdfs://cdh5dest-cluster/user/colin.williams/hbase/HbaseTableCopy/part-m-00018
>>> >>     at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
>>> >>     at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
>>> >>     ... 10 more
>>> >> Caused by: java.io.IOException: Check-sum mismatch between hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00018 and hdfs://cdh5dest-cluster/user/colin.williams/hbase/.distcp.tmp.attempt_1453754997414_337405_m_000007_0.
>>> >>     at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:211)
>>> >>     at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:131)
>>> >>     at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
>>> >>     at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
>>> >>     ... 11 more
>>> >>
>>> >> OR
>>> >>
>>> >> 16/03/21 17:30:47 INFO mapreduce.Job: Task Id : attempt_1453754997414_337405_m_000001_0, Status : FAILED
>>> >> Error: java.io.IOException: File copy failed: hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00004 --> hdfs://cdh5dest-cluster/user/colin.williams/hbase/HbaseTableCopy/part-m-00004
>>> >>     at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
>>> >>     at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
>>> >>     at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
>>> >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>>> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>>> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>> >>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>> >>     at java.security.AccessController.doPrivileged(Native Method)
>>> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>>> >>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>> >> Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00004 to hdfs://cdh5dest-cluster/user/colin.williams/hbase/HbaseTableCopy/part-m-00004
>>> >>     at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
>>> >>     at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
>>> >>     ... 10 more
>>> >> Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: Got EOF but currentPos = 916783104 < filelength = 21615406422
>>> >>     at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:289)
>>> >>     at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:257)
>>> >>     at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
>>> >>     at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
>>> >>     at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
>>> >>     at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
>>> >>     ... 11 more
>>> >> Caused by: java.io.IOException: Got EOF but currentPos = 916783104 < filelength = 21615406422
>>> >>     at org.apache.hadoop.hdfs.web.ByteRangeInputStream.update(ByteRangeInputStream.java:173)
>>> >>     at org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
>>> >>     at java.io.DataInputStream.read(DataInputStream.java:100)
>>> >>     at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:80)
>>> >>     at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:284)
>>> >>     ... 16 more
>>> >>
>>> >> Then I see a checksum issue and the EOF issue. I've also run hadoop
>>> >> fsck on the source files, and it doesn't report any errors. I see many
>>> >> Jira issues and questions regarding DistCp. Can I get some help with
>>> >> this?
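Given the closing questions above, one hedged way to narrow down the checksum symptom is to compare block size, length, and HDFS file checksum for a single failing part file on both sides (paths copied from the errors above, run from the CDH5 side, and assuming the hftp filesystem exposes file checksums to the shell):

    # Block size (%o), length (%b), and name (%n) for the CDH4 source (via hftp)
    # and for the CDH5 copy; differing block sizes alone are enough to explain a
    # "Check-sum mismatch" when -pb was not used, even if the bytes are identical.
    hadoop fs -stat "%o %b %n" hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00018
    hadoop fs -stat "%o %b %n" /user/colin.williams/hbase/HbaseTableCopy/part-m-00018

    # The MD5-of-CRC file checksum that distcp compares after the copy; these
    # should only be expected to match once the block sizes (and bytes-per-checksum
    # settings) on both sides are identical.
    hadoop fs -checksum hftp://cdh4source-cluster:50070/backups/HbaseTableCopy/part-m-00018
    hadoop fs -checksum /user/colin.williams/hbase/HbaseTableCopy/part-m-00018
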