hadoop-mapreduce-user mailing list archives

From sam liu <samliuhad...@gmail.com>
Subject Re: Failed to run distcp against ftp server installed on Windows.
Date Mon, 27 Apr 2015 08:36:05 GMT
Hi Experts,

Strangely, DistCp can fetch a file from a FileZilla FTP server on Windows 7,
but fails against the IIS FTP server on the same Windows 7 machine, even
though wget can fetch the file from the IIS server directly ('wget
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt'). I tried several
times; every attempt failed, with the different error messages shown below.

Any comments?

*[Success on FileZilla FTP server on Windows 7]:*
[hdfs@hostname2.com ~]$ hadoop distcp
ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
15/04/26 22:56:20 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1429858372957_0002
15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
application_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
http://hostname2.com:8088/proxy/application_1429858372957_0002/
15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running in
uber mode : false
15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%

*[Failure 1 on IIS FTP server on the same Windows 7 OS]:*
[hdfs@hostname2.com ~]$ hadoop distcp
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
15/04/27 00:02:45 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException:
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt doesn't exist
        at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
        at
org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

*[Failure 2 on IIS FTP server on the same Windows 7 OS]:*
[biadmin@hostname2.com ~]$ hadoop distcp
ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
15/02/01 23:03:37 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp,
targetPathExists=true}
15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8032
15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
without indication.
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
        at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at
org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
        at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)

*[Failure 3 on IIS FTP server on the same Windows 7 OS]:*
[hdfs@hostname2.com ~]$ hadoop distcp
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
15/04/27 00:08:18 INFO tools.DistCp: Input Options:
DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
targetPathExists=true, preserveRawXattrs=false}
15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
http://hostname2.com:8188/ws/v1/timeline/
15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
hostname2.com/9.32.249.181:8050
15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:196)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:154)
        at java.io.BufferedReader.read(BufferedReader.java:175)
        at
org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
        at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
        at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
        at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
        at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
        at
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
        at
org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at
org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at
org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
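
A possible way to narrow this down outside Hadoop: the sketch below (Python,
standard library ftplib only) logs in and lists the source path in either
passive or active mode. As far as I know, wget defaults to passive mode while
the commons-net FTPClient behind FTPFileSystem defaults to active mode, so a
difference between the two probes would point at the data-connection mode
rather than at DistCp itself. This is only an assumption to test, not a
confirmed cause; parse_ftp_url and probe are hypothetical helper names.

```python
from ftplib import FTP, all_errors
from urllib.parse import urlparse, unquote


def parse_ftp_url(url):
    """Split an ftp:// URL into (host, port, user, password, path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp:// URL: %r" % url)
    return (
        parts.hostname,
        parts.port or 21,                      # default FTP control port
        unquote(parts.username or "anonymous"),
        unquote(parts.password or ""),
        parts.path or "/",
    )


def probe(url, passive, timeout=30):
    """Log in and LIST the path using the given data-connection mode.

    Returns the listing lines on success, or the exception on failure,
    so the two modes can be compared side by side.
    """
    host, port, user, password, path = parse_ftp_url(url)
    ftp = FTP()
    try:
        ftp.connect(host, port, timeout=timeout)
        ftp.login(user, password)
        ftp.set_pasv(passive)                  # True = passive, False = active
        lines = []
        ftp.retrlines("LIST " + path, lines.append)
        return lines
    except all_errors as exc:
        return exc
    finally:
        ftp.close()                            # safe even if connect() failed
```

Calling probe(url, True) and then probe(url, False) with the IIS URL from the
failures above, from the same Hadoop node, should show whether only one of the
two modes works against that server.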

Thanks!


2015-02-02 15:41 GMT+08:00 sam liu <samliuhadoop@gmail.com>:

> Hi Experts,
>
> I can run distcp against an FTP server installed on Linux, but NOT against
> an FTP server installed on Windows. The steps are below.
>
> Is this a DistCp bug? Any comments?
>
> [Scenario 1]
> I installed a BI cluster from a trunk build on HadoopNode1, and could then
> copy a file from an FTP server installed on Linux to HDFS with the command:
> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt
> hdfs://HadoopNode1:9000/tmp/
>
> [Scenario 2]
> On the same Hadoop node, I can copy a file from a remote FTP server
> installed on Windows 7 with the command:
> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt
>
> But copying a file from the FTP server installed on Windows 7 to HDFS
> fails with the command:
> [user1@HadoopNode1 ~]$ hadoop distcp
> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
> targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
> HadoopNode1/9.30.239.166:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
> without indication.
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> Thanks!
>
