hadoop-hdfs-user mailing list archives

From sam liu <samliuhad...@gmail.com>
Subject Re: Failed to run distcp against ftp server installed on Windows.
Date Wed, 29 Apr 2015 02:35:34 GMT
For the IIS FTP server on Windows, DistCp seems to always fail at the
line 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)' in
hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java#connect()

I opened a JIRA for this issue: HADOOP-11886
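
A minimal sketch for checking the server side independently of Hadoop. It
assumes that setFileTransferMode(FTP.BLOCK_TRANSFER_MODE) in Commons Net
amounts to sending the RFC 959 command 'MODE B' on the control connection, and
it uses Python's standard ftplib rather than Hadoop's FTPFileSystem. The names
probe_transfer_modes and probe_server are hypothetical, and the sketch is
untested against the IIS server discussed in this thread.

```python
import ftplib


def probe_transfer_modes(ftp, modes=("B", "S")):
    """Send 'MODE <code>' for each code and record how the server answers.

    `ftp` is an ftplib.FTP-style object (anything with a sendcmd() method).
    Returns {mode: (accepted, detail)}. "S" is probed last so that a server
    which accepts both modes is left in the default stream mode.
    """
    results = {}
    for mode in modes:
        try:
            # sendcmd() returns the reply text for 1xx-3xx replies and
            # raises ftplib.error_temp / error_perm for 4xx / 5xx replies.
            reply = ftp.sendcmd("MODE " + mode)
            results[mode] = (True, reply)
        except ftplib.Error as exc:
            # The server answered, but refused the mode.
            results[mode] = (False, str(exc))
        except (EOFError, OSError) as exc:
            # The control connection dropped; later probes cannot succeed.
            results[mode] = (False, "connection lost: " + type(exc).__name__)
            break
    return results


def probe_server(host, user, password, port=21, timeout=30):
    """Log in to a real server and probe it (hypothetical helper)."""
    with ftplib.FTP() as ftp:
        ftp.connect(host, port, timeout=timeout)
        ftp.login(user, password)
        return probe_transfer_modes(ftp)
```

Calling probe_server('hostname1.com', 'Viewer', 'passw0rd') returns a dict
keyed by mode. A 5xx reply for 'B' together with a 2xx reply for 'S' would
suggest that the server supports only stream mode; a "connection lost" entry
would suggest that the server dropped the control connection instead of
replying.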

2015-04-27 16:36 GMT+08:00 sam liu <samliuhadoop@gmail.com>:

> Hi Experts,
>
> It is really weird that DistCp can successfully get the file from a
> FileZilla FTP server on Windows 7, but fails against the IIS FTP server on
> the same Windows 7 machine (although I can fetch the file directly with wget:
> 'wget ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt'). I tried several
> times; every attempt failed, with different error messages, as shown below.
>
> Any comments?
>
> *[Success on FileZilla ftp server on Windows7]:*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt /tmp/
> 15/04/26 22:56:20 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://ftp:ftp@hostname1.com:121/ftp_test.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
> 15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1429858372957_0002
> 15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
> application_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
> http://hostname2.com:8088/proxy/application_1429858372957_0002/
> 15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
> 15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
> 15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running
> in uber mode : false
> 15/04/26 22:56:51 INFO mapreduce.Job:  map 0% reduce 0%
>
> *[Failure 1 on  IIS ftp server on the same Windows7 OS] :*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
> 15/04/27 00:02:45 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
> org.apache.hadoop.tools.CopyListing$InvalidInputException:
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt doesn't exist
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> *[Failure 2 on  IIS ftp server on the same Windows7 OS] :*
> [biadmin@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp,
> targetPathExists=true}
> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8032
> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed
> without indication.
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>
> *[Failure 3 on  IIS ftp server on the same Windows7 OS] :*
> [hdfs@hostname2.com ~]$ hadoop distcp
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt /tmp/
> 15/04/27 00:08:18 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
> ftp://Viewer:passw0rd@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
> targetPathExists=true, preserveRawXattrs=false}
> 15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
> http://hostname2.com:8188/ws/v1/timeline/
> 15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
> hostname2.com/9.32.249.181:8050
> 15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
> java.net.SocketException: Connection reset
>         at java.net.SocketInputStream.read(SocketInputStream.java:196)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:154)
>         at java.io.BufferedReader.read(BufferedReader.java:175)
>         at
> org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
>         at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>         at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
>         at
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>         at
> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>         at
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>
> Thanks!
>
>
> 2015-02-02 15:41 GMT+08:00 sam liu <samliuhadoop@gmail.com>:
>
>> Hi Experts,
>>
>> I can run DistCp against an FTP server installed on Linux, but NOT
>> against an FTP server installed on Windows. The steps are below.
>>
>> Is this a DistCp bug? Any comments?
>>
>> [Scenario 1]
>> I installed a BI cluster using a trunk build on HadoopNode1, and could then
>> copy a file from an FTP server installed on Linux to HDFS using the command:
>> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt
>> hdfs://HadoopNode1:9000/tmp/
>>
>> [Scenario 2]
>> On the same Hadoop node, I can copy a file from a remote FTP server
>> installed on Windows 7 using the command:
>> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt
>>
>> But I failed to copy a file from an FTP server installed on Windows 7 to
>> HDFS using the command:
>> [user1@HadoopNode1 ~]$ hadoop distcp
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
>> targetPathExists=true}
>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>> HadoopNode1/9.30.239.166:8032
>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>> closed without indication.
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>         at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>         at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>         at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>         at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>         at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>         at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>         at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>         at
>> org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>         at
>> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>         at
>> org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>         at
>> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>         at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>         at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>         at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>
>> Thanks!
>>
>
>
