commons-user mailing list archives

From Ken Krugler
Subject Challenges using FTPClient with Hadoop distcp
Date Wed, 18 Jan 2012 01:58:27 GMT
Hi all,

We're trying to use the (pretty much never used) FTPFileSystem in Hadoop, to send data from
a Hadoop cluster to an FTP server.

The first challenge is that the FTPFileSystem configures FTPClient to run in active mode,
which didn't work with our FTP server/firewall configuration.

So we created a PassiveFTPFileSystem that uses passive mode.
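For readers unfamiliar with the difference: in passive mode the server answers the PASV command with an address and port, and the client opens the data connection itself, which is why it tends to work through client-side firewalls. With Commons Net this is normally selected by calling FTPClient.enterLocalPassiveMode() after connect(). The sketch below is only an illustration of the reply format from RFC 959 and uses no FTP library; PasvReply is a hypothetical helper, not part of Hadoop or Commons Net, and the address in the example is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not from Hadoop or Commons Net): parses a PASV reply. */
final class PasvReply {
    // RFC 959 reply shape: 227 <text> (h1,h2,h3,h4,p1,p2)
    private static final Pattern ADDRESS = Pattern.compile(
        "\\((\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3})\\)");

    final String host;
    final int port;

    private PasvReply(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static PasvReply parse(String reply) {
        if (!reply.startsWith("227")) {
            throw new IllegalArgumentException("Not a 227 reply: " + reply);
        }
        Matcher m = ADDRESS.matcher(reply);
        if (!m.find()) {
            throw new IllegalArgumentException("No address in reply: " + reply);
        }
        String host = m.group(1) + "." + m.group(2) + "." + m.group(3) + "." + m.group(4);
        // The data port is sent as two bytes: high byte first, then low byte.
        int port = Integer.parseInt(m.group(5)) * 256 + Integer.parseInt(m.group(6));
        return new PasvReply(host, port);
    }

    public static void main(String[] args) {
        PasvReply r = parse("227 Entering Passive Mode (192,168,1,10,19,137)");
        System.out.println(r.host + ":" + r.port); // prints 192.168.1.10:5001
    }
}
```

The point of the example is that the data port is chosen by the server, so each passive transfer depends on the server's port range and on the firewall allowing the client to reach it.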

This connects to the FTP server and sends some files - but the copy always
fails before it completes.

On the server side (vsftpd 2.2.2) we see nothing in the logs, even with debug logging enabled.

On the Hadoop (client) side, we see a mix of errors in the logs. Most look like... Connection closed without indication.

I'm wondering if there's any issue running 14 parallel FTPClient sessions from a single server
- e.g. collisions in port numbers, though from my reading of the code that doesn't seem possible.
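As a rough sanity check of the client side of that question, here is a minimal sketch that opens 14 simultaneous connections to a loopback listener and counts the local ports the OS hands out. It assumes nothing about FTPClient or Hadoop; EphemeralPortCheck and distinctLocalPorts are hypothetical names, and the loopback listener merely stands in for a remote server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical check: do N simultaneous client sockets get distinct local ports? */
final class EphemeralPortCheck {
    static Set<Integer> distinctLocalPorts(int sessions) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        List<Socket> open = new ArrayList<>();
        Set<Integer> ports = new HashSet<>();
        // Port 0 asks the OS for any free listening port.
        try (ServerSocket server = new ServerSocket(0, sessions, loopback)) {
            for (int i = 0; i < sessions; i++) {
                Socket client = new Socket(loopback, server.getLocalPort());
                open.add(client);
                open.add(server.accept());
                // Record the port while every earlier socket is still open.
                ports.add(client.getLocalPort());
            }
            return ports;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Socket s : open) {
                try {
                    s.close();
                } catch (IOException ignored) {
                    // Best-effort cleanup only.
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(distinctLocalPorts(14).size()); // prints 14
    }
}
```

This only shows that concurrent outgoing connections from one host to the same server address and port cannot share a local port. It says nothing about the server's passive port range, its per-client connection limits, or the firewall in between.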

Thanks for any input.


-- Ken

Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
