hadoop-common-dev mailing list archives

From "Ankur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3246) FTP client over HDFS
Date Tue, 15 Apr 2008 05:35:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588912#action_12588912 ]

Ankur commented on HADOOP-3246:
-------------------------------

Even though the two issues look the same, they are different.

HADOOP-3199 is about an FTP server that provides FTP access to data in HDFS. Any FTP client
would then be able to access HDFS data through FTP.

This issue is about an FTP client that talks to remote FTP server(s), pulls data from them
and stores it directly into HDFS.

At present our data is spread across several remote FTP servers. Pulling a lot of data from
these locations takes a lot of manual work: fetching the data over FTP, storing it locally
and then putting it into HDFS. This is cumbersome, especially if the data is too large to
fit into local storage.

This utility essentially provides the following benefits:
1. The steps of 'pull data from FTP server', 'store locally', 'transfer to HDFS' and 'delete
local copy' are converted into 1 step - 'Pull data and store into HDFS'.
2. No need to worry about lack of local storage as data goes directly into HDFS.
3. Can be used to run a batch of commands that include pulling data from different FTP servers.

All of this greatly simplifies administrative tasks.
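
The single 'Pull data and store into HDFS' step and the batch mode listed above could be
sketched roughly as follows. This is only an illustration in Python using the standard
ftplib module, not the proposed utility itself. The names pull_ftp_to_sink, pull_batch and
open_sink are hypothetical, and open_sink is assumed to return a writable binary stream
backed by HDFS (how that stream is obtained depends on the HDFS client in use).

```python
from ftplib import FTP


def pull_ftp_to_sink(host, remote_path, sink, user="anonymous", password="",
                     chunk_size=64 * 1024):
    """Stream one remote FTP file into `sink` without creating a local copy.

    `sink` is any object with a write(bytes) method; in the scenario described
    above it is assumed to be an output stream backed by HDFS.
    Returns the number of bytes transferred.
    """
    total = 0

    def on_block(block):
        # Each block received from the FTP server goes straight to the sink.
        nonlocal total
        sink.write(block)
        total += len(block)

    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.retrbinary("RETR " + remote_path, on_block, blocksize=chunk_size)
    return total


def pull_batch(jobs, open_sink, fetch=pull_ftp_to_sink):
    """Run a batch of pulls, possibly from different FTP servers.

    `jobs` is an iterable of (host, remote_path, destination) tuples.
    `open_sink(destination)` is a hypothetical factory that returns a
    context-managed writable stream for the destination path.
    Returns a dict mapping each destination to the bytes written.
    """
    results = {}
    for host, remote_path, destination in jobs:
        with open_sink(destination) as sink:
            results[destination] = fetch(host, remote_path, sink)
    return results
```

Because each block is written to the sink as it arrives, the amount of local storage
available does not limit the size of the data that can be pulled. The fetch parameter is
there only so the batch loop can be exercised without a live FTP server.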

+1 for marking this as 'Not Duplicate'

> FTP client over HDFS
> --------------------
>
>                 Key: HADOOP-3246
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3246
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: util
>            Reporter: Ankur
>            Priority: Minor
>
> An FTP client that stores content directly into HDFS allows data from FTP servers to be
stored directly into HDFS instead of first copying the data locally and then uploading it
into HDFS. The benefits are apparent from an administrative perspective as large datasets
can be pulled from FTP servers with minimal human intervention.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

