hadoop-common-user mailing list archives

From Tom White <...@cloudera.com>
Subject Re: hadoop to ftp files into hdfs
Date Tue, 03 Feb 2009 09:44:18 GMT
NLineInputFormat is ideal for this purpose. Each split will be N lines
of input (where N is configurable), so each mapper can retrieve N
files for insertion into HDFS. You can set the number of reducers to
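
To make the splitting behaviour concrete, here is a minimal sketch in plain Python (not Hadoop code) of how a list of URLs, one per line, is divided into splits of at most N lines. The function name n_line_splits and the example.org URLs are made up for illustration; in a real job NLineInputFormat does the grouping and each mapper would fetch its URLs and write them to HDFS, which is not shown here.

```python
from itertools import islice


def n_line_splits(lines, n):
    """Yield consecutive chunks of at most n lines, one chunk per mapper."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(lines)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


# Hypothetical input file contents: one FTP URL per line.
urls = [f"ftp://example.org/data/file{i}.dat" for i in range(5)]

# With N = 2, five URLs are spread over three map tasks.
splits = list(n_line_splits(urls, 2))
for task_id, split in enumerate(splits):
    print(f"mapper {task_id}: {len(split)} file(s)")
```

A smaller N gives more map tasks and finer-grained distribution across the cluster, at the cost of more task start-up overhead.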


On Tue, Feb 3, 2009 at 4:23 AM, jason hadoop <jason.hadoop@gmail.com> wrote:
> If you have a large number of FTP URLs spread across many sites, simply
> list them in a file, set that file as your Hadoop job input, and force the
> input split to be a size that gives you good distribution across your cluster.
> On Mon, Feb 2, 2009 at 3:23 PM, Steve Morin <steve.morin@gmail.com> wrote:
>> Does anyone have a good suggestion on how to submit a Hadoop job that
>> will split the FTP retrieval of a number of files for insertion into
>> HDFS?  I have been searching Google for suggestions on this matter.
>> Steve
