hadoop-common-user mailing list archives

From Zooni79 <zoon...@gmail.com>
Subject Re: Tweaking the File write in HDFS
Date Wed, 17 Nov 2010 13:51:23 GMT

As an extension to the problem statement: is it possible to fuse steps 1 and 2 into one step?
That is, can the map task pick up its input from an external filesystem instead of HDFS?
Could FTPFileSystem or RawLocalFileSystem be of any help here?


On 15-Nov-2010, at 3:10 PM, Sebastian Schoenherr wrote:

> Hi Matthew,
> of course, you can copy it directly to HDFS and vice versa. Use IOUtils (org.apache.hadoop.io.IOUtils)
> like this:
> FileSystem fileSystem = FileSystem.get(conf); // org.apache.hadoop.fs.FileSystem
> "in" and "out" are the streams (in this example, "out" is the HDFS output stream)
> IOUtils.copyBytes(in, out, fileSystem.getConf());
> hope this helps,
> sebastian
> Quoting Matthew John <tmatthewjohn1988@gmail.com>:
>> Hi all,
>> I have been working with MapReduce and HDFS for some time. The procedure
>> I normally follow is:
>> 1) copy the input file from the local file system to HDFS
>> 2) run the MapReduce module
>> 3) copy the output file back from HDFS to the local file system
>> But I feel that steps 1 and 3 add a lot of overhead to the entire process.
>> My questions are:
>> 1) I get the files onto the local file system by establishing a port
>> connection with another node. Can I ensure that the data transferred
>> to the Hadoop node is written directly to HDFS, instead of going
>> through the local file system and then performing a copyFromLocal?
>> 2) Can I write the reduce output (which creates the final output file)
>> directly to the local file system instead of injecting it into HDFS
>> (effectively onto different nodes in HDFS), so that I can minimize the
>> overhead? I expect this to take much less time than copying to
>> HDFS and then performing a copyToLocal. Finally, I should be able to
>> send this file back to another node using socket communication.
>> Looking forward to your suggestions!
>> Thanks,
>> Matthew John
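
For reference, the stream-to-stream copy that Sebastian's IOUtils suggestion relies on can be sketched with plain java.io. This is only an illustration of the idea, not Hadoop's own implementation: the class name StreamCopySketch, its copyBytes helper, and the 4096-byte buffer are assumptions made for this example. In a real job, "in" would presumably be the socket's input stream and "out" the stream returned by FileSystem.create(path), so the incoming data never touches the local disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a stdlib-only stand-in for the copy loop behind
// org.apache.hadoop.io.IOUtils.copyBytes. Names here are illustrative.
public class StreamCopySketch {

    // Copies everything from 'in' to 'out' in fixed-size chunks and returns
    // the number of bytes copied. The caller is responsible for closing both
    // streams (for example with try-with-resources).
    static long copyBytes(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "record-1\nrecord-2\n".getBytes(StandardCharsets.UTF_8);

        // Stand-ins: 'in' plays the role of socket.getInputStream(),
        // 'out' the role of the HDFS output stream from FileSystem.create(...).
        try (InputStream in = new ByteArrayInputStream(payload);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copyBytes(in, out, 4096);
            System.out.println("bytes copied: " + copied);
        }
    }
}
```

Because the loop only depends on InputStream and OutputStream, the same pattern covers the reverse direction too: reading from an HDFS input stream and writing straight to a socket, which is what question 2 is after.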
