hadoop-common-user mailing list archives

From mfc <mikefconn...@verizon.net>
Subject Re: Using Map/Reduce without HDFS?
Date Mon, 27 Aug 2007 00:45:03 GMT

Hi,

I'm assuming that the typical input to a Hadoop map/reduce job is a set of large ASCII files.

If I'm starting with a large number of small ASCII files outside of HDFS, where and when does the conversion to large files take place?

You seem to be recommending a pre-step (is that correct?) that cats and gzips
the small files into big files. Once that is done, you copy the big files into
HDFS and run a map/reduce job on them.
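
To make sure I follow, the pre-step I'm picturing looks something like the sketch
below: it just concatenates a directory of small ASCII files into one gzipped file
that can then be copied into HDFS. This is untested, and the directory and file
names are placeholders I made up.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Pre-step sketch: concatenate many small ASCII files from a local
// directory into a single gzipped file, ready to be copied into HDFS.
// Paths below are placeholders, not real ones from my setup.
public class ConcatAndGzip {
    public static void main(String[] args) throws IOException {
        File inputDir = new File("/data/small-files"); // many small ASCII files
        File bigFile = new File("/data/big-file.gz");  // single large gzipped output

        OutputStream out = new GZIPOutputStream(new FileOutputStream(bigFile));
        byte[] buf = new byte[64 * 1024];
        for (File f : inputDir.listFiles()) {
            InputStream in = new FileInputStream(f);
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            in.close();
            out.write('\n'); // keep records line-oriented across file boundaries
        }
        out.close();
    }
}

...followed, I assume, by something like "bin/hadoop dfs -put /data/big-file.gz input/"
to get the result into HDFS.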

...but then the map/reduce job in Hadoop breaks the large files back down into
small chunks (input splits). This is what prompted the question in the first
place about running Map/Reduce directly on the small files in the local file
system.

I'm wondering if the conversion to large files and the copy into HDFS would
introduce a lot of overhead that would not be necessary if map/reduce could be
run directly on the small files in the local file system.
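
To make the question concrete, what I was imagining is roughly the sketch below.
It is untested; I'm going by the fs.default.name and mapred.job.tracker properties
I've seen in the configuration docs, and the paths and class name are placeholders.
The idea is that the job reads the small files straight off the local disk using
the local job runner instead of HDFS and a JobTracker.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Sketch: run a map/reduce job against the local file system,
// bypassing HDFS entirely. Paths are placeholders.
public class LocalFsJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LocalFsJob.class);
        conf.setJobName("local-fs-sketch");

        // Point Hadoop at the local file system and the local job runner.
        conf.set("fs.default.name", "file:///");
        conf.set("mapred.job.tracker", "local");

        // With no mapper or reducer set, the identity mapper and reducer
        // are used; the point here is only the file-system configuration.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path("/data/small-files"));
        FileOutputFormat.setOutputPath(conf, new Path("/data/output"));

        JobClient.runJob(conf);
    }
}

If something like that works, I suppose the question becomes whether the per-file
overhead of many tiny input splits is acceptable.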

I'd be interested in knowing whether this is an appropriate use of Hadoop. My
knowledge of Hadoop is limited, and I'm just trying to learn where and how it
can be used.

Thanks

