hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: How to use CombineFileInputFormat in Hadoop?
Date Thu, 12 Jul 2012 15:55:06 GMT
Hey Manoj,

Oddly, the asker's name on this post is different, although it is the
same question: http://stackoverflow.com/questions/10380200/how-to-use-combinefileinputformat-in-hadoop

Anyhow, here's one example:
http://blog.yetitrails.com/2011/04/dealing-with-lots-of-small-files-in.html
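
On the "processed" question: here is a minimal sketch of the idea, written in plain Java rather than against the Hadoop API, so it illustrates the pattern and is not a drop-in record reader. The class name WholeFileReaderSketch is hypothetical, and the method names only mirror the shape of a record reader. The point is that a reader which emits exactly one record per file needs a flag to remember whether that single record has already been handed out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Plain-Java stand-in for a one-record-per-file reader.
// Assumption: this is NOT Hadoop's RecordReader; it only mimics its shape.
class WholeFileReaderSketch {
    private final Path file;
    // false until the single record (the whole file) has been returned
    private boolean processed = false;
    private String key;
    private byte[] value;

    WholeFileReaderSketch(Path file) {
        this.file = file;
    }

    // Returns true exactly once, after loading the entire file as one record.
    boolean nextKeyValue() throws IOException {
        if (processed) {
            return false; // the only record was already consumed
        }
        key = file.toString();
        value = Files.readAllBytes(file);
        processed = true;
        return true;
    }

    String getCurrentKey() {
        return key;
    }

    byte[] getCurrentValue() {
        return value;
    }

    // Progress is all-or-nothing because there is only one record.
    float getProgress() {
        return processed ? 1.0f : 0.0f;
    }
}
```

The first call to nextKeyValue() yields the file's contents and every later call returns false, which is what ends the read loop for that file. With CombineFileInputFormat you would typically have one such per-file reader for each file in the combined split, as the linked example shows.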

On Thu, Jul 12, 2012 at 8:33 PM, Manoj Babu <manoj444@gmail.com> wrote:
> Gentles,
>
> I want to use the CombineFileInputFormat of Hadoop 0.20.0 / 0.20.2 such that
> it processes 1 file per record and also doesn't compromise on data
> locality (which it normally takes care of).
>
> It is mentioned in Tom White's Hadoop Definitive Guide but he has not shown
> how to do it. Instead, he moves on to Sequence Files.
>
> I am pretty confused about the meaning of the "processed" variable in a
> record reader. Any code example would be of tremendous help.
>
> Thanks in advance.
>
> Cheers!
> Manoj.
>



-- 
Harsh J
