hadoop-common-user mailing list archives

From Felix Chern <idry...@gmail.com>
Subject Re: Capture Directory Context in Hadoop Mapper
Date Thu, 30 Jan 2014 18:15:28 GMT
MultipleInputs is nice. Most of the time, I use it for reduce-side joins.
It works well, but you need to specify a different Mapper class for each input directory.
In our case, we want the Mapper itself to capture the directory information, because
these directories might contain data spanning several months, and the file structure
may differ slightly over time.
In the end, this is the solution I came up with, and it's fun to hack on the lower-level APIs. :D
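
To illustrate the idea of a single Mapper branching on its input directory, here is a minimal sketch. It is not the code from the linked posts. It assumes the Mapper already has the split's file path as a string; with a plain FileInputFormat this is commonly obtained in setup() via ((FileSplit) context.getInputSplit()).getPath().toString(). The class name, the directory prefixes, and the layout names are all hypothetical and exist only for illustration.

```java
import java.net.URI;

// Hypothetical helper: derives a per-directory "layout" from a split's file path.
final class DirectoryContextSketch {

    // Returns the directory part of a file path, e.g.
    // "hdfs://nn/data/logs/2014-01/part-00000" -> "/data/logs/2014-01".
    static String parentDir(String filePath) {
        String path = URI.create(filePath).getPath();
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    // Picks a record layout name from the directory prefix.
    // The prefixes and layout names are assumptions, not taken from the thread.
    static String layoutFor(String filePath) {
        String dir = parentDir(filePath);
        if (dir.startsWith("/data/logs/")) {
            return "log-layout";
        }
        if (dir.startsWith("/data/users/")) {
            return "user-layout";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(layoutFor("hdfs://nn/data/logs/2014-01/part-00000"));
        System.out.println(layoutFor("/data/users/2013-12/part-00001"));
    }
}
```

In a real job, the result of layoutFor would typically be computed once in setup() and stored in a field, so that map() can branch on it without re-parsing the path for every record.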

Still, thanks for the suggestion!

Felix

On Jan 29, 2014, at 10:15 PM, Harsh J <harsh@cloudera.com> wrote:

> Hi,
> 
> These posts are nicely written - thanks for sharing! Have you also
> taken a look at the MultipleInputs feature, which gives you a cleaner
> approach? http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
> 
> On Thu, Jan 30, 2014 at 4:16 AM, Felix Chern <idryman@gmail.com> wrote:
>> Hi all,
>> 
>> I wrote a tutorial on how to receive path information in the Mapper class. It's
>> useful in our Hadoop use case, where we need to apply different logic to
>> different input source directories. Enjoy!
>> 
>> http://www.idryman.org/blog/2014/01/26/capture-directory-context-in-hadoop-mapper/
>> http://www.idryman.org/blog/2014/01/27/capture-path-info-in-hadoop-inputformat-class/
>> 
>> Felix
> 
> 
> 
> -- 
> Harsh J

