hadoop-mapreduce-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Passing data via Configuration
Date Fri, 08 Feb 2013 15:23:29 GMT
You could, but this is generally discouraged.  Pig does something like this: it serializes the
object into a byte array, base64-encodes the bytes into a string, and puts that string in the
config.  The problem with this is that the config can grow very large.  In the 1.0 line of
Hadoop the maximum size of the Job's config is limited to keep the Job Tracker from running
out of memory.  In V2 this is less of a concern because it is your own application master
that has to read it all in.
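
A minimal sketch of the serialize-then-base64 trick, using only the JDK. The class name
ConfigObjectCodec is made up for illustration and is not Pig's or Hadoop's code, and
java.util.Base64 assumes Java 8 or later. The object has to implement Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Hypothetical helper (not a Hadoop or Pig class): converts a Serializable
// object to a Base64 string and back, so it can be stored as a config value.
final class ConfigObjectCodec {
    private ConfigObjectCodec() {}

    // Java-serialize the object into a byte array, then Base64-encode it.
    static String encode(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of encode: Base64-decode, then deserialize.
    static Object decode(String encoded) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }
}
```

In the driver the encoded string would go in with conf.set("my.object", encoded), and the
Mapper's setup() would read it back with context.getConfiguration().get("my.object") and
pass it to decode. The key name "my.object" is only a placeholder.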

In general, if it is a very small amount of data you can play games like this.  If it is a
large amount of data, you probably want to use the distributed cache instead.
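
For the distributed cache route, a rough sketch of just the file handling, again using only
the JDK. SideFileCodec is a made-up name. The Hadoop calls that ship the file to the tasks
are only named in the comments, and the upload to HDFS is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Hadoop): writes an object to a side file
// and reads it back.
final class SideFileCodec {
    private SideFileCodec() {}

    // Driver side: serialize the object to a local file. The file would then
    // be copied to HDFS and registered with the cache, e.g. with
    // DistributedCache.addCacheFile(uri, conf) in 1.x or job.addCacheFile(uri)
    // in 2.x.
    static void write(Serializable value, Path file) throws IOException {
        try (OutputStream raw = Files.newOutputStream(file);
             ObjectOutputStream out = new ObjectOutputStream(raw)) {
            out.writeObject(value);
        }
    }

    // Task side: called from the Mapper's setup() on the localized copy of
    // the cached file.
    static Object read(Path file) throws IOException, ClassNotFoundException {
        try (InputStream raw = Files.newInputStream(file);
             ObjectInputStream in = new ObjectInputStream(raw)) {
            return in.readObject();
        }
    }
}
```

Unlike the config approach, the size of the object does not add to the Job's config here.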

--Bobby

From: Peter Cogan <peter.cogan@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Friday, February 8, 2013 9:15 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Passing data via Configuration

Hi,

I have data stored in an object that I want to pass into my Mapper.

I see from Configuration that there are setters and getters for primitives, but is there a
way of doing this with non-primitives, either my own classes or built-in classes (such as
HashMap)?

thanks!
Peter
