hadoop-mapreduce-user mailing list archives

From Peter Cogan <peter.co...@gmail.com>
Subject Re: Passing data via Configuration
Date Fri, 08 Feb 2013 19:51:24 GMT
Hi Rob,

Thanks for the explanation. I had also thought about 'cheating' by
serialising the object; I guess that's the way to go in my case, as the
data structure is really quite small.
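
For anyone finding this thread later, here is a minimal sketch of the
serialise-then-base64 trick Bobby describes. It is not Pig's actual code:
the class name ConfigObjectCodec and the key "my.lookup" are made up for
illustration, and it assumes the object implements java.io.Serializable
and that java.util.Base64 (Java 8 or later) is available. The Hadoop calls
appear only in comments, so the snippet runs without Hadoop on the
classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.HashMap;

// Hypothetical helper (not part of Hadoop or Pig): turns a small
// Serializable object into a String that fits in a Configuration value.
final class ConfigObjectCodec {
    private ConfigObjectCodec() {}

    // Serialise the object to bytes, then base64-encode the bytes.
    static String encode(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of encode: base64-decode, then deserialise.
    static Object decode(String encoded) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, Integer> lookup = new HashMap<>();
        lookup.put("alpha", 1);

        String encoded = encode(lookup);
        // In the driver (assumed key name): conf.set("my.lookup", encoded);
        // In Mapper.setup():
        //   decode(context.getConfiguration().get("my.lookup"));
        Object restored = decode(encoded);
        System.out.println(lookup.equals(restored));
    }
}
```

The encoded string is roughly a third larger than the serialised bytes, so
this only makes sense for small objects; anything sizeable belongs in the
distributed cache, as noted in the quoted reply.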

thanks!


On Fri, Feb 8, 2013 at 3:23 PM, Robert Evans <evans@yahoo-inc.com> wrote:

> You could, but this is generally discouraged.  Pig does something like
> this: it serializes the object into a byte array and then base64-encodes
> the bytes into a string that is put in the config.
> The problem with this is that the config can grow very large.  In the 1.0
> line of Hadoop the maximum size of the Job's config is limited to avoid
> causing the Job Tracker to run out of memory.  In V2 this is less of a
> concern because it is your own application master that has to read it all
> in.
>
> In general, if it is a very small amount of data you can play games like
> this.  If it is a large amount of data, you probably want to use the
> distributed cache instead.
>
> --Bobby
>
> From: Peter Cogan <peter.cogan@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Friday, February 8, 2013 9:15 AM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Passing data via Configuration
>
> Hi,
>
> I have data stored in an object that I want to pass into my Mapper.
>
> I see from Configuration that there are setters and getters for
> primitives, but is there a way of doing this with non-primitives, either
> my own classes or built-in classes (such as HashMap, etc.)?
>
> thanks!
> Peter
>
