hadoop-hdfs-user mailing list archives

From feng lu <amuseme...@gmail.com>
Subject Re: MapReduce processing with extra (possibly non-serializable) configuration
Date Fri, 22 Feb 2013 01:55:37 GMT

Maybe you can look at the usage of DistributedCache [0]. It's a facility
provided by the MR framework to cache files (text, archives, jars, etc.)
needed by applications.
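
A rough sketch of how that could look, assuming the per-split settings can
be flattened to key/value pairs (which may not hold for every proprietary
object). It uses only the JDK so it runs without a cluster. The class name,
the file naming scheme and the "parser.mode" key are made up for
illustration. In a real Hadoop 1.x job, the driver would register each
written file with DistributedCache.addCacheFile(URI, Configuration) and the
task would locate its local copy via
DistributedCache.getLocalCacheFiles(Configuration).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class SplitConfigSideFile {

    // Hypothetical naming scheme: one properties file per split index.
    public static String fileNameFor(int splitIndex) {
        return "split-" + splitIndex + ".properties";
    }

    // Job-submission side: flatten the settings a task needs into a side file.
    // In a Hadoop job this file would then be copied to HDFS and added to the cache.
    public static Path write(Path dir, int splitIndex, Properties settings) throws IOException {
        Path file = dir.resolve(fileNameFor(splitIndex));
        try (OutputStream out = Files.newOutputStream(file)) {
            settings.store(out, "settings for split " + splitIndex);
        }
        return file;
    }

    // Task side: reload the settings for the split being processed.
    // In a Hadoop job, dir would be the directory holding the localized cache files.
    public static Properties read(Path dir, int splitIndex) throws IOException {
        Properties settings = new Properties();
        try (InputStream in = Files.newInputStream(dir.resolve(fileNameFor(splitIndex)))) {
            settings.load(in);
        }
        return settings;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split-config");
        Properties settings = new Properties();
        settings.setProperty("parser.mode", "strict"); // hypothetical setting
        write(dir, 0, settings);
        System.out.println(read(dir, 0).getProperty("parser.mode"));
    }
}
```

The task can then rebuild the complex objects locally from these plain
values, so the objects themselves never have to be serialized.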


On Fri, Feb 22, 2013 at 5:10 AM, Public Network Services <
publicnetworkservices@gmail.com> wrote:

> Hi...
> I am trying to put an existing file processing application into Hadoop and
> need to find the best way of propagating some extra configuration per
> split, in the form of complex and proprietary custom Java objects.
> The general idea is
>    1. A custom InputFormat splits the input data
>    2. The same InputFormat prepares the appropriate configuration for
>    each split
>    3. Hadoop processes each split in MapReduce, using the split itself
>    and the corresponding configuration
> The problem is that these configuration objects contain a lot of
> properties and references to other complex objects, and so on. It would
> therefore take a lot of work to cover all the possible combinations and
> make the whole thing serializable (if it can be done in the first place).
> Most probably this is the only way forward, but if anyone has ever dealt
> with this problem, please suggest the best approach to follow.
> Thanks!

Don't Grow Old, Grow Up... :-)
