hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: MapReduce processing with extra (possibly non-serializable) configuration
Date Fri, 22 Feb 2013 06:15:01 GMT
How do you imagine sending "data" of any kind (be it in object form,
etc.) over the network to other nodes, without implementing or relying
on a serialization for it? Are you looking for "easy" Java ways such
as the distributed cache from Hazelcast, etc., where this may be taken
care of for you automatically in some way? :)
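
To make the point concrete, here is a minimal sketch, assuming the
per-split configuration can be reduced to a handful of plain fields.
The class SplitConfig, its fields, and the helpers toBytes/fromBytes
are hypothetical names for illustration. The class deliberately uses
only the JDK: its write/readFields methods mirror the shape of
Hadoop's Writable interface, and in a real job the class would
presumably declare "implements org.apache.hadoop.io.Writable" and be
written out from the custom InputSplit's own write/readFields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical per-split configuration; the fields are illustrative only.
class SplitConfig {
    private String parserName = "";
    private int headerLines;
    private List<String> columns = new ArrayList<>();

    // No-arg constructor so an empty instance can be filled by readFields.
    SplitConfig() {}

    SplitConfig(String parserName, int headerLines, List<String> columns) {
        this.parserName = parserName;
        this.headerLines = headerLines;
        this.columns = new ArrayList<>(columns);
    }

    // Same shape as Writable.write(DataOutput): emit each field in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(parserName);
        out.writeInt(headerLines);
        out.writeInt(columns.size());
        for (String column : columns) {
            out.writeUTF(column);
        }
    }

    // Same shape as Writable.readFields(DataInput): read fields back in the same order.
    public void readFields(DataInput in) throws IOException {
        parserName = in.readUTF();
        headerLines = in.readInt();
        int count = in.readInt();
        columns = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            columns.add(in.readUTF());
        }
    }

    // Helper (assumption, not a Hadoop API): serialize to a byte array.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper (assumption, not a Hadoop API): rebuild an instance from bytes.
    static SplitConfig fromBytes(byte[] bytes) {
        try {
            SplitConfig config = new SplitConfig();
            config.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return config;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SplitConfig)) {
            return false;
        }
        SplitConfig that = (SplitConfig) other;
        return headerLines == that.headerLines
                && parserName.equals(that.parserName)
                && columns.equals(that.columns);
    }

    @Override
    public int hashCode() {
        return Objects.hash(parserName, headerLines, columns);
    }

    public static void main(String[] args) {
        SplitConfig original = new SplitConfig("csv", 1, List.of("id", "name"));
        SplitConfig copy = SplitConfig.fromBytes(original.toBytes());
        System.out.println("round trip preserved the config: " + original.equals(copy));
    }
}
```

If the full object graph is too tangled to serialize, one option along
these lines is to send only the few values needed to rebuild it and
reconstruct the complex objects on the task side.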

On Fri, Feb 22, 2013 at 2:40 AM, Public Network Services
<publicnetworkservices@gmail.com> wrote:
> Hi...
>
> I am trying to put an existing file processing application into Hadoop and
> need to find the best way of propagating some extra configuration per split,
> in the form of complex and proprietary custom Java objects.
>
> The general idea is:
>
> 1. A custom InputFormat splits the input data.
> 2. The same InputFormat prepares the appropriate configuration for each
>    split.
> 3. Hadoop processes each split in MapReduce, using the split itself and
>    the corresponding configuration.
>
> The problem is that these configuration objects contain a lot of properties
> and references to other complex objects, which in turn reference others. It
> would therefore take a lot of work to cover all the possible combinations and
> make the whole thing serializable (if it can be done in the first place).
>
> Most probably this is the only way forward, but if anyone has ever dealt
> with this problem, please suggest the best approach to follow.
>
> Thanks!
>



--
Harsh J
