hadoop-common-user mailing list archives

From Public Network Services <publicnetworkservi...@gmail.com>
Subject Re: MapReduce processing with extra (possibly non-serializable) configuration
Date Fri, 22 Feb 2013 04:11:11 GMT
You mean save the serialized configuration object in the custom split file,
retrieve that in the Mapper, reconstruct the configuration and use the rest
of the split file (i.e., the actual data) as input to the map function?
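The round trip described above (serialize the configuration into the split, then reconstruct it on the mapper side) can be sketched in plain Java. This is a minimal, hypothetical stand-in: `SplitConfig` and its fields are invented for illustration, and the class mirrors the `write(DataOutput)` / `readFields(DataInput)` contract of Hadoop's `org.apache.hadoop.io.Writable` without depending on Hadoop itself, so the snippet stays self-contained.

```java
import java.io.*;
import java.util.*;

// Hypothetical, simplified stand-in for the proprietary configuration;
// the real application would carry its complex custom objects here.
class SplitConfig {
    String analyzerName;
    Map<String, String> options = new LinkedHashMap<>();

    // Same contract as Writable#write(DataOutput): the framework calls
    // this when shipping the split to a task.
    void write(DataOutput out) throws IOException {
        out.writeUTF(analyzerName);
        out.writeInt(options.size());
        for (Map.Entry<String, String> e : options.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    // Same contract as Writable#readFields(DataInput): the mapper side
    // rebuilds the object from the serialized bytes.
    void readFields(DataInput in) throws IOException {
        analyzerName = in.readUTF();
        options.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            options.put(in.readUTF(), in.readUTF());
        }
    }
}

public class ConfigSplitDemo {
    public static void main(String[] args) throws IOException {
        SplitConfig cfg = new SplitConfig();
        cfg.analyzerName = "demo-analyzer";
        cfg.options.put("threshold", "0.75");

        // Serialize as the framework would when sending the split...
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        cfg.write(new DataOutputStream(buf));

        // ...and reconstruct it in the Mapper.
        SplitConfig restored = new SplitConfig();
        restored.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));

        System.out.println(restored.analyzerName + " "
                + restored.options.get("threshold"));
    }
}
```

In an actual job, a custom `InputSplit` would implement `Writable` this way and embed such a configuration alongside the file path and offsets; the custom `RecordReader` would then hand both the data and the reconstructed configuration to the map function.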


On Thu, Feb 21, 2013 at 5:57 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> I just have one simple suggestion for you: write a custom split to
> replace FileSplit, including all your special configurations in this split,
> then write a custom InputFormat.
>
> During the map phase, you can get this split, and from it you get all the
> special configurations.
>
>
>
> On Fri, Feb 22, 2013 at 5:10 AM, Public Network Services <
> publicnetworkservices@gmail.com> wrote:
>
>> Hi...
>>
>> I am trying to put an existing file processing application into Hadoop
>> and need to find the best way of propagating some extra configuration per
>> split, in the form of complex and proprietary custom Java objects.
>>
>> The general idea is
>>
>>    1. A custom InputFormat splits the input data
>>    2. The same InputFormat prepares the appropriate configuration for
>>    each split
>>    3. Hadoop processes each split in MapReduce, using the split itself
>>    and the corresponding configuration
>>
>> The problem is that these configuration objects contain a lot of
>> properties and references to other complex objects, and so on, so it
>> would take a lot of work to cover all the possible combinations and make
>> the whole thing serializable (if that is even possible in the first place).
>>
>> Most probably this is the only way forward, but if anyone has ever dealt
>> with this problem, please suggest the best approach.
>>
>> Thanks!
>>
>>
>
