hadoop-mapreduce-user mailing list archives

From Zhiwei Xiao <zwx...@gmail.com>
Subject How to send objects to map task?
Date Tue, 27 Sep 2011 22:42:43 GMT
Hi,

My application needs to send some objects to its map tasks; these objects specify how to
process the input records. I know I can pass them as strings via the
configuration, but I would prefer to leverage Hadoop's Writable interface,
since the objects require recursive serialization.
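For what it's worth, here is the kind of thing I have in mind, as a minimal self-contained sketch: a recursive object serialized with the Writable-style write()/readFields() pattern, then Base64-encoded into a string that could be stored under a Configuration key. The class and field names (RuleNode, op) are hypothetical, and to keep the sketch standalone it uses plain java.util.Base64 rather than implementing org.apache.hadoop.io.Writable.

```java
import java.io.*;
import java.util.*;

// Hypothetical recursive object following the Writable pattern:
// write(DataOutput) serializes, readFields(DataInput) deserializes.
public class RuleNode {
    String op;                               // how to process a record
    List<RuleNode> children = new ArrayList<>();

    void write(DataOutput out) throws IOException {
        out.writeUTF(op);
        out.writeInt(children.size());
        for (RuleNode c : children) c.write(out);   // recurse into children
    }

    void readFields(DataInput in) throws IOException {
        op = in.readUTF();
        int n = in.readInt();
        children.clear();
        for (int i = 0; i < n; i++) {
            RuleNode c = new RuleNode();
            c.readFields(in);                       // recurse into children
            children.add(c);
        }
    }

    // Serialize to a string that is safe to store in the job Configuration
    // (e.g. conf.set("my.rules", root.toConfString()) on the client side).
    String toConfString() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        write(new DataOutputStream(bytes));
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Rebuild the object tree inside the map task from the config string.
    static RuleNode fromConfString(String s) throws IOException {
        RuleNode root = new RuleNode();
        root.readFields(new DataInputStream(
                new ByteArrayInputStream(Base64.getDecoder().decode(s))));
        return root;
    }
}
```

The map task would then decode the string in its setup phase and keep the reconstructed tree for the lifetime of the task.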

I tried creating a subclass of FileSplit to carry the data, but in the end I
found it inelegant to implement: the FileSplits are created in getSplits() of
the InputFormat, and the only way to initialize the InputFormat is via
setConf(). So I would end up implementing three new subclasses with the same
custom fields: a FileSplit, an InputFormat and a Configuration.

Another approach might be to write these objects to a file on HDFS, or to
distribute them via the DistributedCache.

I just wonder: is there a better way to do this?

Thank you.
---
Zhiwei Xiao
