hadoop-mapreduce-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Best practice for batch file conversions
Date Tue, 08 Feb 2011 03:29:22 GMT
Extend FileInputFormat with your own implementation for the binary
format, and make it non-splittable (override isSplitable to return
false). This way, each Mapper receives a whole file, and you avoid
problems with a file being split at HDFS block boundaries.
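A minimal sketch of that idea, assuming the newer `org.apache.hadoop.mapreduce` API. The class names `WholeFileInputFormat` and `WholeFileRecordReader` are hypothetical. `isSplitable` returns false, so each file becomes one InputSplit and one map task. The record reader then hands the mapper the whole file as a single `BytesWritable` value. It also assumes each file fits in a Java `byte[]` (under 2 GB) and in the task's heap.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical input format: each input file reaches a mapper as ONE record
// (key = NullWritable, value = the file's full contents).
public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    // Never split a file, so there is one InputSplit (one map task) per file.
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    // Reads the entire (unsplit) file into memory and emits it as a single value.
    static class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
        private FileSplit fileSplit;
        private Configuration conf;
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.fileSplit = (FileSplit) split;
            this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) {
                return false; // only one record per file
            }
            // Assumption: the file is < 2 GB and fits in the task's heap.
            byte[] contents = new byte[(int) fileSplit.getLength()];
            Path file = fileSplit.getPath();
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = null;
            try {
                in = fs.open(file);
                IOUtils.readFully(in, contents, 0, contents.length);
                value.set(contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            processed = true;
            return true;
        }

        @Override
        public NullWritable getCurrentKey() {
            return NullWritable.get();
        }

        @Override
        public BytesWritable getCurrentValue() {
            return value;
        }

        @Override
        public float getProgress() {
            return processed ? 1.0f : 0.0f;
        }

        @Override
        public void close() {
            // The stream is already closed in nextKeyValue().
        }
    }
}
```

In the driver, this would be wired in with `job.setInputFormatClass(WholeFileInputFormat.class)`. For very large binaries, loading each file into memory may not be practical. A variant could emit only the file's `Path` (for example as a `Text` value) and let the mapper stream the file itself via `FileSystem.open`, writing the converted output as it goes. On the older `org.apache.hadoop.mapred` API, the equivalent override is `isSplitable(FileSystem fs, Path filename)`.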

On Tue, Feb 8, 2011 at 6:37 AM, felix gao <gre1600@gmail.com> wrote:
> Hello users of Hadoop,
> I have a task to convert large binary files from one format to another. I
> am wondering what the best practice is for doing this. Basically, I am trying
> to get one mapper to work on each binary file, and I am not sure how to do
> that properly in Hadoop.
> thanks,
> Felix

Harsh J
