hadoop-mapreduce-user mailing list archives

From felix gao <gre1...@gmail.com>
Subject Re: Best practice for batch file conversions
Date Tue, 08 Feb 2011 17:43:07 GMT
Thanks a lot for the pointer. I will play around with it.

On Mon, Feb 7, 2011 at 10:55 PM, Sonal Goyal <sonalgoyal4@gmail.com> wrote:

> Hi,
>
> You can use FileStreamInputFormat, which returns the file stream as the
> value.
>
>
> https://github.com/sonalgoyal/hiho/tree/hihoApache0.20/src/co/nubetech/hiho/mapreduce/lib/input
>
> Keep in mind that you lose data locality when a single task reads a
> whole file spanning multiple blocks, but in your case, the requirement
> probably demands it.
>
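A rough usage sketch for the stream-as-value pattern described above. The generic types here (a Text key for the file path, an FSDataInputStream value) are hypothetical, as is the class name; check the linked hiho source for FileStreamInputFormat's actual signature.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper shape: one map() call per file, with an open stream
// as the value. The real key/value types come from hiho's
// FileStreamInputFormat, linked above.
public class StreamConvertMapper
    extends Mapper<Text, FSDataInputStream, Text, NullWritable> {

  @Override
  protected void map(Text path, FSDataInputStream in, Context context)
      throws IOException, InterruptedException {
    // Read the source format from 'in', convert it, and write the result,
    // e.g. back to HDFS or through the job's OutputFormat.
  }
}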
> Thanks and Regards,
> Sonal
> <https://github.com/sonalgoyal/hiho>Connect Hadoop with databases,
> Salesforce, FTP servers and others <https://github.com/sonalgoyal/hiho>
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
> On Tue, Feb 8, 2011 at 8:59 AM, Harsh J <qwertymaniac@gmail.com> wrote:
>
>> Extend FileInputFormat, write your own binary-format-based
>> implementation of it, and make it non-splittable (isSplitable should
>> return false). That way a Mapper gets a whole file, and you
>> shouldn't have block-splitting issues.
>>
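A minimal sketch of the approach Harsh describes, using the Hadoop 0.20 mapreduce API: a FileInputFormat subclass that refuses to split files, plus a RecordReader that hands each mapper the whole file as a single BytesWritable record. The class names are illustrative, and this assumes each file fits in a map task's heap.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Illustrative names: one whole file per record, never split across mappers.
public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false; // one mapper per file, no block splitting
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new WholeFileRecordReader();
  }
}

class WholeFileRecordReader
    extends RecordReader<NullWritable, BytesWritable> {

  private FileSplit split;
  private TaskAttemptContext context;
  private final BytesWritable value = new BytesWritable();
  private boolean processed = false;

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context) {
    this.split = (FileSplit) split;
    this.context = context;
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    if (processed) {
      return false;
    }
    // Buffer the entire file as the single value of this record.
    byte[] contents = new byte[(int) split.getLength()];
    Path file = split.getPath();
    FileSystem fs = file.getFileSystem(context.getConfiguration());
    FSDataInputStream in = null;
    try {
      in = fs.open(file);
      IOUtils.readFully(in, contents, 0, contents.length);
      value.set(contents, 0, contents.length);
    } finally {
      IOUtils.closeStream(in);
    }
    processed = true;
    return true;
  }

  @Override
  public NullWritable getCurrentKey() { return NullWritable.get(); }

  @Override
  public BytesWritable getCurrentValue() { return value; }

  @Override
  public float getProgress() { return processed ? 1.0f : 0.0f; }

  @Override
  public void close() { }
}

In the driver, wire it in with job.setInputFormatClass(WholeFileInputFormat.class). For files too large to buffer, a RecordReader that exposes a stream (as in the hiho approach above) is the safer design.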
>> On Tue, Feb 8, 2011 at 6:37 AM, felix gao <gre1600@gmail.com> wrote:
>> > Hello users of hadoop,
>> > I have a task to convert large binary files from one format to another.
>> > I am wondering what is the best practice to do this. Basically, I am
>> > trying to get one mapper to work on each binary file, and I am not sure
>> > how to do that in hadoop properly.
>> > thanks,
>> > Felix
>>
>>
>>
>> --
>> Harsh J
>> www.harshj.com
>>
>
>
