hadoop-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: reading a binary file
Date Mon, 03 Sep 2012 15:15:58 GMT
Hi Francesco

TextInputFormat reads its input line by line, splitting on '\n' by default.
The key is the byte offset of the line and the value is the line's contents.
Your file, however, is binary and is just a sequence of integers, and you
need the position of each integer rather than the offset of each line.
I believe you will have to write your own custom RecordReader to get this
done.
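
A minimal sketch of the core loop such a RecordReader would need, written in
plain Java without any Hadoop classes so that it stands on its own. It assumes
the integers are stored as 4-byte big-endian values (what DataInputStream
reads); the thread does not state the file's encoding. The class name
IntRecordSketch and its constructor are hypothetical. The method names only
mirror nextKeyValue(), getCurrentKey() and getCurrentValue() of
org.apache.hadoop.mapreduce.RecordReader.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch: walks a binary stream of fixed-width integers and
 * exposes (position, value) pairs in the style of a Hadoop RecordReader.
 * Assumes 4-byte big-endian integers.
 */
class IntRecordSketch {
    private final DataInputStream in;
    private long nextIndex; // index of the next integer to be read
    private long key;       // index of the current integer
    private int value;      // the current integer

    /** firstIndex is the index of the first integer in this stream. */
    IntRecordSketch(InputStream raw, long firstIndex) {
        this.in = new DataInputStream(raw);
        this.nextIndex = firstIndex;
    }

    /** Advances to the next integer; returns false at end of input. */
    boolean nextKeyValue() throws IOException {
        try {
            value = in.readInt(); // reads exactly 4 bytes, big-endian
        } catch (EOFException e) {
            // Fewer than 4 bytes remained, so there is no further record.
            return false;
        }
        key = nextIndex++;
        return true;
    }

    long getCurrentKey() {
        return key;
    }

    int getCurrentValue() {
        return value;
    }
}
```

In a real job this logic would live in a RecordReader subclass returned by a
custom FileInputFormat, with the key and value wrapped in LongWritable and
IntWritable. The first index would presumably be derived from the split's
start offset divided by 4, which only works if splits begin on 4-byte
boundaries; that alignment is an assumption you would have to enforce yourself.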

On Mon, Sep 3, 2012 at 8:38 PM, Francesco Silvestri <yuri.rwx@gmail.com> wrote:

> Hi Mohammad,
>
> SequenceFileInputFormat <http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html>
> requires the file to be a sequence of key/value pairs stored in binary
> (i.e., the key is stored in the file). In my case, the key is implicitly
> given by the position of the value within the file.
>
> Thank you,
> Francesco
>
>
>
> On Mon, Sep 3, 2012 at 5:01 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
>
>> Hello Francesco,
>>
>>         Have a look at SequenceFileInputFormat :
>> http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>>
>> On Mon, Sep 3, 2012 at 8:26 PM, Francesco Silvestri <yuri.rwx@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have a binary file of integers and I would like an input format that
>>> generates pairs <key,value>, where value is an integer in the file and
>>> key is the position of the integer in the file. Which class should I use?
>>> (i.e., I'm looking for a kind of TextInputFormat for binary files)
>>>
>>> Thank you for your consideration,
>>>
>>> Francesco
>>>
>>
>>
>
