hadoop-common-user mailing list archives

From Ahad Rana <a...@commoncrawl.org>
Subject Re: retrieving sequenceFile Postion of Key in mapper task
Date Fri, 09 Oct 2009 05:45:19 GMT
Oops, memory fails me. To correct my previous statement: for block-compressed
files, getPosition() reflects the position in the input stream of the NEXT
compressed block of data, so you have to watch for a change in position after
reading each key/value pair to detect a block transition.
Ahad.
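
The position-watching idea above can be sketched in plain Java. This is only
an illustration under stated assumptions, not code from the thread:
PositionedReader, FakeBlockReader and locateRecords are hypothetical names,
and the byte offsets are made up. The interface stands in for the
getPosition() and next(key, value) calls on SequenceFile.Reader, and the fake
reader simulates the behaviour described above (the position jumps to the
start of the next block as soon as the first record of a block is read). The
sketch further assumes that the position seen just before such a jump is the
start of the block holding the record, which is an inference worth checking
against real files.

```java
import java.util.ArrayList;
import java.util.List;

class BlockTransitionSketch {

    /** Hypothetical stand-in for the two reader calls this sketch relies on. */
    interface PositionedReader {
        long getPosition();
        boolean next();
    }

    /** Simulated block-compressed reader; assumes every block holds at least one record. */
    static final class FakeBlockReader implements PositionedReader {
        private final long[] boundaries;      // boundaries[i] = start of block i; last entry = end of data
        private final int[] recordsPerBlock;  // number of records in each block
        private int nextBlock = 0;
        private int remaining = 0;
        private long position;

        FakeBlockReader(long[] boundaries, int[] recordsPerBlock) {
            this.boundaries = boundaries;
            this.recordsPerBlock = recordsPerBlock;
            this.position = boundaries[0];
        }

        public long getPosition() {
            return position;
        }

        public boolean next() {
            if (remaining == 0) {
                if (nextBlock >= recordsPerBlock.length) {
                    return false;
                }
                // Loading a block moves the stream to the start of the NEXT block.
                remaining = recordsPerBlock[nextBlock];
                position = boundaries[nextBlock + 1];
                nextBlock++;
            }
            remaining--;
            return true;
        }
    }

    /** Returns one {blockStart, indexInBlock} pair per record read. */
    static List<long[]> locateRecords(PositionedReader reader) {
        List<long[]> out = new ArrayList<>();
        long before = reader.getPosition();
        long blockStart = before;
        long indexInBlock = -1;
        while (reader.next()) {
            long after = reader.getPosition();
            if (after != before) {
                // Position changed: the record just read opened a new block.
                blockStart = before;
                indexInBlock = 0;
            } else {
                indexInBlock++;
            }
            out.add(new long[] {blockStart, indexInBlock});
            before = after;
        }
        return out;
    }

    public static void main(String[] args) {
        PositionedReader reader =
                new FakeBlockReader(new long[] {128, 1000, 2500}, new int[] {3, 2});
        for (long[] loc : locateRecords(reader)) {
            System.out.println("block start " + loc[0] + ", record " + loc[1]);
        }
    }
}
```

With a real reader, the same loop would presumably sit in the custom MapRunner
suggested in the earlier reply, with the {blockStart, indexInBlock} pair
handed to the map function alongside the key and value.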

On Thu, Oct 8, 2009 at 10:22 PM, Ahad Rana <ahad@commoncrawl.org> wrote:

> Hi Ishwar,
> You can implement a custom MapRunner and retrieve the position from the
> reader before calling your map function. Be aware, though, that for
> block-compressed files the position returned is the block start position,
> not the individual record position.
>
> Ahad.
>
>
> On Thu, Oct 8, 2009 at 4:23 PM, ishwar ramani <rvmishwar@gmail.com> wrote:
>
>> Hi,
>>
>> I need to get the position of the key being processed in a mapper task.
>> My input file is a sequence file.
>>
>> I tried the Context, but the best I could get was the InputSplit
>> position and the file name.
>>
>>
>> My other option is to start recording the position in the key/value while
>> generating the sequence file.
>> But that would mean rewriting all the files I already have :(
>>
>> Any thoughts?
>>
>> ishwar
>>
>
>
