hadoop-mapreduce-user mailing list archives

From: Eduard Skaley <e.v.ska...@gmail.com>
Subject: Re: Map Shuffle Bytes
Date: Wed, 26 Dec 2012 12:56:02 GMT
For this I need to know where an input split is located and where the join
is computed. How can I do this programmatically?
> This isn't called 'shuffle' (but rather a plain remote read) so your
> original question was confusing, thanks for clarifying!
>
> In that case, you could count the bytes coming in from the required
> record reader. For example, LineRecordReader (the reader behind
> TextInputFormat) emits a LongWritable key that denotes the current byte
> offset in the file, which you could use as a simple, progressing counter
> of bytes read so far.
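
To illustrate the offset-as-counter idea, here is a minimal plain-Java sketch. It does not use the Hadoop API: `OffsetByteCounter`, `Record`, `readLines`, `bytesBefore` and `bytesThrough` are hypothetical names that only model a line-oriented record reader whose key is the byte offset of each line. It also assumes uncompressed input with `\n` line terminators.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model (not Hadoop code) of a reader that keys each line by its byte offset. */
class OffsetByteCounter {

    /** One record: the byte offset where the line starts (key) and the line text (value). */
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Splits the data on '\n' and remembers the starting byte offset of every line. */
    static List<Record> readLines(byte[] data) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                records.add(new Record(start,
                        new String(data, start, i - start, StandardCharsets.UTF_8)));
                start = i + 1;
            }
        }
        if (start < data.length) {
            // Final line without a trailing newline.
            records.add(new Record(start,
                    new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    /** Bytes consumed before this record, relative to the start of the split. */
    static long bytesBefore(Record r, long splitStart) {
        return r.offset - splitStart;
    }

    /** Bytes consumed through the end of this record's text (line terminator excluded). */
    static long bytesThrough(Record r, long splitStart) {
        return bytesBefore(r, splitStart) + r.line.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

In a real mapper the same subtraction would be applied to the offset key handed to `map()`, with the split's start offset as the baseline, and the result written to a user-defined counter. That wiring is left out here because it depends on the input format in use.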
>
> On Wed, Dec 26, 2012 at 5:16 PM, Eduard Skaley <e.v.skaley@gmail.com> wrote:
>> Hi,
>>
>> I mean TO the mappers. I'm using the CompositeInputFormat for my application
>> to compute map-side joins.
>> I want to join two datasets, A and B; one is stored on node 1 and the
>> other on node 2.
>> For example, if the join is computed on node 2, then the input split of
>> the dataset stored on node 1 has to be transferred to node 2.
>> I want to count the bytes that are shuffled (transferred) TO the mapper
>> on node 2.
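
The scenario above can be modelled with a short plain-Java sketch. Hadoop's `InputSplit` exposes `getLength()` and `getLocations()`, which is where the length and host list would come from in a real job; everything else here (`RemoteReadEstimator`, `SplitInfo`, `remoteBytes`, the host names) is hypothetical. The result is only an upper-bound estimate, since it assumes that a split with no replica on the task's host is read remotely in full.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model (not Hadoop code) that estimates bytes read from other nodes. */
class RemoteReadEstimator {

    /** Hypothetical stand-in for one child split: its length and the hosts holding its data. */
    static final class SplitInfo {
        final long length;
        final String[] hosts;

        SplitInfo(long length, String... hosts) {
            this.length = length;
            this.hosts = hosts;
        }

        /** True if the given host stores a replica of this split's data. */
        boolean isLocalTo(String host) {
            return Arrays.asList(hosts).contains(host);
        }
    }

    /** Sums the lengths of all splits that have no replica on the host running the task. */
    static long remoteBytes(String taskHost, List<SplitInfo> splits) {
        long total = 0;
        for (SplitInfo split : splits) {
            if (!split.isLocalTo(taskHost)) {
                total += split.length;
            }
        }
        return total;
    }
}
```

For the two-dataset example, a task on `node2` joining a 100-byte split stored on `node1` with a 250-byte split stored on `node2` would be charged 100 remote bytes by `remoteBytes`.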
>>
>>> Hi,
>>>
>>> What do you mean by "shuffled bytes [to] the mappers"? If you mean
>>> "from", it is "Reduce shuffle bytes" you look for; otherwise, you may
>>> be looking for the per-map counter of "Map output bytes".
>>>
>>> Per-partition counters can be constructed on the user side if needed,
>>> by pre-computing the partition before emit (using the same
>>> partitioner) and counting up the bytes of your objects for its
>>> counter.
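
A minimal plain-Java sketch of the per-partition idea follows. `PartitionByteCounter` and its methods are hypothetical names, not Hadoop classes. `partitionFor` uses the same formula as Hadoop's default `HashPartitioner`, but on Java `String` keys, whose `hashCode()` differs from that of Hadoop's `Text`; the byte sizes are UTF-8 lengths rather than true serialized sizes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Plain-Java model (not Hadoop code) of user-side per-partition byte counters. */
class PartitionByteCounter {
    private final long[] bytesPerPartition;

    PartitionByteCounter(int numPartitions) {
        this.bytesPerPartition = new long[numPartitions];
    }

    /** Same formula as Hadoop's default HashPartitioner: mask the sign bit, then modulo. */
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** UTF-8 length, used here as an approximation of the serialized size. */
    private static long utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Call just before emitting a pair: adds its size to the target partition's counter. */
    void record(String key, String value) {
        int partition = partitionFor(key, bytesPerPartition.length);
        bytesPerPartition[partition] += utf8Length(key) + utf8Length(value);
    }

    /** Bytes counted so far for one partition. */
    long bytesFor(int partition) {
        return bytesPerPartition[partition];
    }

    /** Defensive copy of all per-partition totals. */
    long[] snapshot() {
        return Arrays.copyOf(bytesPerPartition, bytesPerPartition.length);
    }
}
```

In a real job the partition index would have to come from the job's configured partitioner so that the counters line up with the actual reducers, and each total would be reported through a user-defined counter named after the partition.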
>>>
>>> On Tue, Dec 25, 2012 at 6:03 PM, Eduard Skaley <e.v.skaley@gmail.com>
>>> wrote:
>>>> Hello guys,
>>>>
>>>> I need a counter for the bytes shuffled to the mappers.
>>>> Is there an existing one, or should I define one myself?
>>>> How can I implement such a counter?
>>>>
>>>> Thank you and happy Christmas time,
>>>> Eduard
>>>
>>>
>
>

