hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: information, whether a GET Request inside Map-Task is data local or not
Date Tue, 14 Feb 2012 18:07:31 GMT
Hey Christopher,

Thanks for reporting back. One thing to keep in mind: unless you have
contention at your top-of-rack switches, issuing a Get to the local
node or to a remote one shouldn't perform very differently. What is
going to make a big difference is whether you have to hit disk or not.

J-D

On Tue, Feb 14, 2012 at 7:27 AM, Christopher Dorner
<christopher.dorner@gmail.com> wrote:
> Hi,
> sorry for the very late reply on this topic. I was busy, but I
> promised to report back.
>
> I implemented your suggested "hack" :) It is actually only a few lines of
> code: one for getting the machine's hostname and one for retrieving the
> destination of the Get request. Then I set up two counters, one for the
> data-local Get requests and one for the others.
>
> It gives us some sort of idea about the network I/O when having GET
> requests inside mappers, but it is kind of obvious:
>
> Since data locality only kicks in for the input to the mapper (an HBase table
> scan or a read straight from HDFS, both of which work very well), it is
> unpredictable which machine a given Get request will be sent to.
> I am not sure what exactly to do with this information. As I said, it can
> only help to estimate network traffic, but it does not help to tune this
> aspect in any way.
>
> But as a conclusion, it is possible to retrieve this sort of information by
> dirty hacks :)
>
> Regards,
> Christopher
>
>
>
>
> On Mon, Jan 9, 2012 at 11:36 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>
>> It would definitely be interesting, please do report back.
>>
>> Thx,
>>
>> J-D
>>
>> On Mon, Jan 9, 2012 at 2:33 PM, Christopher Dorner
>> <christopher.dorner@gmail.com> wrote:
>> > Thank you for the reply.
>> > Though that sounds a bit like some dirty hacking, it seems to be doable.
>> > I think I will give it a try.
>> > I can report back when I get some usable results. Maybe some more people
>> > are interested in that.
>> >
>> > Christopher
>> >
>> >
>> > On 09.01.2012 23:15, Jean-Daniel Cryans wrote:
>> >
>> >> Short answer: no.
>> >>
>> >> Painful way to get around the problem:
>> >>
>> >> You *could* do it by looking up the machine's hostname when the job
>> >> starts, then using the HConnection that an HTable gives you through
>> >> getConnection() to call getRegionLocation for the row you are going to
>> >> Get, and finally getting that server's hostname via
>> >> getServerAddress().getHostname()
>> >>
>> >> J-D
>> >>
>> >> On Mon, Jan 9, 2012 at 1:19 PM, Christopher Dorner
>> >> <christopher.dorner@gmail.com>  wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I am using the input of a mapper as a rowkey to make a GET request to a
>> >>> table.
>> >>>
>> >>> Is it somehow possible to retrieve information about how much data had to
>> >>> be transferred over the network, or how many of the requests were data
>> >>> local (namenodes are also regionservers) and how many went to a
>> >>> different node?
>> >>> That would be some really cool and useful statistics for us :)
>> >>>
>> >>> Thank you,
>> >>>
>> >>> Christopher Dorner
>> >
>> >
>>
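
The approach discussed in this thread (compare the task's own hostname with
the hostname of the region server holding the row, then bump one of two
counters) could be sketched roughly as follows. This is a minimal
illustration under stated assumptions, not Christopher's actual code:
GetLocalityTally and rowToServerHost are hypothetical names, and the lookup
function only stands in for the getRegionLocation /
getServerAddress().getHostname() chain J-D describes. In a real mapper the
local hostname would presumably come from something like
InetAddress.getLocalHost().getHostName(), and the two tallies would be
Hadoop job counters rather than plain fields.

```java
import java.util.function.Function;

// Hypothetical helper (not part of HBase): tallies data-local vs. remote Gets.
final class GetLocalityTally {
    private final String localHost;
    // Stand-in for the region lookup. In a real job this would wrap the
    // getRegionLocation(...) -> getServerAddress().getHostname() calls
    // mentioned in the thread; here it is just a function so the sketch
    // runs without a cluster.
    private final Function<String, String> rowToServerHost;
    private long localGets;
    private long remoteGets;

    GetLocalityTally(String localHost, Function<String, String> rowToServerHost) {
        this.localHost = localHost;
        this.rowToServerHost = rowToServerHost;
    }

    // Call once per Get, before issuing it.
    // Returns true if the row's region is served from this host.
    boolean record(String rowKey) {
        String serverHost = rowToServerHost.apply(rowKey);
        // Case-insensitive match only; a short name vs. a fully qualified
        // name would still count as remote and need extra normalization.
        boolean local = localHost.equalsIgnoreCase(serverHost);
        if (local) {
            localGets++;
        } else {
            remoteGets++;
        }
        return local;
    }

    long localGets() {
        return localGets;
    }

    long remoteGets() {
        return remoteGets;
    }
}
```

As noted above, the resulting numbers only help to estimate network traffic;
they do not change where the Get requests end up.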
