hbase-user mailing list archives

From Christopher Dorner <christopher.dor...@gmail.com>
Subject Re: information, whether a GET Request inside Map-Task is data local or not
Date Tue, 14 Feb 2012 15:27:51 GMT
Hi,
Sorry for the very late reply on this topic. I was busy, but I
promised to report back.

I implemented your suggested "hack" :) It is actually only a few lines of
code: one for getting the machine's hostname and one for retrieving the
destination of the GET request. Then I set up two counters, one for
data-local GET requests and one for all the others.
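A minimal sketch of that locality check, written in plain Java so it runs without a cluster. The class name GetLocalityTally, the GetLocality enum and the regionHostLookup hook are hypothetical. The real HBase lookup (getConnection().getRegionLocation(...) followed by getServerAddress().getHostname(), as suggested further down the thread) is assumed to sit behind that hook; the exact HBase signatures vary between versions.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper: tallies GETs whose target region server runs on the
 * same host as the map task (LOCAL) versus on another host (REMOTE).
 */
class GetLocalityTally {

    /** Counter names; in a real job these could double as Hadoop counter enums. */
    public enum GetLocality { LOCAL, REMOTE }

    private final String localHost;
    // Hypothetical hook: row key -> hostname of the region server serving it.
    // In a real mapper this might wrap the HBase region-location lookup
    // (assumption: exact method names depend on the HBase version).
    private final Function<byte[], String> regionHostLookup;
    private final Map<GetLocality, Long> counts = new EnumMap<>(GetLocality.class);

    GetLocalityTally(String localHost, Function<byte[], String> regionHostLookup) {
        this.localHost = normalize(localHost);
        this.regionHostLookup = regionHostLookup;
        for (GetLocality g : GetLocality.values()) {
            counts.put(g, 0L);
        }
    }

    /** Hostname of the machine the task runs on; look it up once at task setup. */
    static String currentHostName() throws UnknownHostException {
        return InetAddress.getLocalHost().getHostName();
    }

    /** Lower-case the name and drop a trailing dot so equal hosts compare equal. */
    private static String normalize(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        return h.endsWith(".") ? h.substring(0, h.length() - 1) : h;
    }

    /** Classify the GET for this row and increment the matching counter. */
    GetLocality record(byte[] row) {
        String target = normalize(regionHostLookup.apply(row));
        GetLocality kind = target.equals(localHost) ? GetLocality.LOCAL : GetLocality.REMOTE;
        counts.merge(kind, 1L, Long::sum);
        return kind;
    }

    long count(GetLocality kind) {
        return counts.get(kind);
    }
}
```

In a Hadoop mapper you might build one instance in setup() from currentHostName() and, after each record(row), call context.getCounter(kind).increment(1) so the totals appear in the job counters. Two caveats: the two hostnames may come back in different forms (short name vs. FQDN), so the comparison may need adjusting for your cluster, and the HBase client answers from its cached region locations, which can be briefly stale after a region moves.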

It gives us some idea of the network I/O caused by GET requests
inside mappers, but the result is kind of obvious:

Since data locality only applies to the mapper's input (an HBase table
scan or reading straight from HDFS, both of which work very well), it is
unpredictable which machine a GET request will be directed to.
I am not sure what exactly to do with this information. As I said, it can
only help to estimate network traffic; it does not help to tune this
aspect in any way.

But in conclusion, it is possible to retrieve this sort of information with
dirty hacks :)

Regards,
Christopher

On Mon, Jan 9, 2012 at 11:36 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> It would definitely be interesting, please do report back.
>
> Thx,
>
> J-D
>
> On Mon, Jan 9, 2012 at 2:33 PM, Christopher Dorner
> <christopher.dorner@gmail.com> wrote:
> > Thank you for the reply.
> > Though that sounds a bit like dirty hacking, it seems to be doable. I
> > think I will give it a try.
> > I can report back when I get some usable results. Maybe some more people
> > are interested in that.
> >
> > Christopher
> >
> >
> > On 09.01.2012 23:15, Jean-Daniel Cryans wrote:
> >
> >> Short answer: no.
> >>
> >> Painful way to get around the problem:
> >>
> >> You *could* do it by looking up the machine's hostname when the job
> >> starts, then, from the HConnection that HTables can give you through
> >> getConnection(), calling getRegionLocation for the row you are going
> >> to Get, and finally getting the hostname via
> >> getServerAddress().getHostname().
> >>
> >> J-D
> >>
> >> On Mon, Jan 9, 2012 at 1:19 PM, Christopher Dorner
> >> <christopher.dorner@gmail.com>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am using the input of a mapper as a row key to make a GET request to a
> >>> table.
> >>>
> >>> Is it somehow possible to retrieve information about how much data had
> >>> to be transferred over the network, or how many of the requests were
> >>> data local (our datanodes are also regionservers) and how many were not
> >>> served on the same node?
> >>>
> >>> That would be some really cool and useful statistics for us :)
> >>>
> >>> Thank you,
> >>>
> >>> Christopher Dorner
> >
> >
>
