hbase-user mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: MR not seeing data locality - IP versus Host name
Date Mon, 28 May 2012 13:54:56 GMT
Thanks, Stack. We're looking into this closely.

As far as we can tell, DNS is correct, the machine host names are correct, etc.
.META. uses fully qualified names (c4n5.gbif.org), so I guess I'll
start looking at the job launching machine.
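One way to check the job launching machine is to compare forward and reverse DNS for each region server host, as Stack suggests in the quoted reply. The sketch below is a hypothetical standalone helper, not part of HBase or Hadoop. It assumes you pass fully qualified names such as c4n1.gbif.org on the command line. It relies on the JDK's `InetAddress.getCanonicalHostName()`, which in practice returns the textual IP when it cannot find a name for the address. That mirrors the "falls back to IP" behaviour described in the thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Pattern;

// Hypothetical diagnostic: run on the job launching machine with the
// region server host names as arguments, e.g. c4n1.gbif.org c4n2.gbif.org
class DnsCheck {

    // Dotted-quad IPv4 literal such as 130.226.238.181 (IPv6 is not handled).
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    // True if a "resolved" name is really just an IPv4 address string.
    static boolean isIpv4Literal(String name) {
        return IPV4.matcher(name).matches();
    }

    // True if the reverse name matches the forward name, ignoring case and a trailing dot.
    static boolean namesAgree(String forwardName, String reverseName) {
        return stripTrailingDot(forwardName).equalsIgnoreCase(stripTrailingDot(reverseName));
    }

    private static String stripTrailingDot(String s) {
        return s.endsWith(".") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        for (String host : args) {
            try {
                // Forward lookup: name -> address.
                InetAddress forward = InetAddress.getByName(host);
                String ip = forward.getHostAddress();
                // Reverse lookup on an address object built from the raw bytes only,
                // so the name we started from is not simply handed back.
                String reverse = InetAddress.getByAddress(forward.getAddress()).getCanonicalHostName();
                String verdict;
                if (isIpv4Literal(reverse)) {
                    verdict = "no reverse name, fell back to IP";
                } else if (namesAgree(host, reverse)) {
                    verdict = "ok";
                } else {
                    verdict = "MISMATCH";
                }
                System.out.printf("%s -> %s -> %s (%s)%n", host, ip, reverse, verdict);
            } catch (UnknownHostException e) {
                System.out.printf("%s: forward lookup failed (%s)%n", host, e.getMessage());
            }
        }
    }
}
```

A line reporting a fallback to the IP would point at the launching machine's resolver, not at .META.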

The code you link to is quite different from the TableInputFormatBase in
CDH3u3. I actually patched that with the following to verify for myself that it
would work, and it did (I've got a blog post about the performance, which
you'll like):

      // patch the possible GBIF DNS issue - TT report differing things to
      // split locations
      // Task attempts show as /default-rack/c4n2.gbif.org
      // splits are coming in as /default-rack/130.226.238.182

      regionLocation = regionLocation.replaceAll("130.226.238.181", "c4n1.gbif.org");
      regionLocation = regionLocation.replaceAll("130.226.238.182", "c4n2.gbif.org");
      regionLocation = regionLocation.replaceAll("130.226.238.183", "c4n3.gbif.org");
      regionLocation = regionLocation.replaceAll("130.226.238.184", "c4n4.gbif.org");
      regionLocation = regionLocation.replaceAll("130.226.238.185", "c4n5.gbif.org");
      regionLocation = regionLocation.replaceAll("130.226.238.186", "c4n6.gbif.org");
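If the hard-coded patch above needed to cover more nodes, a table lookup on the host component would be a natural generalisation. The sketch below is hypothetical: the `LocationNormalizer` class and its static IP-to-name table are illustrations, not anything in CDH3u3 or HBase. It accepts either a bare host or a "/rack/host" location and only replaces the host when it is an exact key in the table. `String.replaceAll` treats its first argument as a regular expression, so each '.' there matches any character. That is harmless for these six addresses, but an exact lookup avoids the question entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical generalisation of the replaceAll() patch: map an IP to a
// host name only when it is the whole host component of a location.
class LocationNormalizer {

    private final Map<String, String> ipToHost;

    LocationNormalizer(Map<String, String> ipToHost) {
        this.ipToHost = new HashMap<>(ipToHost);
    }

    // Returns the location with its host part mapped, or unchanged if the host is unknown.
    String normalize(String location) {
        int slash = location.lastIndexOf('/');
        String prefix = location.substring(0, slash + 1); // "" when there is no rack prefix
        String host = location.substring(slash + 1);
        String mapped = ipToHost.get(host);
        return mapped == null ? location : prefix + mapped;
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        for (int n = 1; n <= 6; n++) {
            // Same mapping as the patch: 130.226.238.18n -> c4n<n>.gbif.org
            table.put("130.226.238.18" + n, "c4n" + n + ".gbif.org");
        }
        LocationNormalizer normalizer = new LocationNormalizer(table);
        System.out.println(normalizer.normalize("/default-rack/130.226.238.182"));
        // prints /default-rack/c4n2.gbif.org
    }
}
```

This is still a workaround. If the launching machine's DNS is the root cause, fixing it there makes any such table unnecessary.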

More when we know more.
Tim


On Mon, May 28, 2012 at 12:32 AM, Stack <stack@duboce.net> wrote:

> On Sun, May 27, 2012 at 1:05 PM, Tim Robertson
> <timrobertson100@gmail.com> wrote:
> > Hi all,
> >
> > When I run MR jobs, I don't see data locality because the TT sees
> > /default-rack/c4n1.gbif.org but the TableInputFormat is
> > giving /default-rack/130.226.238.181 (the same machine) when it
> > determines the splits for the job.
>
> It's doing this, Tim:
>
>
> http://hbase.apache.org/xref/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html#145
>
> On the machine launching the job, it's asking what the region location
> is. What is in the .META. table? Names or IPs? If the former, then it's
> the resolve on the machine launching the job that is mangling it (DNS
> falls back to the IP if there's a problem figuring out the name). Can you
> mess w/ the DNS on the machine that is launching the job? See if you can
> find an issue in its DNS. (Is this 0.90.x? If so, do its forward and
> reverse DNS give the same answer? If 0.92.1, it shouldn't matter.)
>
> St.Ack
>
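The symptom in the original question can be modelled very simply. Tim's patch restored locality just by rewriting the location strings, which suggests the split locations are matched against the TaskTracker's location as strings. The sketch below is a deliberately simplified, hypothetical model of that comparison, not Hadoop's actual scheduler code. It shows why an IP and a host name for the same box never count as data-local.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the locality check discussed in the thread.
class LocalityModel {

    // Builds a "/rack/host" node path like the ones quoted above.
    static String nodePath(String rack, String host) {
        return rack + "/" + host;
    }

    // Plain string equality: an IP and a host name for the same machine never match.
    static boolean isDataLocal(List<String> splitLocations, String trackerLocation) {
        return splitLocations.contains(trackerLocation);
    }

    public static void main(String[] args) {
        String tracker = nodePath("/default-rack", "c4n1.gbif.org");
        // Same physical machine, but the split reports it by IP.
        List<String> split = Arrays.asList(nodePath("/default-rack", "130.226.238.181"));
        System.out.println(isDataLocal(split, tracker)); // prints false
    }
}
```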
