hbase-user mailing list archives

From John Sichi <jsi...@facebook.com>
Subject locality and tasktracker vs split hostname
Date Tue, 11 May 2010 01:26:38 GMT
Hi there,

I was running a mapreduce job (via Hive) against HBase, and noticed that I wasn't getting
any locality (the input split location and the task tracker machine in the job tracker UI
were always different, and "Rack-local map tasks" in the job counters was 0).

I tracked this down to a discrepancy in the way hostnames were being compared.

The task tracker detail had a Host like

/f/s/1.2.3.4/h.s.f.com.

(with trailing dot)

But the Input Split Location had

/f/s/1.2.3.4/h.s.f.com

(without trailing dot)

As an experiment, I tweaked the Hive storage handler to append a dot to the
TableSplit.getLocation() value returned by HBase's TableInputFormatBase. After retesting
with that change, the map tasks were scheduled with locality.
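For illustration, the tweak amounts to something like the following. This is only a sketch on plain strings, not the actual Hive storage handler change; the class name SplitLocationFix and the method withTrailingDot are hypothetical names made up for this example.

```java
public final class SplitLocationFix {
    private SplitLocationFix() {}

    /**
     * Appends a trailing dot to a host name unless it already ends with one.
     * Null and empty inputs are returned unchanged.
     */
    public static String withTrailingDot(String host) {
        if (host == null || host.isEmpty() || host.endsWith(".")) {
            return host;
        }
        return host + ".";
    }

    public static void main(String[] args) {
        // Stand-in for the location string reported by the input split.
        String splitLocation = "h.s.f.com";
        System.out.println(withTrailingDot(splitLocation));
    }
}
```

The idea would be to apply this to each split location before handing the split to the job, so that it matches the form the task tracker reports.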

So I am guessing this is probably due to something anomalous with my test cluster's configuration,
but in case other people are hitting the same thing, it could be addressed by either

(a) making the Hadoop locality code use a hostname comparison which is insensitive to the
presence of the trailing dot

or

(b) making the HBase split's hostname consistent with the task tracker
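To make option (a) concrete, a comparison that ignores the trailing dot could look roughly like this. It is a standalone sketch, not Hadoop's actual locality code; the class name HostNames and the methods stripTrailingDot and sameHost are hypothetical. Option (b) would presumably use the same normalization, applied once when the split location is produced.

```java
public final class HostNames {
    private HostNames() {}

    /** Removes a single trailing dot, if present; null is returned unchanged. */
    public static String stripTrailingDot(String host) {
        if (host != null && host.endsWith(".")) {
            return host.substring(0, host.length() - 1);
        }
        return host;
    }

    /** Compares two host names, ignoring a trailing dot on either side. */
    public static boolean sameHost(String a, String b) {
        if (a == null || b == null) {
            return a == b; // equal only if both are null
        }
        return stripTrailingDot(a).equals(stripTrailingDot(b));
    }

    public static void main(String[] args) {
        // Task tracker form versus input split form from the report above.
        System.out.println(sameHost("h.s.f.com.", "h.s.f.com"));
    }
}
```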

Any opinions?

JVS

