hadoop-hdfs-user mailing list archives

From Steven Wong <sw...@netflix.com>
Subject Need help with cluster setup for performance [Impala]
Date Wed, 23 Jan 2013 21:07:28 GMT
My apologies for sending this message to this group, but I'm having trouble sending to the
right group.

From: Steven Wong
Sent: Wednesday, January 23, 2013 11:15 AM
To: impala-user@cloudera.org
Subject: RE: Need help with cluster setup for performance

Thanks for the suggestions. The /metrics output looks good now, and the SELECT COUNT(*) runs
much faster than before.

But I still have the "Unknown disk id" error message. My CDH version is:

 hadoop-client        x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4  18 k
 hadoop-mapreduce     x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 9.8 M
 hadoop-yarn          x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 8.9 M

On Tuesday, January 22, 2013 5:37:30 PM UTC-8, Henry wrote:
On 22 January 2013 11:40, Steven Wong <sw...@netflix.com> wrote:

I followed http://zenfractal.com/2012/11/15/from-zero-to-impala-in-minutes/ to set up a cluster
on EC2. After seeing disappointing performance numbers from a SELECT COUNT(*), I am following
to check my cluster setup. Questions:

1. My cluster has 3 data nodes. Is the following http://<hostname>:<port>/metrics
output good?

{<> : OK

Hi Steven -

This looks like your problem. Your machines are registering themselves with 'localhost' as
their hostname, and this means that they all look the same to the statestore.

I looked at Matt's zero-to-impala link - it's awesome, but now a little out of date. You should
modify the command that starts impalad so that it also sets --ipaddress and --hostname correctly
for each node. Then check the statestore metrics; things should look a lot better, and your
performance should improve.

2. My impalad logs contain "Unknown disk id.  This will negatively affect performance.  Check
your hdfs settings to enable block location metadata." and my http://<hostname>:<port>/varz
doesn't contain the string "dfs.datanode.hdfs-blocks-metadata.enabled". But my hdfs-site.xml
sets dfs.datanode.hdfs-blocks-metadata.enabled to true. Why?

What version of CDH are you using?
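Before digging into versions, it can help to confirm that the hdfs-site.xml actually read by impalad and the data nodes contains the property with the value you expect. A small Python sketch using the standard library's ElementTree; it assumes the usual Hadoop `<configuration><property><name/><value/></property></configuration>` layout, and the helper names are hypothetical. Where that file lives on your nodes depends on the install and is not given in this thread.

```python
import xml.etree.ElementTree as ET

PROPERTY = "dfs.datanode.hdfs-blocks-metadata.enabled"


def hdfs_property(xml_text, name):
    """Return the trimmed <value> of a Hadoop-style <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def blocks_metadata_enabled(xml_text):
    """True only if the property is present and set to "true" (case-insensitive)."""
    value = hdfs_property(xml_text, PROPERTY)
    return value is not None and value.lower() == "true"
```

Running this against each node's copy of the file distinguishes "the setting is missing here" from "the setting is present but not being picked up", which are different problems.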

3. My impalad.out doesn't contain "Unable to load native-hadoop library". This is good, I

4. My impalad logs contain the following lines matching the word "scheduler", but none contains
"locality percentage". Why?

The locality percentage is printed only when GLOG_v=1 is set, and I note that the setup-impala.sh
script has a typo: it sets GVLOG_v=1 instead. If you fix this, you should see the locality percentage.
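A Python sketch of the GLOG_v fix. glog reads its flags from `GLOG_`-prefixed environment variables, so a variable named `GVLOG_v` is silently ignored. How setup-impala.sh actually sets the variable is not shown in the thread, so `fix_glog_typo` assumes the typo appears as a literal `GVLOG_v=` assignment; both helpers are hypothetical.

```python
import os
import re

# The misspelling reported above; glog only reads GLOG_v.
TYPO = re.compile(r"\bGVLOG_v=")


def fix_glog_typo(script_text):
    """Rewrite every GVLOG_v= assignment to the GLOG_v= that glog reads."""
    return TYPO.sub("GLOG_v=", script_text)


def impalad_env(verbosity=1, base=None):
    """Copy an environment, drop the misspelled variable and set GLOG_v."""
    env = dict(os.environ if base is None else base)
    env.pop("GVLOG_v", None)
    env["GLOG_v"] = str(verbosity)
    return env
```

Passing the result of `impalad_env()` as the `env` of whatever launches impalad keeps the fix out of the script itself; editing the script with `fix_glog_typo` is the more permanent option.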

Hope this helps - let us know if things improve.


/tmp/impalad.INFO:I0122 00:19:09.137197  5121 simple-scheduler.cc:82] Starting simple scheduler
/tmp/impalad.ip-10-170-17-154.impala.log.INFO.20130122-001901.5121:I0122 00:19:09.137197  5121 simple-scheduler.cc:82] Starting simple scheduler
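Once GLOG_v=1 is in effect, a quick way to check the logs is to parse the glog line prefix rather than grepping for a bare word. A Python sketch, assuming glog's default `I<mmdd> <time> <thread> <file>:<line>] <message>` prefix as seen in the lines above (optionally preceded by a grep-style `path:`); `scheduler_report` is a hypothetical helper, and the exact wording of the locality message is not quoted in this thread.

```python
import re

# glog prefix: level letter + mmdd, time, thread id, source file:line, then "] msg".
GLOG_LINE = re.compile(
    r"(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d\d:\d\d:\d\d\.\d+)\s+"
    r"(?P<thread>\d+) (?P<source>[\w.-]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def scheduler_report(lines):
    """Return (scheduler messages, whether any message mentions locality percentage)."""
    sched, has_locality = [], False
    for raw in lines:
        # search() rather than match() so a leading "path:" from grep is skipped.
        m = GLOG_LINE.search(raw.strip())
        if not m:
            continue
        if "scheduler" in m["source"]:
            sched.append(m["msg"])
        if "locality percentage" in m["msg"].lower():
            has_locality = True
    return sched, has_locality
```

Fed the two lines above, it reports only "Starting simple scheduler" twice and no locality line, which matches what you saw with the misspelled variable.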



Henry Robinson
Software Engineer
