incubator-chukwa-user mailing list archives

From "Ratner, Alan S (IS)" <Alan.Rat...@ngc.com>
Subject RE: Continuing Chukwa Installation Problems
Date Tue, 01 Jun 2010 18:51:56 GMT
Eric,

   I simply followed the instructions:

# Copy CHUKWA_HOME/conf/hadoop-log4j.properties file to HADOOP_HOME/conf/log4j.properties
# Copy CHUKWA_HOME/conf/hadoop-metrics.properties file to HADOOP_HOME/conf/hadoop-metrics.properties
# Edit HADOOP_HOME/conf/hadoop-metrics.properties file and change @CHUKWA_LOG_DIR@ to your actual CHUKWA log directory (i.e., CHUKWA_HOME/var/log)

So in log4j I have:
log4j.appender.DRFAAUDIT.chukwaClientPortNum=9093
log4j.appender.MR_CLIENTTRACE.chukwaClientPortNum=9093
log4j.appender.HDFS_CLIENTTRACE.chukwaClientPortNum=9093

And in chukwa-agent.conf.xml I've got:
  <property>
    <name>chukwaAgent.control.port</name>
    <value>9093</value>
    <description>The socket port number the agent's control interface
can be contacted at.</description>
  </property>
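
[Editor's sketch] The comparison between the two files can be automated. The following is a minimal sketch, not part of Chukwa: the function names are made up for illustration, and it assumes the agent configuration is ordinary Hadoop-style XML with <property>/<name>/<value> elements, as in the snippet above. It takes the file contents as strings, so no file locations are assumed.

```python
import re
import xml.etree.ElementTree as ET

# Matches lines such as: log4j.appender.DRFAAUDIT.chukwaClientPortNum=9093
_PORT_LINE = re.compile(
    r"^[ \t]*log4j\.appender\.(\w+)\.chukwaClientPortNum[ \t]*=[ \t]*(\d+)[ \t]*$",
    re.MULTILINE,
)


def log4j_ports(properties_text):
    """Return {appender name: port} for every chukwaClientPortNum line."""
    return {name: int(port) for name, port in _PORT_LINE.findall(properties_text)}


def agent_control_port(xml_text):
    """Return the chukwaAgent.control.port value, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "chukwaAgent.control.port":
            return int(prop.findtext("value", "").strip())
    return None


def mismatched_appenders(properties_text, xml_text):
    """Return the sorted names of appenders whose port differs from the agent's."""
    port = agent_control_port(xml_text)
    ports = log4j_ports(properties_text)
    return sorted(name for name, value in ports.items() if value != port)
```

An empty list from mismatched_appenders means every appender found in log4j.properties points at the agent's control port.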

There is no firewall within our cluster so the agents should be able to
communicate readily with the collector.  Ubuntu's network port scan
shows port 9093 "open" on the agent servers but apparently not on the
collector server.
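
[Editor's sketch] The port scan result can be reproduced from any node with a few lines of Python. This is only a sketch: port_open is a hypothetical helper name, and the two-second timeout is an arbitrary choice. It reports whether a TCP connection to the given host and port succeeds, nothing more.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example use, with addresses taken from the agents file quoted below:
#   for host in ["10.64.147.3", "10.64.147.4"]:
#       print(host, port_open(host, 9093))
```

Going by the property description above, 9093 is the agent's control port, so a server that runs only a collector would not necessarily be listening on it.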

-----Original Message-----
From: Eric Yang [mailto:eyang@yahoo-inc.com] 
Sent: Tuesday, June 01, 2010 1:13 PM
To: chukwa-user@hadoop.apache.org
Subject: Re: Continuing Chukwa Installation Problems

What do log4j.properties and hadoop-metrics.properties look like?  The
files copied to the Hadoop conf directory may be the source templates
instead of the generated conf files.
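
[Editor's sketch] One way to tell a source template from a generated file is to look for substitution tokens that were never replaced. The sketch below assumes that such tokens have the form @UPPER_CASE_NAME@; the only token actually named in this thread is @CHUKWA_LOG_DIR@, so the general pattern is an assumption, and unreplaced_placeholders is a hypothetical helper name.

```python
import re

# Assumed token shape: an upper-case name between two @ signs.
PLACEHOLDER = re.compile(r"@[A-Z][A-Z0-9_]*@")


def unreplaced_placeholders(text):
    """Return (line number, token) pairs for each @TOKEN@ still in the text."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for token in PLACEHOLDER.findall(line):
            found.append((number, token))
    return found
```

Running it over the contents of log4j.properties and hadoop-metrics.properties in the Hadoop conf directory should return an empty list if the files were fully generated.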

Make sure the port number matches in hadoop log4j.properties and the
agent port.

I.e., log4j.properties should have:

log4j.appender.DRFA.chukwaClientPortNum=9093

Chukwa-agent-conf.xml should have:

  <property>
    <name>chukwaAgent.control.port</name>
    <value>9093</value>
    <description>The socket port number the agent's control interface
can be contacted at.</description>
  </property>

And port 9093 is not firewalled.

Regards,
Eric

On 6/1/10 5:25 AM, "Ratner, Alan S (IS)" <Alan.Ratner@ngc.com> wrote:

> I'm now following the latest instructions on installing Chukwa.  When I
> launch Hadoop I get various Chukwa-related errors although they do not
> seem to interfere with my running Hadoop.
> 
> My collectors file looks like this:
> http://localhost:8080
> 
> My agents file currently looks like this:
> 10.64.147.3
> 10.64.147.4
> ...
> 10.64.147.12
> 10.64.147.13
> 
> My initial_adaptors file is the default:
> add org.apache.hadoop.chukwa.datacollection.adaptor.ExecAdaptor Iostat
> 60 /usr/bin/iostat -x -k 55 2 0
> add org.apache.hadoop.chukwa.datacollection.adaptor.ExecAdaptor Df 60
> /bin/df -l 0
> add org.apache.hadoop.chukwa.datacollection.adaptor.ExecAdaptor Sar 60
> /usr/bin/sar -q -r -n ALL 55 0
> add org.apache.hadoop.chukwa.datacollection.adaptor.ExecAdaptor Top 60
> /usr/bin/top -b -n 1 -c 0
> 
> Here's what happens when I launch Hadoop.  (I am assuming the
> initialization sequence is a) format namenode, b) launch Hadoop, c)
> start Chukwa agents, d) start Chukwa collector.)  It looks like I have 2
> sets of problems, presumably related:
> 1. bad adaptor file
> 2. some sort of password/authentication problem (Note that the agents
> file currently contains a subset of nodes with all nodes giving me
> authentication socket errors and agent nodes additionally giving me
> password errors.)  These errors surprise me since I can ssh between
> any 2 servers in the cluster.
> 
> ngc@hadoop1:~/hadoop-0.20.2$ bin/start-all.sh
> starting namenode, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-namenode-hadoop1.out
> Error initializing ChukwaClient with list of currently registered
> adaptors, clearing our local list of adaptors
> log4j:ERROR cleanUpRegex == null ||
> !cleanUpRegex.contains("$fileName")
> 10.64.147.7: starting datanode, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-datanode-hadoop6.out
> 10.64.147.3: starting datanode, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-datanode-hadoop2.out
> ...
> 10.64.147.30: starting datanode, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-datanode-hadoop29.out
> 10.64.147.21: starting datanode, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-datanode-hadoop20.out
> 10.64.147.2: starting secondarynamenode, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-secondarynamenode-hadoop1.out
> 10.64.147.2: Error initializing ChukwaClient with list of currently
> registered adaptors, clearing our local list of adaptors
> 10.64.147.2: log4j:ERROR cleanUpRegex == null ||
> !cleanUpRegex.contains("$fileName")
> starting jobtracker, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-jobtracker-hadoop1.out
> Error initializing ChukwaClient with list of currently registered
> adaptors, clearing our local list of adaptors
> log4j:ERROR cleanUpRegex == null ||
> !cleanUpRegex.contains("$fileName")
> ngc@10.64.147.7's password: 10.64.147.35: Error reading response
> length from authentication socket.
> ngc@10.64.147.8's password: 10.64.147.30: Error reading response
> length from authentication socket.
> 10.64.147.9: Error reading response length from authentication socket.
> 10.64.147.7: Error reading response length from authentication socket.
> ngc@10.64.147.5's password: 10.64.147.18: Error reading response
> length from authentication socket.
> 10.64.147.20: Error reading response length from authentication
> socket.
> 10.64.147.5: Error reading response length from authentication socket.
> 10.64.147.21: Error reading response length from authentication
> socket.
> ngc@10.64.147.10's password: 10.64.147.10: Error reading response
> length from authentication socket.
> 10.64.147.17: Error reading response length from authentication
> socket.
> 10.64.147.26: Error reading response length from authentication
> socket.
> 10.64.147.33: Error reading response length from authentication
> socket.
> 10.64.147.25: Error reading response length from authentication
> socket.
> ...
> 10.64.147.3: starting tasktracker, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-tasktracker-hadoop2.out
> 10.64.147.29: Error reading response length from authentication
> socket.
> 10.64.147.39: Error reading response length from authentication
> socket.
> 10.64.147.40: Error reading response length from authentication
> socket.
> 10.64.147.37: Error reading response length from authentication
> socket.
> 10.64.147.38: Error reading response length from authentication
> socket.
> ngc@10.64.147.4's password: 10.64.147.4: Error reading response length
> from authentication socket.
> 10.64.147.42: Error reading response length from authentication
> socket.
> 10.64.147.41: Error reading response length from authentication
> socket.
> 10.64.147.36: Error reading response length from authentication
> socket.
> 10.64.147.31: Error reading response length from authentication
> socket.
> 10.64.147.27: Error reading response length from authentication
> socket.
> 10.64.147.34: Error reading response length from authentication
> socket.
> 10.64.147.13: starting tasktracker, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-tasktracker-hadoop12.out
> 10.64.147.11: starting tasktracker, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-tasktracker-hadoop10.out
> 10.64.147.21: starting tasktracker, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-tasktracker-hadoop20.out
> ...
> 10.64.147.15: starting tasktracker, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-tasktracker-hadoop14.out
> 10.64.147.38: starting tasktracker, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-tasktracker-hadoop37.out
> 10.64.147.37: starting tasktracker, logging to
> /home/ngc/hadoop-0.20.2/bin/../logs/hadoop-ngc-tasktracker-hadoop36.out
> 10.64.147.3: Error initializing ChukwaClient with list of currently
> registered adaptors, clearing our local list of adaptors
> 10.64.147.3: log4j:ERROR cleanUpRegex == null ||
> !cleanUpRegex.contains("$fileName")
> ...

