From: Ivan Tretyakov <itretyakov@griddynamics.com>
To: user@hadoop.apache.org
Date: Fri, 22 Nov 2013 17:30:08 +0400
Subject: Re: Problem sending metrics to multiple targets

We investigated the problem and found the root cause. The Metrics2 framework uses a different configuration parser than the first version of the metrics framework: Metrics2 uses Apache Commons Configuration, while Metrics uses Hadoop's own parser. When org.apache.hadoop.metrics2.sink.ganglia.AbstractGangliaSink reads its settings, commas are treated as list separators by default. So when we provide a comma-separated list of servers, the sink gets back only what comes before the first separator, i.e. just the first server in the list.

But we were able to find a workaround. The class that parses the server list (org.apache.hadoop.metrics2.util.Servers) accepts both commas and spaces as separators. So if we provide a space-separated list of servers instead of a comma-separated one, the new parser reads the whole server list. All servers are then registered as metrics receivers, and metrics are sent to all of them.
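To make the failure mode concrete, here is a small self-contained Java sketch that mimics the two parsing steps without depending on Hadoop or Commons Configuration. The class and method names (`GangliaServersParseSketch`, `getString`, `parseServers`, `effectiveServers`) are hypothetical. It assumes, as described above, that the configuration layer splits a value on commas and hands the sink only the first element, and that `Servers` then splits the remaining string on commas and spaces. The real sink also resolves each entry to a socket address with a default port; the sketch keeps plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model of how the Ganglia sink ends up with its server list.
 * It does NOT call Hadoop or Commons Configuration; it only mimics the two
 * parsing steps described in the message (an assumption, not the real code).
 */
public final class GangliaServersParseSketch {

    /** Step 1 (assumed): the config layer splits a raw value on ',' into a list. */
    public static List<String> splitOnListDelimiter(String rawValue) {
        List<String> items = new ArrayList<>();
        for (String item : rawValue.split(",")) {
            items.add(item.trim());
        }
        return items;
    }

    /** Asking for a single string from a list-valued property yields only its first element. */
    public static String getString(String rawValue) {
        return splitOnListDelimiter(rawValue).get(0);
    }

    /** Step 2 (assumed): split on runs of spaces and commas, as Servers is described as doing. */
    public static List<String> parseServers(String specs) {
        return Arrays.asList(specs.trim().split("[ ,]+"));
    }

    /** Servers the sink would end up sending to for a given property value. */
    public static List<String> effectiveServers(String rawValue) {
        return parseServers(getString(rawValue));
    }

    public static void main(String[] args) {
        String commaSeparated = "192.168.1.111:8649,192.168.1.113:8649,192.168.1.115:8649";
        String spaceSeparated = "192.168.1.111:8649 192.168.1.113:8649 192.168.1.115:8649";
        // prints [192.168.1.111:8649]
        System.out.println(effectiveServers(commaSeparated));
        // prints [192.168.1.111:8649, 192.168.1.113:8649, 192.168.1.115:8649]
        System.out.println(effectiveServers(spaceSeparated));
    }
}
```

With the comma-separated value, only the first server survives step 1. With the space-separated value, step 1 leaves the string intact and step 2 yields all three targets, which is the effect of the workaround, i.e. writing the property as `datanode.sink.ganglia.servers=192.168.1.111:8649 192.168.1.113:8649 192.168.1.115:8649`.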
On Thu, Jan 17, 2013 at 7:17 PM, Ivan Tretyakov <itretyakov@griddynamics.com> wrote:
> Hi!
>
> We have the following problem.
>
> There are three target hosts to send metrics to: 192.168.1.111:8649,
> 192.168.1.113:8649 and 192.168.1.115:8649 (node01, node03, node05).
> But the datanode, for example (using
> org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31), sends some metrics
> to the first target host and others to the second and third.
> So some metrics are missing on the second and third nodes. When gmetad
> collects metrics from one of these, we cannot see certain metrics in Ganglia.
>
> For example, node07 runs only one process that sends metrics to Ganglia,
> the datanode process, and we could see the following using tcpdump.
>
> Dumping traffic for about three minutes:
> $ sudo -i tcpdump dst port 8649 and src host node07 | tee tcpdump.out
> ...
> $ head -n1 tcpdump.out
> 12:18:05.559719 IP node07.dom.local.43350 > node01.dom.local.8649: UDP, length 180
> $ tail -n1 tcpdump.out
> 12:20:59.575144 IP node
>
> Then count the packets and bytes sent to each target:
> $ grep node01 tcpdump.out | wc -l
> 5972
> $ grep node03 tcpdump.out | wc -l
> 3812
> $ grep node05 tcpdump.out | wc -l
> 3811
> $ grep node01 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
> 1048272
> $ grep node03 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
> 731604
> $ grep node05 tcpdump.out | awk 'BEGIN{sum=0}{sum=sum+$8}END{print sum}'
> 731532
>
> We can also ask the gmond daemons which metrics they have:
>
> $ nc node01 8649 | grep ProcessName_DataNode | head -n1
> <METRIC NAME="jvm.JvmMetrics.ProcessName_DataNode.LogFatal" VAL="0" TYPE="float" UNITS="" TN="0" TMAX="60" DMAX="0" SLOPE="positive">
> $ nc node03 8649 | grep ProcessName_DataNode | head -n1
> $ nc node05 8649 | grep ProcessName_DataNode | head -n1
> $ nc node01 8649 | grep ProcessName_DataNode | wc -l
> 100
> $ nc node03 8649 | grep ProcessName_DataNode | wc -l
> 0
> $ nc node05 8649 | grep ProcessName_DataNode | wc -l
> 0
>
> We can see that only the first collector node in the list has certain
> metrics.
>
> Hadoop versions we use:
> - MapReduce 2.0.0-mr1-cdh4.1.1
> - HDFS 2.0.0-cdh4.1.1
>
> hadoop-metrics2.properties content:
>
> datanode.period=20
>
> datanode.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
> datanode.sink.ganglia.servers=192.168.1.111:8649,192.168.1.113:8649,192.168.1.115:8649
> datanode.sink.ganglia.tagsForPrefix.jvm=*
> datanode.sink.ganglia.tagsForPrefix.dfs=*
> datanode.sink.ganglia.tagsForPrefix.rpc=*
> datanode.sink.ganglia.tagsForPrefix.rpcdetailed=*
> datanode.sink.ganglia.tagsForPrefix.metricssystem=*
>
> --
> Best Regards
> Ivan Tretyakov

--
Best Regards
Ivan Tretyakov