hadoop-common-user mailing list archives

From "Vitthal \"Suhas\" Gogate" <gog...@hortonworks.com>
Subject Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
Date Mon, 06 Feb 2012 17:42:11 GMT
I assume you have seen the following information on Hadoop twiki,
http://wiki.apache.org/hadoop/GangliaMetrics

So do you use GangliaContext31 in hadoop-metrics2.properties?

We use Ganglia 3.2 with Hadoop 0.20.205 and it works fine. (I remember
gmetad sometimes going down due to a buffer overflow problem when Hadoop
starts pumping in the metrics, but restarting it works. Let me know if you
face the same problem.)

--Suhas

Additionally, the Ganglia protocol changed significantly between Ganglia 3.0
and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible with Ganglia 3.0
clients). This caused Hadoop to not work with Ganglia 3.1; there is a patch
available for this, HADOOP-4675. As of November 2010, this patch has been
rolled into the mainline for 0.20.2 and later. To use the Ganglia 3.1
protocol in place of the 3.0 protocol, substitute
org.apache.hadoop.metrics.ganglia.GangliaContext31 for
org.apache.hadoop.metrics.ganglia.GangliaContext in the
hadoop-metrics.properties lines above.
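
As a rough illustration of the substitution described in the wiki excerpt
above, a hadoop-metrics.properties entry for the dfs context might look like
the fragment below. The host name and the period value are placeholders, not
values taken from this thread; 8649 is the usual gmond default port. This
applies to the older hadoop-metrics.properties file only; the metrics2
framework (hadoop-metrics2.properties) is configured separately and is not
shown here.

```properties
# Hypothetical example: send dfs metrics using the Ganglia 3.1 wire format.
# "gmond-host.example.com" is a placeholder for your own gmond address.
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=gmond-host.example.com:8649
```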

On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek <masmertoz@gmail.com> wrote:

> I spent a lot of time trying to figure this out, but I did not find a
> solution. The logs pointed me to some bugs in the rrdupdate tool, but I
> tried different versions of Ganglia and RRDtool and the error is the same.
> A segmentation fault appears after the following lines when I run gmetad
> in debug mode...
>
> "Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd"
> "Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd"
>
> which I suppose are generated by MetricsSystemImpl.java. (Is there any way
> to disable just these two metrics?)
>
> From the /var/log/messages there are a lot of errors:
>
> "xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd):
> converting  '4.9E-324' to float: Numerical result out of range"
> "xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd):
> converting  '4.9E-324' to float: Numerical result out of range"
>
> So there is probably a conversion issue. Where should I look for a
> solution? Would you suggest using Ganglia 3.0.x with the old protocol
> and leaving versions >3.1 for later releases?
>
> Any help is really appreciated...
>
> On 1 February 2012 04:04, Merto Mertek <masmertoz@gmail.com> wrote:
>
> > I would be glad to hear that too. I've set up the following:
> >
> > Hadoop 0.20.205
> > Ganglia Front  3.1.7
> > Ganglia Back *(gmetad)* 3.1.7
> > RRDTool <http://www.rrdtool.org/> 1.4.5 (I had some trouble
> > installing 1.4.4)
> >
> > Ganglia works only while Hadoop is not running, that is, while no metrics
> > are being published to the gmetad node (configured with the new
> > hadoop-metrics2.properties). When Hadoop is started, a segmentation fault
> > appears in the gmetad daemon:
> >
> > sudo gmetad -d 2
> > .......
> > Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
> > Updating host xxx, metric bytes_in
> > Updating host xxx, metric bytes_out
> > Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time
> > Created rrd /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
> > Segmentation fault
> >
> > And some info from the apache log <http://pastebin.com/nrqKRtKJ>..
> >
> > Can someone suggest a Ganglia version that is tested with Hadoop
> > 0.20.205? I will try to sort it out, but it does not seem to be a
> > trivial problem.
> >
> > Thank you
> >
> >
> >
> >
> >
> > On 2 December 2011 12:32, praveenesh kumar <praveenesh@gmail.com> wrote:
> >
>> Or do I have to apply some Hadoop patch for this?
> >>
> >> Thanks,
> >> Praveenesh
> >>
> >
> >
>
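
A note on the '4.9E-324' value in the quoted gmetad errors: that string is
the text form of Java's Double.MIN_VALUE, the smallest positive double. The
sketch below only shows that such a value is far below single-precision
range, which would be consistent with the "Numerical result out of range"
message (the usual C library text for ERANGE). It is an assumption, not
something confirmed in this thread, that Hadoop emits this value for a "max"
statistic that has not been updated yet. The class and method names are
hypothetical.

```java
public class MinValueDemo {
    /** Returns the text a Java producer would write for the given double. */
    static String asText(double value) {
        return Double.toString(value);
    }

    /** Parses metric text as a 32-bit float, as a float-typed consumer might. */
    static float parseAsFloat(String text) {
        return Float.parseFloat(text);
    }

    public static void main(String[] args) {
        String text = asText(Double.MIN_VALUE);
        System.out.println(text);                 // prints 4.9E-324
        System.out.println(Double.MIN_VALUE > 0); // prints true: positive as a double
        System.out.println(parseAsFloat(text));   // prints 0.0: underflows as a float
    }
}
```

If this is indeed the cause, the value is harmless on the Java side and only
becomes a problem when the receiving side tries to read it as a number with a
narrower range.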
