hadoop-common-user mailing list archives

From Merto Mertek <masmer...@gmail.com>
Subject Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
Date Tue, 07 Feb 2012 02:58:25 GMT
I have tried to run it, but it keeps crashing..

>   - When you start gmetad and Hadoop is not emitting metrics, everything
>   is peachy.
>

Right, running just ganglia, without any hadoop jobs, seems stable for at
least a day..


>   - When you start Hadoop (and it thus starts emitting metrics), gmetad
>   cores.
>

True, with the following error: *** stack smashing detected ***: gmetad
terminated \n Segmentation fault
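
If it helps the diagnosis, one way to get a backtrace out of that abort,
assuming gdb is available and gmetad was built with debug symbols, is
roughly:

  $ gdb --args gmetad -d 2
  (gdb) run
  ... wait for the SIGABRT / SIGSEGV ...
  (gdb) bt

This is only a sketch, not something verified against this particular crash.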

>      - On my MacBookPro, it's a SIGABRT due to a buffer overflow.
>
> I believe this is happening for everyone. What I would like for you to try
> out are the following 2 scenarios:
>
>   - Once gmetad cores, if you start it up again, does it core again? Does
>   this process repeat ad infinitum?
>
>      - On my MBP, the core is a one-time thing, and restarting gmetad
>      after the first core makes things run perfectly smoothly.
>         - I know others are saying this core occurs continuously, but they
>         were all using ganglia-3.1.x, and I'm interested in how
>         ganglia-3.2.0 behaves for you.
>

It cores every time I run it. The only difference is that sometimes the
segmentation fault appears instantly, and sometimes it appears after a
random interval... let's say after a minute of running gmetad and collecting
data.


>   - If you start Hadoop first (so gmetad is not running when the first
>   batch of Hadoop metrics is emitted) and THEN start gmetad after a few
>   seconds, do you still see gmetad coring?
>

Yes


>      - On my MBP, this sequence works perfectly fine, and there are no
>      gmetad cores whatsoever.
>

I have tested this scenario with two worker nodes, so two gmonds plus the
head gmond on the server where gmetad is located. I have checked, and all of
them are version 3.2.0.
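
For context, the gmetad side of that topology boils down to a single
data_source line in gmetad.conf, roughly as below ("hdcluster" is the
cluster name visible in the rrd paths, and 8649 is the default gmond port;
this is a sketch of the shape, not my exact conf):

  data_source "hdcluster" localhost:8649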

Hope it helps..



>
> Bear in mind that this only addresses the gmetad coring issue - the
> warnings emitted about '4.9E-324' being out of range will continue, but I
> know what's causing that as well (and hope that my patch fixes it for
> free).
>
> Varun
> On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek <masmertoz@gmail.com> wrote:
>
> > Yes, I am encountering the same problems, and like Mete said, a few
> > seconds after restarting a segmentation fault appears.. here is my conf..
> > <http://pastebin.com/VgBjp08d>
> >
> > And here is some info from /var/log/messages (Ubuntu Server 10.10):
> >
> > kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb
> > sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000]
> >
> >
> > When I compiled gmetad I used the following command:
> >
> > ./configure --with-gmetad --sysconfdir=/etc/ganglia \
> >   CPPFLAGS="-I/usr/local/rrdtool-1.4.7/include" \
> >   CFLAGS="-I/usr/local/rrdtool-1.4.7/include" \
> >   LDFLAGS="-L/usr/local/rrdtool-1.4.7/lib"
> >
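One thing worth double-checking with a non-system rrdtool prefix like that
is whether the runtime linker actually loads the matching librrd at startup;
a mismatched librrd is only a guess, but it is the kind of thing that causes
memory corruption. A sketch:

  $ echo "/usr/local/rrdtool-1.4.7/lib" | sudo tee /etc/ld.so.conf.d/rrdtool.conf
  $ sudo ldconfig
  $ ldd $(which gmetad) | grep rrd    # confirm which librrd is picked up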
> >
> > The same was tried with rrdtool 1.4.5. My current ganglia version is
> > 3.2.0, and like Mete I tried version 3.1.7 but without success..
> >
> > Hope we will sort out a solution soon..
> > thank you
> >
> >
> > On 6 February 2012 20:09, mete <efkarr@gmail.com> wrote:
> >
> > > Hello,
> > > I also face this issue when using GangliaContext31 and hadoop-1.0.0,
> > > and ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer
> > > overflows as soon as I restart the gmetad.
> > > Regards
> > > Mete
> > >
> > > On Mon, Feb 6, 2012 at 7:42 PM, Vitthal "Suhas" Gogate <
> > > gogate@hortonworks.com> wrote:
> > >
> > > > I assume you have seen the following information on the Hadoop twiki:
> > > > http://wiki.apache.org/hadoop/GangliaMetrics
> > > >
> > > > So do you use GangliaContext31 in hadoop-metrics2.properties?
> > > >
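For reference, with the metrics2 framework the ganglia side is configured as
a sink rather than a context; a minimal hadoop-metrics2.properties sketch
(untested here; host and port are placeholders, and the sink class name is
given from memory) would be roughly:

  *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
  *.sink.ganglia.period=10
  namenode.sink.ganglia.servers=gmetad-host:8649
  datanode.sink.ganglia.servers=gmetad-host:8649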
> > > > We use Ganglia 3.2 with Hadoop 0.20.205 and it works fine. (I
> > > > remember seeing gmetad sometimes go down due to a buffer overflow
> > > > problem when hadoop starts pumping in the metrics.. but restarting
> > > > works.) Let me know if you face the same problem?
> > > >
> > > > --Suhas
> > > >
> > > > Additionally, the Ganglia protocol changed significantly between
> > > > Ganglia 3.0 and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible with
> > > > Ganglia 3.0 clients). This caused Hadoop to not work with Ganglia 3.1;
> > > > there is a patch available for this, HADOOP-4675. As of November 2010,
> > > > this patch has been rolled into the mainline for 0.20.2 and later. To
> > > > use the Ganglia 3.1 protocol in place of the 3.0 protocol, substitute
> > > > org.apache.hadoop.metrics.ganglia.GangliaContext31 for
> > > > org.apache.hadoop.metrics.ganglia.GangliaContext in the
> > > > hadoop-metrics.properties lines above.
> > > >
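For reference, the substitution described above makes the dfs section of
hadoop-metrics.properties look roughly like this (an illustrative sketch;
the period and server address are placeholders):

  dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
  dfs.period=10
  dfs.servers=localhost:8649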
> > > > On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek <masmertoz@gmail.com>
> > > > wrote:
> > > >
> > > > > I spent a lot of time trying to figure it out, however I did not
> > > > > find a solution. Problems from the logs pointed me to some bugs in
> > > > > the rrdupdate tool; I tried to solve it with different versions of
> > > > > ganglia and rrdtool, but the error is the same. A segmentation fault
> > > > > appears after the following lines, if I run gmetad in debug mode...
> > > > >
> > > > > "Created rrd
> > > > >
> > > > >
> > > >
> > >
> >
> /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd"
> > > > > "Created rrd
> > > > >
> > > > >
> > > >
> > >
> >
> /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd
> > > > > "
> > > > >
> > > > > which I suppose are generated from MetricsSystemImpl.java (is there
> > > > > any way to just disable these two metrics?)
> > > > >
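On disabling just those two metrics: metrics2 supports glob filters, so
something along these lines in hadoop-metrics2.properties might drop the
MetricsSystem records before they reach the sink (untested, and the property
keys are written from memory):

  *.record.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
  *.sink.ganglia.record.filter.exclude=MetricsSystem*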
> > > > > In /var/log/messages there are a lot of errors:
> > > > >
> > > > > "xxx gmetad[15217]: RRD_update
> > > > > (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd):
> > > > > converting '4.9E-324' to float: Numerical result out of range"
> > > > > "xxx gmetad[15217]: RRD_update
> > > > > (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd):
> > > > > converting '4.9E-324' to float: Numerical result out of range"
> > > > >
> > > > > so probably there are some conversion issues? Where should I look
> > > > > for the solution? Would you rather suggest using ganglia 3.0.x with
> > > > > the old protocol and leaving versions >3.1 for further releases?
> > > > >
> > > > > any help is really appreciated...
> > > > >
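A note on that '4.9E-324' value: it is exactly Java's Double.MIN_VALUE, the
smallest positive double, and it is far below what a float can represent
(the smallest positive float is about 1.4E-45), so a float conversion on the
gmetad side can only underflow and report "out of range". A tiny Java check
of the two constants, purely for illustration:

  public class MinValues {
      public static void main(String[] args) {
          System.out.println(Double.MIN_VALUE); // prints 4.9E-324
          System.out.println(Float.MIN_VALUE);  // prints 1.4E-45
      }
  }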
> > > > > On 1 February 2012 04:04, Merto Mertek <masmertoz@gmail.com> wrote:
> > > > >
> > > > > > I would be glad to hear that too.. I've set up the following:
> > > > > >
> > > > > > Hadoop 0.20.205
> > > > > > Ganglia Front 3.1.7
> > > > > > Ganglia Back *(gmetad)* 3.1.7
> > > > > > RRDTool <http://www.rrdtool.org/> 1.4.5 -> I had some troubles
> > > > > > installing 1.4.4
> > > > > >
> > > > > > Ganglia works only as long as hadoop is not running, so when
> > > > > > metrics are not published to the gmetad node (conf with the new
> > > > > > hadoop-metrics2.properties). When hadoop is started, a
> > > > > > segmentation fault appears in the gmetad daemon:
> > > > > >
> > > > > > sudo gmetad -d 2
> > > > > > .......
> > > > > > Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
> > > > > > Updating host xxx, metric bytes_in
> > > > > > Updating host xxx, metric bytes_out
> > > > > > Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time
> > > > > > Created rrd
> > > > > > /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
> > > > > > Segmentation fault
> > > > > >
> > > > > > And some info from the apache log <http://pastebin.com/nrqKRtKJ>..
> > > > > >
> > > > > > Can someone suggest a ganglia version that is tested with hadoop
> > > > > > 0.20.205? I will try to sort it out, however it seems a not so
> > > > > > trivial problem..
> > > > > >
> > > > > > Thank you
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 2 December 2011 12:32, praveenesh kumar
> > > > > > <praveenesh@gmail.com> wrote:
> > > > > >
> > > > > >> Or do I have to apply some hadoop patch for this?
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Praveenesh
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
