Subject: Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
Date: Wed, 15 Feb 2012 08:59:34 -0800
Mailing list: common-user@hadoop.apache.org
From: Varun Kapoor
To: common-user@hadoop.apache.org

The warnings about underflow are totally expected (they come from strtod(), and they will no longer occur with Hadoop 1.0.1, which applies my patch from HADOOP-8052), so that's not worrisome.

As for the buffer overflow, do you think you could show me a backtrace of this core? If you can't find the core file on disk, just start gmetad under gdb, like so:

$ sudo gdb
(gdb) r --conf= ...
::Wait for crash::
(gdb) bt
(gdb) info locals

If you're familiar with gdb, then I'd appreciate any additional diagnosis you could perform (for example, figuring out which metric's value caused this buffer overflow); if you're not, I'll try to send you some gdb scripts to narrow things down once I see the output from this round of debugging.

Also, out of curiosity, is patching Hadoop not an option for you? Or is it just that rebuilding (and redeploying) Ganglia is the lesser of the two evils? :)

Varun

On Tue, Feb 14, 2012 at 11:43 PM, mete wrote:

> Hello Varun,
> I have patched and recompiled Ganglia from source but it still cores after the patch.
> Here are some logs:
>
> Feb 15 09:39:14 master gmetad[16487]: RRD_update (/var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd): /var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd: converting '4.9E-324' to float: Numerical result out of range
> Feb 15 09:39:14 master gmetad[16487]: RRD_update (/var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd): /var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd: converting '4.9E-324' to float: Numerical result out of range
> Feb 15 09:39:14 master gmetad[16487]: RRD_update (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd): /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd: converting '4.9E-324' to float: Numerical result out of range
> Feb 15 09:39:14 master gmetad[16487]: RRD_update (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd): /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd: converting '4.9E-324' to float: Numerical result out of range
> Feb 15 09:39:14 master gmetad[16487]: RRD_update (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd): /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd: converting '4.9E-324' to float: Numerical result out of range
> Feb 15 09:39:14 master gmetad[16487]: *** buffer overflow detected ***: gmetad terminated
>
> I am using Hadoop 1.0.0 and the Ganglia 3.2.0 tarball.
>
> Cheers
> Mete
>
> On Sat, Feb 11, 2012 at 2:19 AM, Merto Mertek wrote:
>
> > Varun, unfortunately I have had some problems with deploying a new version on the cluster. Hadoop is not picking up the new build in the lib folder despite the classpath being set to it.
> > The new build is picked up just if I put it in $HD_HOME/share/hadoop/, which is very strange. I've done this on all nodes and can access the web UI, but all tasktrackers are being stopped because of an error:
> >
> > > INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Cleanup...
> > > java.lang.InterruptedException: sleep interrupted
> > >     at java.lang.Thread.sleep(Native Method)
> > >     at org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926)
> >
> > Probably the error is the consequence of an inadequate deploy of a jar. I will ask the dev list how they do it, or do you maybe have any other idea?
> >
> > On 10 February 2012 17:10, Varun Kapoor wrote:
> >
> > > Hey Merto,
> > >
> > > Any luck getting the patch running on your cluster?
> > >
> > > In case you're interested, there's now a JIRA for this:
> > > https://issues.apache.org/jira/browse/HADOOP-8052
> > >
> > > Varun
> > >
> > > On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor wrote:
> > >
> > > > Your general procedure sounds correct (i.e. dropping your newly built .jar into $HD_HOME/lib/), but to make sure it's getting picked up, you should explicitly add $HD_HOME/lib/ to your exported HADOOP_CLASSPATH environment variable; here's mine, as an example:
> > > >
> > > > export HADOOP_CLASSPATH=".:./build/*.jar"
> > > >
> > > > About your second point, you certainly need to copy this newly patched .jar to every node in your cluster, because my patch changes the value of a couple of metrics emitted TO gmetad (FROM all the nodes in the cluster); without copying it to every node, gmetad will still likely receive some bad metrics.
> > > > Varun
> > > >
> > > > On Wed, Feb 8, 2012 at 6:19 PM, Merto Mertek wrote:
> > > >
> > > > > I will need your help. Please confirm whether the following procedure is right. I have a dev environment where I pimp my scheduler (no Hadoop running) and a small cluster environment where the changes (jars) are deployed with some scripts; however, I have never compiled the whole Hadoop from source, so I do not know if I am doing it right. I've done it as follows:
> > > > >
> > > > > a) apply a patch
> > > > > b) cd $HD_HOME; ant
> > > > > c) copy $HD_HOME/build/patched-core-hadoop.jar -> cluster:/$HD_HOME/lib
> > > > > d) run $HD_HOME/bin/start-all.sh
> > > > >
> > > > > Is this enough? When I tried to test "hadoop dfs -ls /" I could see that the new jar was not loaded; instead a jar from $HD_HOME/share/hadoop-20.205.0.jar was taken. Should I copy the entire Hadoop folder to all nodes and reconfigure the entire cluster for the new build, or is it enough if I configure it just on the node where gmetad will run?
> > > > >
> > > > > On 8 February 2012 06:33, Varun Kapoor wrote:
> > > > >
> > > > > > I'm so sorry, Merto - like a silly goose, I attached the 2 patches to my reply, and of course the mailing list did not accept the attachment.
> > > > > > I plan on opening JIRAs for this tomorrow, but till then, here are links to the 2 patches (from my Dropbox account):
> > > > > >
> > > > > > - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.Hadoop.patch
> > > > > > - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.gmetad.patch
> > > > > >
> > > > > > Here's hoping this works for you,
> > > > > >
> > > > > > Varun
> > > > > >
> > > > > > On Tue, Feb 7, 2012 at 6:00 PM, Merto Mertek wrote:
> > > > > >
> > > > > > > Varun, have I missed your link to the patches? I tried to search for them on JIRA but did not find them. Can you repost the link for these two patches?
> > > > > > >
> > > > > > > Thank you..
> > > > > > >
> > > > > > > On 7 February 2012 20:36, Varun Kapoor wrote:
> > > > > > >
> > > > > > > > I'm sorry to hear that gmetad cores continuously for you guys. Since I'm not seeing that behavior, I'm going to just put out the 2 possible patches you could apply and wait to hear back from you. :)
> > > > > > > >
> > > > > > > > Option 1
> > > > > > > >
> > > > > > > > * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file (http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markup in my setup) in your Hadoop sources and rebuild Hadoop.
> > > > > > > >
> > > > > > > > Option 2
> > > > > > > >
> > > > > > > > * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c and rebuild gmetad.
> > > > > > > >
> > > > > > > > Only 1 of these 2 fixes is required, and it would help me if you could first try Option 1 and let me know if that fixes things for you.
> > > > > > > > Varun
> > > > > > > >
> > > > > > > > On Mon, Feb 6, 2012 at 10:36 PM, mete wrote:
> > > > > > > >
> > > > > > > > > Same as Merto's situation here: it always overflows a short time after the restart. Without the Hadoop metrics enabled, everything is smooth.
> > > > > > > > > Regards
> > > > > > > > > Mete
> > > > > > > > >
> > > > > > > > > On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek <masmertoz@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > I have tried to run it but it keeps crashing..
> > > > > > > > > >
> > > > > > > > > > > - When you start gmetad and Hadoop is not emitting metrics, everything is peachy.
> > > > > > > > > >
> > > > > > > > > > Right, running just Ganglia without running Hadoop jobs seems stable for at least a day..
> > > > > > > > > >
> > > > > > > > > > > - When you start Hadoop (and it thus starts emitting metrics), gmetad cores.
> > > > > > > > > >
> > > > > > > > > > True, with the following error: *** stack smashing detected ***: gmetad terminated \n Segmentation fault
> > > > > > > > > >
> > > > > > > > > > > - On my MacBook Pro, it's a SIGABRT due to a buffer overflow.
> > > > > > > > > > >
> > > > > > > > > > > I believe this is happening for everyone. What I would like for you to try out are the following 2 scenarios:
> > > > > > > > > > >
> > > > > > > > > > > - Once gmetad cores, if you start it up again, does it core again? Does this process repeat ad infinitum?
> > > > > > > > > > > - On my MBP, the core is a one-time thing, and restarting gmetad after the first core makes things run perfectly smoothly.
> > > > > > > > > > > - I know others are saying this core occurs continuously, but they were all using ganglia-3.1.x, and I'm interested in how ganglia-3.2.0 behaves for you.
> > > > > > > > > >
> > > > > > > > > > It cores every time I run it. The difference is just that sometimes a segmentation fault appears instantly, and sometimes it appears after a random time... let's say after a minute of running gmetad and collecting data.
> > > > > > > > > >
> > > > > > > > > > > - If you start Hadoop first (so gmetad is not running when the first batch of Hadoop metrics is emitted) and THEN start gmetad after a few seconds, do you still see gmetad coring?
> > > > > > > > > >
> > > > > > > > > > Yes.
> > > > > > > > > >
> > > > > > > > > > > - On my MBP, this sequence works perfectly fine, and there are no gmetad cores whatsoever.
> > > > > > > > > >
> > > > > > > > > > I have tested this scenario with 2 worker nodes, so two gmonds plus the head gmond on the server where gmetad is located. I have checked and all of them are versioned 3.2.0.
> > > > > > > > > >
> > > > > > > > > > Hope it helps..
> > > > > > > > > > > Bear in mind that this only addresses the gmetad coring issue - the warnings emitted about '4.9E-324' being out of range will continue, but I know what's causing that as well (and hope that my patch fixes it for free).
> > > > > > > > > > >
> > > > > > > > > > > Varun
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek <masmertoz@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Yes, I am encountering the same problems, and like Mete said, a few seconds after restarting a segmentation fault appears.. here is my conf..
> > > > > > > > > > > >
> > > > > > > > > > > > And here is some info from /var/log/messages (Ubuntu Server 10.10):
> > > > > > > > > > > >
> > > > > > > > > > > > kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000]
> > > > > > > > > > > >
> > > > > > > > > > > > When I compiled gmetad I used the following command:
> > > > > > > > > > > >
> > > > > > > > > > > > ./configure --with-gmetad --sysconfdir=/etc/ganglia \
> > > > > > > > > > > >     CPPFLAGS="-I/usr/local/rrdtool-1.4.7/include" \
> > > > > > > > > > > >     CFLAGS="-I/usr/local/rrdtool-1.4.7/include" \
> > > > > > > > > > > >     LDFLAGS="-L/usr/local/rrdtool-1.4.7/lib"
> > > > > > > > > > > >
> > > > > > > > > > > > The same was tried with rrdtool 1.4.5. My current Ganglia version is 3.2.0, and like Mete I tried it with version 3.1.7, but without success..
> > > > > > > > > > > > Hope we will sort out a solution soon..
> > > > > > > > > > > > Thank you
> > > > > > > > > > > >
> > > > > > > > > > > > On 6 February 2012 20:09, mete wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > I also face this issue when using GangliaContext31 with hadoop-1.0.0 and Ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer overflows as soon as I restart gmetad.
> > > > > > > > > > > > > Regards
> > > > > > > > > > > > > Mete
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Feb 6, 2012 at 7:42 PM, Vitthal "Suhas" Gogate <gogate@hortonworks.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I assume you have seen the following information on the Hadoop wiki:
> > > > > > > > > > > > > > http://wiki.apache.org/hadoop/GangliaMetrics
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So do you use GangliaContext31 in hadoop-metrics2.properties?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We use Ganglia 3.2 with Hadoop 0.20.205 and it works fine (I remember gmetad sometimes goes down due to the buffer overflow problem when Hadoop starts pumping in the metrics, but restarting works). Let me know if you face the same problem?
> > > > > > > > > > > > > > --Suhas
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Additionally, the Ganglia protocol changed significantly between Ganglia 3.0 and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible with Ganglia 3.0 clients). This caused Hadoop to not work with Ganglia 3.1; there is a patch available for this, HADOOP-4675. As of November 2010, this patch has been rolled into the mainline for 0.20.2 and later. To use the Ganglia 3.1 protocol in place of the 3.0 one, substitute org.apache.hadoop.metrics.ganglia.GangliaContext31 for org.apache.hadoop.metrics.ganglia.GangliaContext in the hadoop-metrics.properties lines above.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek <masmertoz@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I spent a lot of time trying to figure it out; however, I did not find a solution. Problems in the logs pointed me to some bugs in the rrdupdate tool, so I tried different versions of Ganglia and rrdtool, but the error is the same.
> > > > > > > > > > > > > > > Segmentation fault appears after the following lines if I run gmetad in debug mode...
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > "Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd"
> > > > > > > > > > > > > > > "Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd"
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > which I suppose are generated from MetricsSystemImpl.java (is there any way to just disable these two metrics?)
> > > > > > > > > > > > > > > From /var/log/messages there are a lot of errors:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > "xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd): converting '4.9E-324' to float: Numerical result out of range"
> > > > > > > > > > > > > > > "xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd): converting '4.9E-324' to float: Numerical result out of range"
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So probably there are some conversion issues? Where should I look for the solution? Would you rather suggest using Ganglia 3.0.x with the old protocol and leaving versions >3.1 for later releases?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Any help is really appreciated...
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On 1 February 2012 04:04, Merto Mertek <masmertoz@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I would be glad to hear that too..
> > > > > > > > > > > > > > > > I've set up the following:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hadoop 0.20.205
> > > > > > > > > > > > > > > > Ganglia front end 3.1.7
> > > > > > > > > > > > > > > > Ganglia back end (gmetad) 3.1.7
> > > > > > > > > > > > > > > > RRDTool 1.4.5 -> I had some trouble installing 1.4.4
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Ganglia works just in case Hadoop is not running, so metrics are not published to the gmetad node (configured with the new hadoop-metrics2.properties). When Hadoop is started, a segmentation fault appears in the gmetad daemon:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > sudo gmetad -d 2
> > > > > > > > > > > > > > > > .......
> > > > > > > > > > > > > > > > Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
> > > > > > > > > > > > > > > > Updating host xxx, metric bytes_in
> > > > > > > > > > > > > > > > Updating host xxx, metric bytes_out
> > > > > > > > > > > > > > > > Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time
> > > > > > > > > > > > > > > > Created rrd /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
> > > > > > > > > > > > > > > > Segmentation fault
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > And some info from the Apache log: http://pastebin.com/nrqKRtKJ
> > > > > > > > > > > > > > > > Can someone suggest a Ganglia version that is tested with Hadoop 0.20.205? I will try to sort it out; however, it seems a not so trivial problem..
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On 2 December 2011 12:32, praveenesh kumar <praveenesh@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Or do I have to apply some Hadoop patch for this?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Praveenesh

--
http://www.hadoopsummit.org/
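For readers landing on this thread: the GangliaContext31 substitution Suhas describes maps onto hadoop-metrics.properties roughly as below. This is a sketch based on the wiki text quoted above; the gmond host and port are placeholders, and only the dfs section is shown (the mapred/jvm/rpc sections follow the same pattern).

```properties
# hadoop-metrics.properties -- emit DFS metrics using the Ganglia 3.1+ wire protocol
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=gmond-host:8649
```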