hadoop-common-user mailing list archives

From Merto Mertek <masmer...@gmail.com>
Subject Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?
Date Sat, 11 Feb 2012 00:19:43 GMT
Varun, unfortunately I have had some problems deploying the new build on
the cluster. Hadoop is not picking up the new build from the lib folder even
though the classpath is set to it. The new build is picked up only if I put
it in $HD_HOME/share/hadoop/, which is very strange. I have done this on all
nodes and can access the web interface, but all tasktrackers are being
stopped because of an error:

INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Cleanup...
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926)


The error is probably the consequence of deploying the jar incorrectly. I
will ask on the dev list how they do it, or do you maybe have any other
idea?
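
One way to check which jars in the candidate directories actually contain the patched class is to scan them directly. The following is a minimal Python sketch, not part of Hadoop: the function name jars_containing and the directory paths in the usage line are hypothetical, and the class name is the SampleStat class touched by the Hadoop-side patch mentioned further down in this thread.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, directories):
    """Return the .jar files in the given directories that contain class_name.

    class_name is a fully qualified Java class name, for example
    "org.apache.hadoop.metrics2.util.SampleStat".
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for directory in directories:
        # Only the top level of each directory is scanned (no recursion)
        for jar in sorted(Path(directory).glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if entry in archive.namelist():
                        hits.append(str(jar))
            except zipfile.BadZipFile:
                # Skip files that end in .jar but are not valid archives
                continue
    return hits
```

A call such as jars_containing("org.apache.hadoop.metrics2.util.SampleStat", ["/opt/hadoop/lib", "/opt/hadoop/share/hadoop"]) (both paths are placeholders for $HD_HOME/lib and $HD_HOME/share/hadoop) lists every jar that ships the class. If more than one jar shows up, the one that appears first on the effective classpath is the one the JVM loads, which would be consistent with the old jar winning.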



On 10 February 2012 17:10, Varun Kapoor <reznor@hortonworks.com> wrote:

> Hey Merto,
>
> Any luck getting the patch running on your cluster?
>
> In case you're interested, there's now a JIRA for this:
> https://issues.apache.org/jira/browse/HADOOP-8052.
>
> Varun
>
> On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor <reznor@hortonworks.com> wrote:
>
> > Your general procedure sounds correct (i.e. dropping your newly built .jar
> > into $HD_HOME/lib/), but to make sure it's getting picked up, you should
> > explicitly add $HD_HOME/lib/ to your exported HADOOP_CLASSPATH environment
> > variable; here's mine, as an example:
> >
> > export HADOOP_CLASSPATH=".:./build/*.jar"
> >
> > About your second point, you certainly need to copy this newly patched
> > .jar to every node in your cluster, because my patch changes the value of a
> > couple metrics emitted TO gmetad (FROM all the nodes in the cluster), so
> > without copying it over to every node in the cluster, gmetad will still
> > likely receive some bad metrics.
> >
> > Varun
> >
> >
> > On Wed, Feb 8, 2012 at 6:19 PM, Merto Mertek <masmertoz@gmail.com> wrote:
> >
> >> I will need your help. Please confirm whether the following procedure is right.
> >> I have a dev environment where I work on my scheduler (no Hadoop running) and
> >> a small cluster environment where the changes (jars) are deployed with some
> >> scripts; however, I have never compiled the whole of Hadoop from source, so I
> >> do not know if I am doing it right. I have done it as follows:
> >>
> >> a) apply a patch
> >> b) cd $HD_HOME; ant
> >> c) copy $HD_HOME/build/patched-core-hadoop.jar -> cluster:/$HD_HOME/lib
> >> d) run $HD_HOME/bin/start-all.sh
> >>
> >> Is this enough? When I tried to test "hadoop dfs -ls /" I could see that the
> >> new jar was not loaded and instead a jar from
> >> $HD_HOME/share/hadoop-20.205.0.jar was taken.
> >> Should I copy the entire hadoop folder to all nodes and reconfigure the
> >> entire cluster for the new build, or is it enough if I configure it just on
> >> the node where gmetad will run?
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 8 February 2012 06:33, Varun Kapoor <reznor@hortonworks.com> wrote:
> >>
> >> > I'm so sorry, Merto - like a silly goose, I attached the 2 patches to my
> >> > reply, and of course the mailing list did not accept the attachment.
> >> >
> >> > I plan on opening JIRAs for this tomorrow, but till then, here are links to
> >> > the 2 patches (from my Dropbox account):
> >> >
> >> >   - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.Hadoop.patch
> >> >   - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.gmetad.patch
> >> >
> >> > Here's hoping this works for you,
> >> >
> >> > Varun
> >> > On Tue, Feb 7, 2012 at 6:00 PM, Merto Mertek <masmertoz@gmail.com> wrote:
> >> >
> >> > > Varun, have I missed your link to the patches? I have tried to search for
> >> > > them on JIRA but I did not find them. Can you repost the links for these two
> >> > > patches?
> >> > >
> >> > > Thank you..
> >> > >
> >> > > On 7 February 2012 20:36, Varun Kapoor <reznor@hortonworks.com> wrote:
> >> > >
> >> > > > I'm sorry to hear that gmetad cores continuously for you guys. Since I'm
> >> > > > not seeing that behavior, I'm going to just put out the 2 possible patches
> >> > > > you could apply and wait to hear back from you. :)
> >> > > >
> >> > > > Option 1
> >> > > >
> >> > > > * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file
> >> > > > (http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markup
> >> > > > in my setup) in your Hadoop sources and rebuild Hadoop.
> >> > > >
> >> > > > Option 2
> >> > > >
> >> > > > * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c and
> >> > > > rebuild gmetad.
> >> > > >
> >> > > > Only 1 of these 2 fixes is required, and it would help me if you could
> >> > > > first try Option 1 and let me know if that fixes things for you.
> >> > > >
> >> > > > Varun
> >> > > >
> >> > > > On Mon, Feb 6, 2012 at 10:36 PM, mete <efkarr@gmail.com> wrote:
> >> > > >
> >> > > >> Same as Merto's situation here: it always overflows a short time after the
> >> > > >> restart. Without the Hadoop metrics enabled everything is smooth.
> >> > > >> Regards
> >> > > >>
> >> > > >> Mete
> >> > > >>
> >> > > >> On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek <masmertoz@gmail.com> wrote:
> >> > > >>
> >> > > >> > I have tried to run it but it keeps crashing.
> >> > > >> >
> >> > > >> > >   - When you start gmetad and Hadoop is not emitting metrics,
> >> > > >> > >     everything is peachy.
> >> > > >> >
> >> > > >> > Right, running just Ganglia without running Hadoop jobs seems stable
> >> > > >> > for at least a day.
> >> > > >> >
> >> > > >> > >   - When you start Hadoop (and it thus starts emitting metrics),
> >> > > >> > >     gmetad cores.
> >> > > >> >
> >> > > >> > True, with the following error: *** stack smashing detected ***: gmetad
> >> > > >> > terminated \n Segmentation fault
> >> > > >> >
> >> > > >> > >      - On my MacBookPro, it's a SIGABRT due to a buffer overflow.
> >> > > >> > >
> >> > > >> > > I believe this is happening for everyone. What I would like for you to
> >> > > >> > > try out are the following 2 scenarios:
> >> > > >> > >
> >> > > >> > >   - Once gmetad cores, if you start it up again, does it core again?
> >> > > >> > >     Does this process repeat ad infinitum?
> >> > > >> > >      - On my MBP, the core is a one-time thing, and restarting gmetad
> >> > > >> > >        after the first core makes things run perfectly smoothly.
> >> > > >> > >         - I know others are saying this core occurs continuously, but
> >> > > >> > >           they were all using ganglia-3.1.x, and I'm interested in how
> >> > > >> > >           ganglia-3.2.0 behaves for you.
> >> > > >> >
> >> > > >> > It cores every time I run it. The difference is just that sometimes the
> >> > > >> > segmentation fault appears instantly, and sometimes it appears after a
> >> > > >> > random time, let's say after a minute of running gmetad and collecting
> >> > > >> > data.
> >> > > >> >
> >> > > >> > >   - If you start Hadoop first (so gmetad is not running when the
> >> > > >> > >     first batch of Hadoop metrics are emitted) and THEN start gmetad
> >> > > >> > >     after a few seconds, do you still see gmetad coring?
> >> > > >> >
> >> > > >> > Yes
> >> > > >> >
> >> > > >> > >      - On my MBP, this sequence works perfectly fine, and there are no
> >> > > >> > >        gmetad cores whatsoever.
> >> > > >> >
> >> > > >> > I have tested this scenario with 2 worker nodes, so two gmonds plus the
> >> > > >> > head gmond on the server where gmetad is located. I have checked and all
> >> > > >> > of them are version 3.2.0.
> >> > > >> >
> >> > > >> > Hope it helps.
> >> > > >> >
> >> > > >> > >
> >> > > >> > > Bear in mind that this only addresses the gmetad coring issue - the
> >> > > >> > > warnings emitted about '4.9E-324' being out of range will continue, but I
> >> > > >> > > know what's causing that as well (and hope that my patch fixes it for
> >> > > >> > > free).
> >> > > >> > >
> >> > > >> > > Varun
> >> > > >> > > On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek <masmertoz@gmail.com> wrote:
> >> > > >> > >
> >> > > >> > > > Yes, I am encountering the same problems and, like Mete said, a few seconds
> >> > > >> > > > after restarting a segmentation fault appears. Here is my conf:
> >> > > >> > > > <http://pastebin.com/VgBjp08d>
> >> > > >> > > >
> >> > > >> > > > And here is some info from /var/log/messages (Ubuntu Server 10.10):
> >> > > >> > > >
> >> > > >> > > > > kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb
> >> > > >> > > > > sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000]
> >> > > >> > > >
> >> > > >> > > > When I compiled gmetad I used the following command:
> >> > > >> > > >
> >> > > >> > > > > ./configure --with-gmetad --sysconfdir=/etc/ganglia
> >> > > >> > > > > CPPFLAGS="-I/usr/local/rrdtool-1.4.7/include"
> >> > > >> > > > > CFLAGS="-I/usr/local/rrdtool-1.4.7/include"
> >> > > >> > > > > LDFLAGS="-L/usr/local/rrdtool-1.4.7/lib"
> >> > > >> > > >
> >> > > >> > > > The same was tried with rrdtool 1.4.5. My current Ganglia version is 3.2.0
> >> > > >> > > > and, like Mete, I tried it with version 3.1.7 but without success.
> >> > > >> > > >
> >> > > >> > > > I hope we will find a solution soon.
> >> > > >> > > > Thank you
> >> > > >> > > >
> >> > > >> > > >
> >> > > >> > > > On 6 February 2012 20:09, mete <efkarr@gmail.com> wrote:
> >> > > >> > > >
> >> > > >> > > > > Hello,
> >> > > >> > > > > I also face this issue when using GangliaContext31 with hadoop-1.0.0 and
> >> > > >> > > > > ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer overflows as
> >> > > >> > > > > soon as I restart gmetad.
> >> > > >> > > > > Regards
> >> > > >> > > > > Mete
> >> > > >> > > > >
> >> > > >> > > > > On Mon, Feb 6, 2012 at 7:42 PM, Vitthal "Suhas" Gogate <gogate@hortonworks.com> wrote:
> >> > > >> > > > >
> >> > > >> > > > > > I assume you have seen the following information on the Hadoop twiki,
> >> > > >> > > > > > http://wiki.apache.org/hadoop/GangliaMetrics
> >> > > >> > > > > >
> >> > > >> > > > > > So do you use GangliaContext31 in hadoop-metrics2.properties?
> >> > > >> > > > > >
> >> > > >> > > > > > We use Ganglia 3.2 with Hadoop 20.205 and it works fine (I remember seeing
> >> > > >> > > > > > gmetad sometimes go down due to a buffer overflow problem when Hadoop starts
> >> > > >> > > > > > pumping in the metrics, but restarting works). Let me know if you face the
> >> > > >> > > > > > same problem.
> >> > > >> > > > > >
> >> > > >> > > > > > --Suhas
> >> > > >> > > > > >
> >> > > >> > > > > > Additionally, the Ganglia protocol changed significantly between Ganglia 3.0
> >> > > >> > > > > > and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible with Ganglia 3.0
> >> > > >> > > > > > clients). This caused Hadoop to not work with Ganglia 3.1; there is a patch
> >> > > >> > > > > > available for this, HADOOP-4675. As of November 2010, this patch has been
> >> > > >> > > > > > rolled into the mainline for 0.20.2 and later. To use the Ganglia 3.1
> >> > > >> > > > > > protocol in place of the 3.0, substitute
> >> > > >> > > > > > org.apache.hadoop.metrics.ganglia.GangliaContext31 for
> >> > > >> > > > > > org.apache.hadoop.metrics.ganglia.GangliaContext in the
> >> > > >> > > > > > hadoop-metrics.properties lines above.
> >> > > >> > > > > >
> >> > > >> > > > > > On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek <masmertoz@gmail.com> wrote:
> >> > > >> > > > > >
> >> > > >> > > > > > > I spent a lot of time trying to figure it out; however, I did not find a
> >> > > >> > > > > > > solution. Problems from the logs pointed me to some bugs in the rrdupdate
> >> > > >> > > > > > > tool; however, I tried to solve it with different versions of ganglia and
> >> > > >> > > > > > > rrdtool, but the error is the same. A segmentation fault appears after the
> >> > > >> > > > > > > following lines if I run gmetad in debug mode:
> >> > > >> > > > > > >
> >> > > >> > > > > > > "Created rrd
> >> > > >> > > > > > > /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd"
> >> > > >> > > > > > > "Created rrd
> >> > > >> > > > > > > /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd"
> >> > > >> > > > > > >
> >> > > >> > > > > > > which I suppose are generated from MetricsSystemImpl.java (Is there any way
> >> > > >> > > > > > > just to disable these two metrics?)
> >> > > >> > > > > > >
> >> > > >> > > > > > > In /var/log/messages there are a lot of errors:
> >> > > >> > > > > > >
> >> > > >> > > > > > > "xxx gmetad[15217]: RRD_update
> >> > > >> > > > > > > (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd):
> >> > > >> > > > > > > converting  '4.9E-324' to float: Numerical result out of range"
> >> > > >> > > > > > > "xxx gmetad[15217]: RRD_update
> >> > > >> > > > > > > (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd):
> >> > > >> > > > > > > converting  '4.9E-324' to float: Numerical result out of range"
> >> > > >> > > > > > >
> >> > > >> > > > > > > so probably there are some conversion issues? Where should I look for the
> >> > > >> > > > > > > solution? Would you rather suggest using ganglia 3.0.x with the old
> >> > > >> > > > > > > protocol and leaving versions >3.1 for further releases?
> >> > > >> > > > > > >
> >> > > >> > > > > > > Any help is really appreciated.
> >> > > >> > > > > > >
> >> > > >> > > > > > > On 1 February 2012 04:04, Merto Mertek <masmertoz@gmail.com> wrote:
> >> > > >> > > > > > >
> >> > > >> > > > > > > > I would be glad to hear that too. I've set up the following:
> >> > > >> > > > > > > >
> >> > > >> > > > > > > > Hadoop 0.20.205
> >> > > >> > > > > > > > Ganglia Front  3.1.7
> >> > > >> > > > > > > > Ganglia Back (gmetad) 3.1.7
> >> > > >> > > > > > > > RRDTool <http://www.rrdtool.org/> 1.4.5 -> I had some trouble
> >> > > >> > > > > > > > installing 1.4.4
> >> > > >> > > > > > > >
> >> > > >> > > > > > > > Ganglia works only when Hadoop is not running, so that metrics are not
> >> > > >> > > > > > > > published to the gmetad node (configured with the new
> >> > > >> > > > > > > > hadoop-metrics2.properties). When Hadoop is started, a segmentation fault
> >> > > >> > > > > > > > appears in the gmetad daemon:
> >> > > >> > > > > > > >
> >> > > >> > > > > > > > sudo gmetad -d 2
> >> > > >> > > > > > > > .......
> >> > > >> > > > > > > > Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
> >> > > >> > > > > > > > Updating host xxx, metric bytes_in
> >> > > >> > > > > > > > Updating host xxx, metric bytes_out
> >> > > >> > > > > > > > Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time
> >> > > >> > > > > > > > Created rrd
> >> > > >> > > > > > > > /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
> >> > > >> > > > > > > > Segmentation fault
> >> > > >> > > > > > > >
> >> > > >> > > > > > > > And some info from the apache log <http://pastebin.com/nrqKRtKJ>.
> >> > > >> > > > > > > >
> >> > > >> > > > > > > > Can someone suggest a ganglia version that is tested with hadoop 0.20.205?
> >> > > >> > > > > > > > I will try to sort it out; however, it seems a not so trivial problem.
> >> > > >> > > > > > > >
> >> > > >> > > > > > > > Thank you
> >> > > >> > > > > > > >
> >> > > >> > > > > > > >
> >> > > >> > > > > > > >
> >> > > >> > > > > > > >
> >> > > >> > > > > > > >
> >> > > >> > > > > > > > On 2 December 2011 12:32, praveenesh kumar <praveenesh@gmail.com> wrote:
> >> > > >> > > > > > > >
> >> > > >> > > > > > > >> or Do I have to apply some hadoop patch for this ?
> >> > > >> > > > > > > >>
> >> > > >> > > > > > > >> Thanks,
> >> > > >> > > > > > > >> Praveenesh
> >> > > >> > > > > > > >>
> >> > > >> > > > > > > >
> >> > > >> > > > > > > >
> >> > > >> > > > > > >
> >> > > >> > > > > >
> >> > > >> > > > >
> >> > > >> > > >
> >> > > >> > >
> >> > > >> >
> >> > > >>
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > >
> >> > > >
> >> > > > http://www.hadoopsummit.org/
> >> > > >
> >> > > >
>
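
For reference, the value '4.9E-324' in the RRD_update warnings quoted above is how Java prints Double.MIN_VALUE, the smallest positive double. The thread does not show the conversion code on the gmetad/rrdtool side, so the following is only a minimal Python sketch under the assumption that the metric is at some point narrowed to a 32-bit float; the helper names to_float32 and survives_float32 are hypothetical. It illustrates that this value sits at the very edge of the double range and collapses to zero in single precision, while the largest double does not fit at all.

```python
import struct
import sys

# '4.9E-324' is the text Java produces for Double.MIN_VALUE,
# the smallest positive (subnormal) 64-bit double.
SMALLEST_DOUBLE = float("4.9E-324")


def to_float32(value):
    """Round a Python float (a 64-bit double) to 32-bit single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def survives_float32(value):
    """True if a finite value neither overflows nor collapses to zero as a 32-bit float."""
    try:
        narrowed = to_float32(value)
    except OverflowError:
        # struct raises OverflowError when the value is too large for 'f'
        return False
    return narrowed != 0.0 or value == 0.0


if __name__ == "__main__":
    for sample in (1.5, SMALLEST_DOUBLE, sys.float_info.max):
        print(sample, survives_float32(sample))
```

Whether the failing conversion in gmetad targets a 32-bit or a 64-bit float is not stated in the thread, so this should be read as an illustration of the value range only, not as a description of the actual bug.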
