hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Advice on restarting HDFS in a cron
Date Sun, 26 Apr 2009 05:07:56 GMT
If your logs are being written to the root partition (/dev/sda1), it is
going to fill up fast. That partition is never larger than 10 GB on EC2, and
much of that space is consumed by the OS install. You should redirect your
logs to some place under /mnt (/dev/sdb1), which has 160 GB.
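
One way to do the redirect, as a minimal sketch: the Hadoop daemon scripts read
the HADOOP_LOG_DIR variable from conf/hadoop-env.sh. The /mnt/hadoop/logs path
below is an assumed example location, not something taken from this thread; any
directory under /mnt should serve the same purpose.

```shell
# conf/hadoop-env.sh (sketch; the directory name is an assumption)
# Write daemon logs under /mnt (/dev/sdb1) instead of the root partition.
export HADOOP_LOG_DIR=/mnt/hadoop/logs
```

The user that runs the daemons needs write access to that location, and the
daemons have to be restarted to pick up the change. Running `df -h / /mnt`
before and after shows which partition is actually filling up.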

- Aaron

On Sun, Apr 26, 2009 at 3:21 AM, Rakhi Khatwani <rakhi.khatwani@gmail.com> wrote:

> Hi,
>   I have faced a somewhat similar issue...
>   I have a couple of map reduce jobs running on EC2... after a week or so,
> I get a "no space on device" exception while performing any Linux command...
> so I end up shutting down Hadoop and HBase, clearing the logs, and then
> restarting them.
>
> Is there a cleaner way to do it?
>
> thanks
> Raakhi
>
> On Fri, Apr 24, 2009 at 11:59 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
> > On Fri, Apr 24, 2009 at 11:18 AM, Marc Limotte <mlimotte@feeva.com> wrote:
> >
> > > Actually, I'm concerned about performance of map/reduce jobs for a
> > > long-running cluster.  I.e. it seems to get slower the longer it's
> > > running.  After a restart of HDFS, the jobs seem to run faster.  Not
> > > concerned about the start-up time of HDFS.
> > >
> >
> > Hi Marc,
> >
> > Does it sound like this JIRA describes your problem?
> >
> > https://issues.apache.org/jira/browse/HADOOP-4766
> >
> > If so, restarting just the JT should help with the symptoms. (I say
> > symptoms because this is clearly a problem! Hadoop should be stable and
> > performant for months without a cluster restart!)
> >
> > -Todd
> >
> >
> > >
> > > Of course, as you suggest, this could be poor configuration of the
> > > cluster on my part; but I'd still like to hear best practices around
> > > doing a scheduled restart.
> > >
> > > Marc
> > >
> > > -----Original Message-----
> > > From: Allen Wittenauer [mailto:aw@yahoo-inc.com]
> > > Sent: Friday, April 24, 2009 10:17 AM
> > > To: core-user@hadoop.apache.org
> > > Subject: Re: Advice on restarting HDFS in a cron
> > >
> > >
> > >
> > >
> > > On 4/24/09 9:31 AM, "Marc Limotte" <mlimotte@feeva.com> wrote:
> > > > I've heard that HDFS starts to slow down after it's been running for
> > > > a long time.  And I believe I've experienced this.
> > >
> > > We did an upgrade (== complete restart) of a 2000 node instance in ~20
> > > minutes on Wednesday. I wouldn't really consider that 'slow', but YMMV.
> > >
> > > I suspect people aren't running the secondary name node and therefore
> > > have a massively large edits file.  The name node appears slow on
> > > restart because it has to apply the edits to the fsimage rather than
> > > having the secondary keep it up to date.
> > >
> > >
> > > -----Original Message-----
> > > From: Marc Limotte
> > >
> > > Hi.
> > >
> > > I've heard that HDFS starts to slow down after it's been running for a
> > > long time.  And I believe I've experienced this.   So, I was thinking
> > > to set up a cron job to execute every week to shut down HDFS and start
> > > it up again.
> > >
> > > In concept, it would be something like:
> > >
> > > 0 0 * * 0 $HADOOP_HOME/bin/stop-dfs.sh; $HADOOP_HOME/bin/start-dfs.sh
> > >
> > > But I'm wondering if there is a safer way to do this.  In particular:
> > >
> > > *         What if a map/reduce job is running when this cron hits?  Is
> > > there a way to suspend jobs while the HDFS restart happens?
> > >
> > > *         Should I also restart the mapred daemons?
> > >
> > > *         Should I wait some time after "stop-dfs.sh" for things to
> > > settle down, before executing "start-dfs.sh"?  Or maybe I should run a
> > > command to verify that it is stopped before I run the start?
> > >
> > > Thanks for any help.
> > > Marc
> > >
> >
>
