incubator-flume-user mailing list archives

From Bao Thai Ngo <baothai...@gmail.com>
Subject Re: Flume Master Issues
Date Sun, 28 Aug 2011 14:36:05 GMT
Mike,

I had the same problem with the flume master. Try removing flume and its init
script on the master machine, then reinstall the flume master. Just remember
to save your configuration first.

Good luck.

~Thai

On Fri, Aug 26, 2011 at 10:55 PM, Mike <miketheman@gmail.com> wrote:

> I'd also ensure that all nodes/masters/collectors/etc. are running the
> exact same build of flume.
>
> On Fri, Aug 26, 2011 at 11:53 AM, Matthew Rathbone
> <matthew@foursquare.com> wrote:
> > Ah, I'm seeing this on single-master mode :-/. Anywhere else you think I
> > could look for useful debugging output?
> > --
> > Matthew Rathbone
> > Foursquare | Software Engineer | Server Engineering Team
> > matthew@foursquare.com | @rathboma | 4sq
> >
> > On Friday, August 26, 2011 at 10:34 AM, Mike wrote:
> >
> > I did - but that was when we were testing multi-master mode, and since
> > it's not fully matured yet, I've gone back to a single master.
> >
> > On Fri, Aug 26, 2011 at 11:32 AM, Matthew Rathbone
> > <matthew@foursquare.com> wrote:
> >
> > You're right, there's another pid file there, that's crazy.
> > Have you experienced the unresponsiveness thing too?
> > --
> > Matthew Rathbone
> > Foursquare | Software Engineer | Server Engineering Team
> > matthew@foursquare.com | @rathboma | 4sq
> >
> > On Friday, August 26, 2011 at 10:17 AM, Mike wrote:
> >
> > I recall a similar problem I had with this.
> >
> > It ended up being another pid-style file dropped somewhere else.
> >
> > /var/run/flume/flume-flume-master.pid
> > /tmp/flumemaster.pid
> >
> > See if those are still around once all the flume procs are dead.
> >
> > -M
> >
> > On Fri, Aug 26, 2011 at 11:03 AM, Matthew Rathbone
> > <matthew@foursquare.com> wrote:
> >
> > Hey all,
> > We're having totally unpredictable issues with the flume master
> > installation lately. Here's what happened to us last night / today:
> > YESTERDAY
> > - Yesterday we added 8 new nodes to flume. They got set up fine, and the
> >   configs were registered.
> > - A few hours later the master totally stops responding to anything
> >   (web/shell/nodes). I don't find out until this morning.
> > TODAY
> > - I try to stop it using the init script. That doesn't do anything, and it
> >   continues to run but stays unresponsive.
> > - I kill -9 the flume processes and remove the pid file, figuring I can
> >   just start it again.
> > - Now the master won't start: "master already running on
> >   pid=<non-existent-pid>".
> > - When I finally get it to start (by changing the pid directory), it goes
> >   unresponsive again.
> > - I restart it, and it does the same.
> > - I stop all flume-nodes and restart it, and it looks good. I start the
> >   flume nodes, and it goes unresponsive again.
> > - I restart it, and this time it works.
> >
> > The only log above an INFO statement that I can see is this:
> > 2011-08-26 14:38:34,527 WARN com.cloudera.flume.agent.FlumeNode: Unable to load output format plugin class  - Class not found
> > but I don't think that's causing the issues.
> >
> > I do have a flume-node running on the same machine. Could there be some
> > sort of race condition happening?
> > Has anyone else seen behavior like this?
> > Any idea how to fix it?
> > Hoping someone can shed some light on this. I'm really not sure what's
> > going on.
> > Thanks all
> > --
> > Matthew Rathbone
> > Foursquare | Software Engineer | Server Engineering Team
> > matthew@foursquare.com | @rathboma | 4sq
> >
> >
>
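
For anyone landing on this thread with the same "master already running"
error: the check Mike suggests (look for leftover pid files once all the flume
procs are dead) can be sketched roughly as below. This is only a sketch, not
anything shipped with flume. The two paths are the ones quoted in the thread
and may differ on other installs, and it assumes each pid file holds nothing
but the pid as decimal text. The function names are made up for illustration.

```python
import os

# Paths quoted in the thread; other installs may use different locations.
PID_FILES = [
    "/var/run/flume/flume-flume-master.pid",
    "/tmp/flumemaster.pid",
]


def pid_is_running(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_stale_pid_files(paths):
    """Return (path, pid) pairs whose recorded pid is not a live process.

    Assumption: a pid file contains only the pid as decimal text. Files
    with any other content are reported as stale with pid=None.
    """
    stale = []
    for path in paths:
        try:
            with open(path) as fh:
                text = fh.read().strip()
        except FileNotFoundError:
            continue  # no pid file at this location, nothing to report
        pid = int(text) if text.isdigit() and int(text) > 0 else None
        if pid is None or not pid_is_running(pid):
            stale.append((path, pid))
    return stale


if __name__ == "__main__":
    for path, pid in find_stale_pid_files(PID_FILES):
        print(f"stale pid file: {path} (pid={pid})")
```

It only reports the files and leaves deleting them to you, since removing a
pid file that belongs to a live master would make things worse.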

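Matthew mentions looking for log entries above INFO. A small filter along the
lines below may help with that on a large master log. It is a sketch under
assumptions: the line layout is inferred from the single WARN line quoted in
the thread (timestamp, level, logger name, colon, message), and the names
`LOG_LINE`, `SEVERE`, and `severe_entries` are made up here.

```python
import re

# Layout inferred from the one line quoted in the thread:
# "2011-08-26 14:38:34,527 WARN com.cloudera.flume.agent.FlumeNode: message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+): "
    r"(?P<msg>.*)$"
)

# Levels above INFO.
SEVERE = {"WARN", "ERROR", "FATAL"}


def severe_entries(lines):
    """Yield (timestamp, level, logger, message) for entries above INFO.

    Lines that do not match the assumed layout (stack traces, wrapped
    text) are skipped rather than guessed at.
    """
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m.group("level") in SEVERE:
            yield m.group("ts"), m.group("level"), m.group("logger"), m.group("msg")


if __name__ == "__main__":
    # The only sample here is the line from the thread itself.
    sample = [
        "2011-08-26 14:38:34,527 WARN com.cloudera.flume.agent.FlumeNode: "
        "Unable to load output format plugin class  - Class not found",
    ]
    for ts, level, logger, msg in severe_entries(sample):
        print(ts, level, logger, msg)
```

To run it against a real file, pass the open file object to
`severe_entries`. The log path depends on the install, so none is given here.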