From: "Jonathan Gray" <jlist@streamy.com>
To: <hbase-user@hadoop.apache.org>
Subject: RE: Bulk import question.
Date: Mon, 1 Dec 2008 19:20:14 -0800

Your new best friends:  Ganglia and Nagios

Ganglia is great for monitoring cluster-wide resource usage over time.
You'll see memory, CPU, disk, and network usage over time for the entire
cluster and for each node.  It is very easy to set up because it uses UDP
multicast by default, so there is no need to list individual nodes in
conf files.  HBase 0.19 introduces Ganglia metrics, which will also be
available in the Ganglia web interface.

http://ganglia.info/
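
If you want a number Ganglia does not graph for you already (say, free
space on one particular DFS data directory), the gmetric command-line
tool that ships with Ganglia can publish it.  Here is a rough sketch,
assuming gmetric is on the PATH and a gmond is running on the node.  The
metric name and the function names are made up for illustration.

```python
import shutil
import subprocess


def gmetric_command(name, value, metric_type="float", units=""):
    """Build the argument list for one gmetric invocation."""
    cmd = [
        "gmetric",
        f"--name={name}",
        f"--value={value}",
        f"--type={metric_type}",
    ]
    if units:
        cmd.append(f"--units={units}")
    return cmd


def free_percent(path):
    """Percentage of free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return round(100.0 * usage.free / usage.total, 1)


def publish_free_space(path, name="dfs_data_free_pct"):
    """Send the free-space percentage to the local gmond via gmetric.

    The metric name is hypothetical; pick whatever suits your cluster.
    """
    cmd = gmetric_command(name, free_percent(path), units="%")
    subprocess.run(cmd, check=True)
```

Calling publish_free_space() from cron every minute or so would be
enough to get a graph per node in the web interface.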

Nagios is good for monitoring services as well as resource utilization.
Rather than giving you data over time, its aim is really to alert you
when something is wrong, for example when a server is no longer
reachable or when available disk space drops below a configurable
threshold.  It does require a bit more work to get up and running
because you have to set up your node and service configurations.  I have
written custom Nagios plugins for Hadoop and HBase.  If there's
interest, I will look at cleaning them up and contributing them.

http://www.nagios.org/
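
To give an idea of what a disk-space check looks like, here is a minimal
sketch of a Nagios-style plugin.  It is not one of the plugins I
mentioned above, just an illustration.  It follows the usual plugin
convention: one line of status text, and an exit code of 0 (OK),
1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).  The thresholds and function
names are assumptions.

```python
import shutil

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(free_pct, warn_pct=20.0, crit_pct=10.0):
    """Map a free-space percentage to a Nagios status code."""
    if free_pct <= crit_pct:
        return CRITICAL
    if free_pct <= warn_pct:
        return WARNING
    return OK


def check_disk(path, warn_pct=20.0, crit_pct=10.0):
    """Return (status code, status line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: reported size is zero"
    free_pct = 100.0 * usage.free / usage.total
    status = classify(free_pct, warn_pct, crit_pct)
    return status, f"DISK {LABELS[status]} - {free_pct:.1f}% free on {path}"


def main(argv):
    """Print the status line and return the exit code for Nagios."""
    path = argv[1] if len(argv) > 1 else "/"
    status, message = check_disk(path)
    print(message)
    return status
```

To run it as a plugin, end the script with sys.exit(main(sys.argv)) and
point a Nagios command definition at it, passing the data directory you
care about as the argument.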

Both are free and essential tools for properly monitoring your cluster.

JG

> -----Original Message-----
> From: edward@udanax.org [mailto:edward@udanax.org] On Behalf Of Edward
> J. Yoon
> Sent: Monday, December 01, 2008 7:04 PM
> To: apurtell@apache.org
> Cc: hbase-user@hadoop.apache.org; 02635@nhncorp.com
> Subject: Re: Bulk import question.
>
> I'm considering storing large-scale web-mail data in HBase.
> As you know, there are a lot of mail bombs (e.g. spam, group mail,
> etc.). So, I tested these.
>
> Here's my additional question: is there a monitoring tool for disk
> space?
>
> /Edward
>
> On Tue, Dec 2, 2008 at 11:42 AM, Andrew Purtell <apurtell@apache.org>
> wrote:
> > Edward,
> >
> > You are running with insufficient resources -- too little CPU
> > for your task and too little disk for your data.
> >
> > If you are running a mapreduce task and DFS runs out of space
> > for the temporary files, then you indeed should expect
> > aberrant job status from the Hadoop job framework, for
> > example such things as completion status running backwards.
> >
> > I do agree that under these circumstances HBase daemons
> > should fail more gracefully, by entering some kind of
> > degraded read only mode, if DFS is not totally dead. I
> > suspect this is already on a to do list somewhere, and I
> > vaguely recall a jira filed on that topic.
> >
> >   - Andy
> >
> >
> >> From: Edward J. Yoon <edwardyoon@apache.org>
> >> Subject: Re: Bulk import question.
> >> To: hbase-user@hadoop.apache.org, apurtell@apache.org
> >> Date: Monday, December 1, 2008, 6:26 PM
> >> It was caused by 'Datanode DiskOutOfSpaceException'. But I
> >> think the daemons should not die.
> >>
> >> On Wed, Nov 26, 2008 at 1:08 PM, Edward J. Yoon
> >> <edwardyoon@apache.org> wrote:
> >> > Hmm. It happens to me often. I'll check the logs.
> >> >
> >> > On Fri, Nov 21, 2008 at 9:46 AM, Andrew Purtell
> >> <apurtell@yahoo.com> wrote:
> >> > > I think a 2 node cluster is simply too small for
> >> > > the full load of everything.
> >> > >
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> edwardyoon@apache.org
> http://blog.udanax.org