hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: [Kosmosfs-users] too many disk IOs / chunkserver suddenly consumes ~100GB of address space
Date Fri, 17 Apr 2009 11:16:53 GMT

Hi Sriram,

> Can you tell me the exact steps to repro the problem:
>  - What version of HBase?

SVN trunk, 0.20.0-dev

>  - Which version of Heritrix?

Heritrix 2.0, plus the HBase writer which can be found here:
http://code.google.com/p/hbase-writer/
 
> What is happening is that the KFS chunkserver is sending
> writes down to disk and they aren't coming back "soon
> enough", causing things to backlog; the chunkserver is
> printing out the backlog status message.

I wonder if this might be a secondary effect. Just before
these messages begin streaming into the log, the chunkserver
suddenly balloons its address space from ~200KB to ~100GB.
The two events are strongly correlated and always happen in
the same order; the behavior is repeatable.

Once the backlog messages begin, no further IO completes as
far as I can tell. The count of outstanding IOs 
monotonically increases. Also, the metaserver declares the
chunkserver dead.
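
For context, my mental model of the accounting behind that
message is something like the following minimal C++ sketch
(the names and the threshold are illustrative assumptions on
my part, not the actual KFS DiskManager code):

    #include <atomic>
    #include <cstdio>

    // Disk IOs submitted to the kernel but not yet completed.
    static std::atomic<int> gPendingIOs{0};
    static const int kBacklogThreshold = 64;  // hypothetical limit

    void OnIOSubmitted() {
        const int pending = ++gPendingIOs;
        if (pending > kBacklogThreshold) {
            // Analogous in spirit to "Too many disk IOs (N)".
            std::printf("Too many disk IOs (%d)\n", pending);
        }
    }

    void OnIOCompleted() {
        --gPendingIOs;
    }

If completions stop arriving entirely, as appears to be the
case here, a counter like that can only grow, which would
match the monotonically increasing N in the log.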

I am happy to take steps to help diagnose the problem;
please advise. Would it help if I reproduce the problem again
with chunkserver logging set to DEBUG and then post the
compressed logs somewhere?
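I could also capture pmap -x output for the chunkserver
process at the moment the address space balloons (assuming
the host has the usual Linux pmap), if that would be useful.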

[...]
> On Thu, Apr 16, 2009 at 12:27 AM, Andrew Purtell wrote:
> >
> > Hi,
> >
> > Like Ryan I have been trying to run HBase on top of
> > KFS. In my case I am running a SVN snapshot from
> > yesterday. I have a minimal installation of KFS
> > metaserver, chunkserver, and HBase master and
> > regionserver all running on one test host with 4GB of
> > RAM. Of course I do not expect more than minimal
> > function. To apply some light load, I run the Heritrix
> > crawler with 5 TOE threads which write on average
> > 200 Kbit/sec of data into HBase, which flushes this
> > incoming data in ~64MB increments and also runs
> > occasional compaction cycles where the 64MB flush
> > files will be compacted into ~256MB files.
> >
> > I find that for no obvious reason the chunkserver will
> > suddenly grab ~100 GIGAbytes of address space and emit
> > a steady stream of "(DiskManager.cc:392) Too many disk
> > IOs (N)" to the log at INFO level, where N is a
> > steadily increasing number. The host is under moderate
> > load at the time -- KFS is busy -- but is not in swap
> > and according to atop has some disk I/O and network
> > bandwidth to spare.
[...]
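
As a rough sanity check on that load (taking the figures
above at face value): 200 Kbit/sec is about 25 KB/sec, so a
64MB flush accumulates in roughly 64 * 1024 / 25 ≈ 2,600
seconds, i.e. one flush every ~45 minutes, with a ~256MB
compaction every four flushes or so. That is a trickle of
IO, which makes the sudden ~100GB address space grab look
even more out of proportion.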



      
