Subject: Re: [Kosmosfs-users] too many disk IOs / chunkserver suddenly consumes ~100GB of address space
From: Sriram Rao
To: Joshua Taylor
Cc: apurtell@apache.org, hbase-dev@hadoop.apache.org, kosmosfs-users@lists.sourceforge.net
Date: Thu, 23 Apr 2009 12:44:56 -0700

I have checked in a fix to the chunkserver to fail the read when it
sees bogus values (revision 316).  That was a bug in the code :-(

Would still be good to know what caused the client to send bogus
values for the size...

Sriram

On Thu, Apr 23, 2009 at 12:32 PM, Sriram Rao wrote:
> Josh,
>
> Thanks.  Josh Adams had also seen the same bloat and from his logs, I
> had figured that the chunkserver was in a loop trying to read data and
> we couldn't quite figure out what was causing it.  This looks like the
> smoking gun :-)
>
> What was the test doing?  I am still curious why the client requested
> a bizarre amount of data.  I'll put in the fixes to fail the read.
>
> Sriram
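The fix described above boils down to rejecting a read whose requested
size can never be valid for a chunk.  A minimal sketch of that kind of
sanity check follows; it is illustrative only, not the actual revision
316 change, and the function name and the 64 MB chunk-size constant are
assumptions rather than code from the KFS ReadOp/ChunkManager path:

    // Sketch only; names and the assumed 64 MB chunk size are not from
    // the KFS sources.
    static const long long kChunkSize = 64LL << 20;  // assumed chunk size

    // Reject offset/length pairs that can never describe a valid chunk
    // read (e.g. numBytes = 9223372036854775807, i.e. 2^63 - 1, as in the
    // log below), so the server fails the request instead of trying to
    // service it.
    static bool IsReadRequestSane(long long offset, long long numBytes)
    {
        if (offset < 0 || offset >= kChunkSize)
            return false;                        // offset outside the chunk
        if (numBytes <= 0 || numBytes > kChunkSize - offset)
            return false;                        // length runs past the chunk end
        return true;                             // plausible request
    }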
> On Thu, Apr 23, 2009 at 12:17 PM, Joshua Taylor wrote:
>> I have a bit more info about this problem.  It just happened to one of my
>> chunkservers as well.  I had replaced all log4cpp calls with fprintf to
>> avoid random log4cpp crashes (separate problem), so the chunkserver is
>> already doing debug logging.  Here's the log from the beginning of the
>> bloat/overload:
>>
>> 04-23-2009 05:42:26.625 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 32636928 numBytes: 131072
>> 04-23-2009 05:42:26.630 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 32768000 numBytes: 131072
>> 04-23-2009 05:42:26.633 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 32899072 numBytes: 131072
>> 04-23-2009 05:42:26.636 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 33030144 numBytes: 131072
>> 04-23-2009 05:42:26.638 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 33161216 numBytes: 131072
>> 04-23-2009 05:42:26.641 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 33292288 numBytes: 131072
>> 04-23-2009 05:42:26.648 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 49677 chunkversion = 1 offset: 16276417 numBytes: 9223372036854775807
>> 04-23-2009 05:42:27.053 INFO - (DiskManager.cc:392) Too many disk IOs (5001) outstanding...overloaded
>> 04-23-2009 05:42:27.053 INFO - (DiskManager.cc:392) Too many disk IOs (5002) outstanding...overloaded
>> 04-23-2009 05:42:27.053 INFO - (DiskManager.cc:392) Too many disk IOs (5003) outstanding...overloaded
>> 04-23-2009 05:42:27.053 INFO - (DiskManager.cc:392) Too many disk IOs (5004) outstanding...overloaded
>> 04-23-2009 05:42:27.054 INFO - (DiskManager.cc:392) Too many disk IOs (5005) outstanding...overloaded
>> 04-23-2009 05:42:27.054 INFO - (DiskManager.cc:392) Too many disk IOs (5006) outstanding...overloaded
>> 04-23-2009 05:42:27.054 INFO - (DiskManager.cc:392) Too many disk IOs (5007) outstanding...overloaded
>> 04-23-2009 05:42:27.054 INFO - (DiskManager.cc:392) Too many disk IOs (5008) outstanding...overloaded
>> 04-23-2009 05:42:27.055 INFO - (DiskManager.cc:392) Too many disk IOs (5009) outstanding...overloaded
>> 04-23-2009 05:42:27.055 INFO - (DiskManager.cc:392) Too many disk IOs (5010) outstanding...overloaded
>> 04-23-2009 05:42:27.055 INFO - (DiskManager.cc:392) Too many disk IOs (5011) outstanding...overloaded
>> 04-23-2009 05:42:27.055 INFO - (DiskManager.cc:392) Too many disk IOs (5012) outstanding...overloaded
>> 04-23-2009 05:42:27.056 INFO - (DiskManager.cc:392) Too many disk IOs (5013) outstanding...overloaded
>> 04-23-2009 05:42:27.056 INFO - (DiskManager.cc:392) Too many disk IOs (5014) outstanding...overloaded
>> 04-23-2009 05:42:27.056 INFO - (DiskManager.cc:392) Too many disk IOs (5015) outstanding...overloaded
>> 04-23-2009 05:42:27.056 INFO - (DiskManager.cc:392) Too many disk IOs (5016) outstanding...overloaded
>>
>> The line where numBytes == 2^63 - 1 is rather suspicious.  I'm not running
>> Heritrix/Hbase/etc.  This is just a few file copies.  I have 8 chunkservers
>> total.  Only one is having this problem.  The other chunkservers are still
>> logging that they're making progress (reading and writing chunks).
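To put that suspicious value in perspective: if the server tried to
service that request in pieces of the 131072-byte size used by the other
reads in the log (an assumption about the read path, not something the
log states), it would need on the order of 7e13 of them, which fits an
outstanding-IO count that climbs without bound.  A quick check of the
arithmetic:

    #include <cstdio>

    int main()
    {
        // Values taken from the log excerpt above.
        const unsigned long long numBytes  = 9223372036854775807ULL; // 2^63 - 1
        const unsigned long long pieceSize = 131072ULL;              // 128 KB reads
        std::printf("reads needed: %llu\n", numBytes / pieceSize);
        // Prints 70368744177663, roughly 7e13 read requests.
        return 0;
    }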
>>
>> Here's a stack trace from the server currently pegged at 100% CPU, 200 GB
>> VmSize:
>>
>> Thread 5 (Thread 0x40bd9950 (LWP 3838)):
>> #0  0x00000030df80af19 in pthread_cond_wait@@GLIBC_2.3.2 ()
>> #1  0x000000000047f2d7 in KFS::MetaThread::sleep ()
>> #2  0x000000000047f317 in KFS::MetaQueue::dequeue_internal ()
>> #3  0x000000000047f414 in KFS::MetaQueue::dequeue ()
>> #4  0x000000000047bdf3 in KFS::Logger::MainLoop ()
>> #5  0x000000000047bf79 in logger_main ()
>> #6  0x00000030df80729a in start_thread () from /lib64/libpthread.so.0
>> #7  0x00000030dece439d in clone () from /lib64/libc.so.6
>> Thread 4 (Thread 0x415da950 (LWP 3839)):
>> #0  0x000000000046fe1a in std::_List_const_iterator >::operator++ ()
>> #1  0x000000000046fe4a in std::__distance > > ()
>> #2  0x000000000046fe93 in std::distance > > ()
>> #3  0x000000000046fec3 in std::list, std::allocator > >::size ()
>> #4  0x000000000048db0a in KFS::DiskManager::IOInitiated ()
>> #5  0x000000000048e0ee in KFS::DiskManager::Read ()
>> #6  0x000000000048aeca in KFS::DiskConnection::Read ()
>> #7  0x0000000000452a74 in KFS::ChunkManager::ReadChunk ()
>> #8  0x0000000000469d04 in KFS::ReadOp::HandleChunkMetaReadDone ()
>> #9  0x0000000000458378 in KFS::ObjectMethod::execute ()
>> #10 0x000000000045574e in KFS::KfsCallbackObj::HandleEvent ()
>> #11 0x0000000000452b9f in KFS::ChunkManager::ReadChunkMetadata ()
>> #12 0x0000000000467fe1 in KFS::ReadOp::Execute ()
>> #13 0x0000000000463ad3 in KFS::SubmitOp ()
>> #14 0x000000000046104e in KFS::ClientSM::HandleClientCmd ()
>> #15 0x000000000046163c in KFS::ClientSM::HandleRequest ()
>> #16 0x00000000004628e8 in KFS::ObjectMethod::execute ()
>> #17 0x000000000045574e in KFS::KfsCallbackObj::HandleEvent ()
>> #18 0x0000000000493a20 in KFS::NetConnection::HandleReadEvent ()
>> #19 0x00000000004949ca in KFS::NetManager::MainLoop ()
>> #20 0x000000000045f2c0 in netWorker ()
>> #21 0x00000030df80729a in start_thread () from /lib64/libpthread.so.0
>> #22 0x00000030dece439d in clone () from /lib64/libc.so.6
>> Thread 3 (Thread 0x415e2950 (LWP 9108)):
>> #0  0x00000030df80b19d in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>> #1  0x00000030e0003183 in handle_fildes_io () from /lib64/librt.so.1
>> #2  0x00000030df80729a in start_thread () from /lib64/libpthread.so.0
>> #3  0x00000030dece439d in clone () from /lib64/libc.so.6
>> Thread 2 (Thread 0x415ea950 (LWP 9508)):
>> #0  0x00000030df80b19d in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>> #1  0x00000030e0003183 in handle_fildes_io () from /lib64/librt.so.1
>> #2  0x00000030df80729a in start_thread () from /lib64/libpthread.so.0
>> #3  0x00000030dece439d in clone () from /lib64/libc.so.6
>> Thread 1 (Thread 0x7fd83a463700 (LWP 3837)):
>> #0  0x00000030df807b75 in pthread_join () from /lib64/libpthread.so.0
>> #1  0x000000000045f87e in KFS::MetaThread::join ()
>> #2  0x000000000045f299 in KFS::ChunkServer::MainLoop ()
>> #3  0x000000000044e0b8 in main ()
>>
>> Everything looks blocked except Thread 4, so I'm guessing it's the one
>> going berserk.
>>
>>
>> On Fri, Apr 17, 2009 at 10:22 AM, Sriram Rao wrote:
>>>
>>> Hi Andy,
>>>
>>> Thanks for the info.  From what you are seeing, it looks like
>>> something is causing a memory bloat, causing too many IOs and
>>> eventually everything grinds to a halt.  Since it is easy to
>>> reproduce, I can give it a try on my machine.  Can you give me the
>>> inputs to Heritrix and any config info to Hbase?
>>>
>>> It'd also be good if you can get me the chunkserver logs.  In the
>>> Chunkserver.prp file, can you add the following line:
>>> chunkServer.loglevel = DEBUG
>>>
>>> and send me the file/upload it?  The easier thing to do is, file a bug
>>> about this issue on the KFS sourceforge page and then upload the
>>> chunkserver logs.
>>>
>>> Def. want to get to the bottom of this since this is blocking you/Ryan.
>>>
>>> Sriram
>>>
>>> On Fri, Apr 17, 2009 at 4:16 AM, Andrew Purtell wrote:
>>> >
>>> > Hi Sriram,
>>> >
>>> >> Can you tell me the exact steps to repro the problem:
>>> >>  - What version of Hbase?
>>> >
>>> > SVN trunk, 0.20.0-dev
>>> >
>>> >>  - Which version of Heritrix?
>>> >
>>> > Heritrix 2.0, plus the HBase writer which can be found here:
>>> > http://code.google.com/p/hbase-writer/
>>> >
>>> >> What is happening is that the KFS chunkserver is sending
>>> >> writes down to disk and they aren't coming back "soon
>>> >> enough", causing things to backlog; the chunkserver is
>>> >> printing out the backlog status message.
>>> >
>>> > I wonder if this might be a secondary effect. Just before
>>> > these messages begin streaming into the log, the chunkserver
>>> > suddenly balloons its address space from ~200KB to ~100GB.
>>> > These two things are strongly correlated and happen in the
>>> > same order in a repeatable manner.
>>> >
>>> > Once the backlog messages begin, no further IO completes as
>>> > far as I can tell. The count of outstanding IOs
>>> > monotonically increases. Also, the metaserver declares the
>>> > chunkserver dead.
>>> >
>>> > I can take steps to help diagnose the problem. Please advise.
>>> > Would it help if I replicate the problem again with
>>> > chunkserver logging at DEBUG and then post the compressed
>>> > logs somewhere?
>>> >
>>> > [...]
>>> >> On Thu, Apr 16, 2009 at 12:27 AM, Andrew Purtell wrote:
>>> >> >
>>> >> > Hi,
>>> >> >
>>> >> > Like Ryan I have been trying to run HBase on top of
>>> >> > KFS. In my case I am running a SVN snapshot from
>>> >> > yesterday. I have a minimal installation of KFS
>>> >> > metaserver, chunkserver, and HBase master and
>>> >> > regionserver all running on one test host with 4GB of
>>> >> > RAM. Of course I do not expect more than minimal
>>> >> > function. To apply some light load, I run the Heritrix
>>> >> > crawler with 5 TOE threads which write on average
>>> >> > 200 Kbit/sec of data into HBase, which flushes this
>>> >> > incoming data in ~64MB increments and also runs
>>> >> > occasional compaction cycles where the 64MB flush
>>> >> > files will be compacted into ~256MB files.
>>> >> >
>>> >> > I find that for no obvious reason the chunkserver will
>>> >> > suddenly grab ~100 GIGAbytes of address space and emit
>>> >> > a steady stream of "(DiskManager.cc:392) Too many disk
>>> >> > IOs (N)" to the log at INFO level, where N is a
>>> >> > steadily increasing number. The host is under moderate
>>> >> > load at the time -- KFS is busy -- but is not in swap
>>> >> > and according to atop has some disk I/O and network
>>> >> > bandwidth to spare.
>>> > [...]
>>>
>>> _______________________________________________
>>> Kosmosfs-users mailing list
>>> Kosmosfs-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/kosmosfs-users