Subject: Re: [Kosmosfs-users] too many disk IOs / chunkserver suddenly consumes ~100GB of address space
From: Sriram Rao
To: Joshua Taylor
Cc: apurtell@apache.org, hbase-dev@hadoop.apache.org, kosmosfs-users@lists.sourceforge.net
Date: Sat, 25 Apr 2009 13:56:33 -0700

Thanks to Josh's logs we figured out what caused the problem.  There
was a bug in the readahead code that was causing the client to send
bogus values for read; we've fixed the client side and also fixed the
server side to handle this case.  These fixes are in as of revision 327
of trunk.

Sriram

On Thu, Apr 23, 2009 at 12:44 PM, Sriram Rao wrote:
> I have checked in a fix to the chunkserver to fail the read when it
> sees bogus values (revision 316).  That was a bug in the code :-(
> Would still be good to know what caused the client to send bogus
> values for the size...
>
> Sriram
>
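A minimal sketch of the kind of server-side check described above (fail a read whose requested size cannot possibly be valid instead of passing it to the disk layer) might look like the following. The type, constant, and function names are illustrative assumptions, not the actual code from revisions 316/327; as noted above, the real fix also corrected the client-side readahead computation that produced the bad size.

    #include <cstdint>
    #include <cstdio>
    #include <limits>

    // Illustrative only: KFS chunks are bounded in size (64 MB), so a read
    // whose offset/length cannot fall inside a chunk can be failed up front
    // instead of being handed to the disk layer.
    const int64_t kChunkSize = 64 << 20;  // 64 MB

    struct ReadRequest {   // hypothetical stand-in for the chunkserver's ReadOp
        int64_t offset;
        int64_t numBytes;
    };

    // Returns true if the request is sane; false means "fail the read".
    bool ValidateRead(const ReadRequest &r) {
        if (r.offset < 0 || r.numBytes <= 0)
            return false;
        if (r.offset >= kChunkSize || r.numBytes > kChunkSize - r.offset)
            return false;  // e.g. numBytes == 2^63 - 1 from the log below
        return true;
    }

    int main() {
        // The bogus request from Josh's log: offset 16276417, numBytes 2^63 - 1.
        const ReadRequest bogus = {16276417, std::numeric_limits<int64_t>::max()};
        std::printf("bogus read accepted? %s\n", ValidateRead(bogus) ? "yes" : "no");
        return 0;
    }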
> On Thu, Apr 23, 2009 at 12:32 PM, Sriram Rao wrote:
>> Josh,
>>
>> Thanks.  Josh Adams had also seen the same bloat and from his logs, I
>> had figured that the chunkserver was in a loop trying to read data and
>> we couldn't quite figure out what was causing it.  This looks like the
>> smoking gun :-)
>>
>> What was the test doing?  I am still curious why the client requested
>> a bizarre amount of data.  I'll put in the fixes to fail the read.
>>
>> Sriram
>>
>> On Thu, Apr 23, 2009 at 12:17 PM, Joshua Taylor wrote:
>>> I have a bit more info about this problem.  It just happened to one of my
>>> chunkservers as well.  I had replaced all log4cpp calls with fprintf to
>>> avoid random log4cpp crashes (separate problem), so the chunkserver is
>>> already doing debug logging.  Here's the log from the beginning of the
>>> bloat/overload:
>>>
>>> 04-23-2009 05:42:26.625 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 32636928 numBytes: 131072
>>> 04-23-2009 05:42:26.630 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 32768000 numBytes: 131072
>>> 04-23-2009 05:42:26.633 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 32899072 numBytes: 131072
>>> 04-23-2009 05:42:26.636 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 33030144 numBytes: 131072
>>> 04-23-2009 05:42:26.638 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 33161216 numBytes: 131072
>>> 04-23-2009 05:42:26.641 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 51823 chunkversion = 1 offset: 33292288 numBytes: 131072
>>> 04-23-2009 05:42:26.648 DEBUG - (ClientSM.cc:301) Got command: read: chunkId = 49677 chunkversion = 1 offset: 16276417 numBytes: 9223372036854775807
>>> 04-23-2009 05:42:27.053 INFO - (DiskManager.cc:392) Too many disk IOs (5001) outstanding...overloaded
>>> 04-23-2009 05:42:27.053 INFO - (DiskManager.cc:392) Too many disk IOs (5002) outstanding...overloaded
>>> 04-23-2009 05:42:27.053 INFO - (DiskManager.cc:392) Too many disk IOs (5003) outstanding...overloaded
>>> 04-23-2009 05:42:27.053 INFO - (DiskManager.cc:392) Too many disk IOs (5004) outstanding...overloaded
>>> 04-23-2009 05:42:27.054 INFO - (DiskManager.cc:392) Too many disk IOs (5005) outstanding...overloaded
>>> 04-23-2009 05:42:27.054 INFO - (DiskManager.cc:392) Too many disk IOs (5006) outstanding...overloaded
>>> 04-23-2009 05:42:27.054 INFO - (DiskManager.cc:392) Too many disk IOs (5007) outstanding...overloaded
>>> 04-23-2009 05:42:27.054 INFO - (DiskManager.cc:392) Too many disk IOs (5008) outstanding...overloaded
>>> 04-23-2009 05:42:27.055 INFO - (DiskManager.cc:392) Too many disk IOs (5009) outstanding...overloaded
>>> 04-23-2009 05:42:27.055 INFO - (DiskManager.cc:392) Too many disk IOs (5010) outstanding...overloaded
>>> 04-23-2009 05:42:27.055 INFO - (DiskManager.cc:392) Too many disk IOs (5011) outstanding...overloaded
>>> 04-23-2009 05:42:27.055 INFO - (DiskManager.cc:392) Too many disk IOs (5012) outstanding...overloaded
>>> 04-23-2009 05:42:27.056 INFO - (DiskManager.cc:392) Too many disk IOs (5013) outstanding...overloaded
>>> 04-23-2009 05:42:27.056 INFO - (DiskManager.cc:392) Too many disk IOs (5014) outstanding...overloaded
>>> 04-23-2009 05:42:27.056 INFO - (DiskManager.cc:392) Too many disk IOs (5015) outstanding...overloaded
>>> 04-23-2009 05:42:27.056 INFO - (DiskManager.cc:392) Too many disk IOs (5016) outstanding...overloaded
>>>
>>> The line where numBytes == 2^63 - 1 is rather suspicious.  I'm not running
>>> Heritrix/HBase/etc.  This is just a few file copies.  I have 8 chunkservers
>>> total.  Only one is having this problem.  The other chunkservers are still
>>> logging that they're making progress (reading and writing chunks).
>>>
>>> Here's a stack trace from the server currently pegged at 100% CPU, 200 GB
>>> VmSize:
>>>
>>> Thread 5 (Thread 0x40bd9950 (LWP 3838)):
>>> #0  0x00000030df80af19 in pthread_cond_wait@@GLIBC_2.3.2 ()
>>> #1  0x000000000047f2d7 in KFS::MetaThread::sleep ()
>>> #2  0x000000000047f317 in KFS::MetaQueue::dequeue_internal ()
>>> #3  0x000000000047f414 in KFS::MetaQueue::dequeue ()
>>> #4  0x000000000047bdf3 in KFS::Logger::MainLoop ()
>>> #5  0x000000000047bf79 in logger_main ()
>>> #6  0x00000030df80729a in start_thread () from /lib64/libpthread.so.0
>>> #7  0x00000030dece439d in clone () from /lib64/libc.so.6
>>> Thread 4 (Thread 0x415da950 (LWP 3839)):
>>> #0  0x000000000046fe1a in std::_List_const_iterator<...>::operator++ ()
>>> #1  0x000000000046fe4a in std::__distance<...> ()
>>> #2  0x000000000046fe93 in std::distance<...> ()
>>> #3  0x000000000046fec3 in std::list<..., std::allocator<...> >::size ()
>>> #4  0x000000000048db0a in KFS::DiskManager::IOInitiated ()
>>> #5  0x000000000048e0ee in KFS::DiskManager::Read ()
>>> #6  0x000000000048aeca in KFS::DiskConnection::Read ()
>>> #7  0x0000000000452a74 in KFS::ChunkManager::ReadChunk ()
>>> #8  0x0000000000469d04 in KFS::ReadOp::HandleChunkMetaReadDone ()
>>> #9  0x0000000000458378 in KFS::ObjectMethod::execute ()
>>> #10 0x000000000045574e in KFS::KfsCallbackObj::HandleEvent ()
>>> #11 0x0000000000452b9f in KFS::ChunkManager::ReadChunkMetadata ()
>>> #12 0x0000000000467fe1 in KFS::ReadOp::Execute ()
>>> #13 0x0000000000463ad3 in KFS::SubmitOp ()
>>> #14 0x000000000046104e in KFS::ClientSM::HandleClientCmd ()
>>> #15 0x000000000046163c in KFS::ClientSM::HandleRequest ()
>>> #16 0x00000000004628e8 in KFS::ObjectMethod::execute ()
>>> #17 0x000000000045574e in KFS::KfsCallbackObj::HandleEvent ()
>>> #18 0x0000000000493a20 in KFS::NetConnection::HandleReadEvent ()
>>> #19 0x00000000004949ca in KFS::NetManager::MainLoop ()
>>> #20 0x000000000045f2c0 in netWorker ()
>>> #21 0x00000030df80729a in start_thread () from /lib64/libpthread.so.0
>>> #22 0x00000030dece439d in clone () from /lib64/libc.so.6
>>> Thread 3 (Thread 0x415e2950 (LWP 9108)):
>>> #0  0x00000030df80b19d in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>> #1  0x00000030e0003183 in handle_fildes_io () from /lib64/librt.so.1
>>> #2  0x00000030df80729a in start_thread () from /lib64/libpthread.so.0
>>> #3  0x00000030dece439d in clone () from /lib64/libc.so.6
>>> Thread 2 (Thread 0x415ea950 (LWP 9508)):
>>> #0  0x00000030df80b19d in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>> #1  0x00000030e0003183 in handle_fildes_io () from /lib64/librt.so.1
>>> #2  0x00000030df80729a in start_thread () from /lib64/libpthread.so.0
>>> #3  0x00000030dece439d in clone () from /lib64/libc.so.6
>>> Thread 1 (Thread 0x7fd83a463700 (LWP 3837)):
>>> #0  0x00000030df807b75 in pthread_join () from /lib64/libpthread.so.0
>>> #1  0x000000000045f87e in KFS::MetaThread::join ()
>>> #2  0x000000000045f299 in KFS::ChunkServer::MainLoop ()
>>> #3  0x000000000044e0b8 in main ()
>>>
>>> Everything looks blocked except Thread 4, so I'm guessing it's the one
>>> going berserk.
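A side note on the Thread 4 trace above: it is sitting inside std::list::size(), which the libstdc++ of that era computes by walking the entire list (hence the std::distance and iterator frames). If DiskManager::IOInitiated checks the backlog by calling size() on a pending-IO list that has grown to thousands of entries, every new request pays an O(n) scan, which by itself can keep a core busy. The usual workaround is to keep an explicit counter next to the list; a minimal, purely illustrative sketch (not the actual KFS code) is below.

    #include <cstddef>
    #include <list>

    // Hypothetical pending-IO bookkeeping, illustrating the workaround for
    // pre-C++11 libstdc++ where std::list::size() is O(n): keep an explicit
    // counter alongside the list instead of walking it on every check.
    struct PendingIO {
        int fd;            // illustrative fields only
        long long nbytes;
    };

    class IOQueue {
    public:
        IOQueue() : count_(0) {}

        void Add(const PendingIO &io) {
            ios_.push_back(io);
            ++count_;                       // O(1), unlike ios_.size()
        }

        void Completed() {
            if (!ios_.empty()) {
                ios_.pop_front();
                --count_;
            }
        }

        std::size_t NumPending() const { return count_; }

    private:
        std::list<PendingIO> ios_;
        std::size_t count_;
    };

Whether that is what kept this particular chunkserver at 100% CPU is a guess; either way, the readahead bug described at the top of the thread is the underlying trigger.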
>>>
>>> On Fri, Apr 17, 2009 at 10:22 AM, Sriram Rao wrote:
>>>>
>>>> Hi Andy,
>>>>
>>>> Thanks for the info.  From what you are seeing, it looks like
>>>> something is causing a memory bloat, causing too many IOs, and
>>>> eventually everything grinds to a halt.  Since it is easy to
>>>> reproduce, I can give it a try on my machine.  Can you give me the
>>>> inputs to Heritrix and any config info for HBase?
>>>>
>>>> It'd also be good if you can get me the chunkserver logs.  In the
>>>> Chunkserver.prp file, can you add the following line:
>>>> chunkServer.loglevel = DEBUG
>>>>
>>>> and send me the file/upload it?  The easier thing to do is to file
>>>> a bug about this issue on the KFS SourceForge page and then upload
>>>> the chunkserver logs.
>>>>
>>>> Def. want to get to the bottom of this since this is blocking you/Ryan.
>>>>
>>>> Sriram
>>>>
>>>> On Fri, Apr 17, 2009 at 4:16 AM, Andrew Purtell wrote:
>>>> >
>>>> > Hi Sriram,
>>>> >
>>>> >> Can you tell me the exact steps to repro the problem:
>>>> >>  - What version of HBase?
>>>> >
>>>> > SVN trunk, 0.20.0-dev
>>>> >
>>>> >>  - Which version of Heritrix?
>>>> >
>>>> > Heritrix 2.0, plus the HBase writer which can be found here:
>>>> > http://code.google.com/p/hbase-writer/
>>>> >
>>>> >> What is happening is that the KFS chunkserver is sending
>>>> >> writes down to disk and they aren't coming back "soon
>>>> >> enough", causing things to backlog; the chunkserver is
>>>> >> printing out the backlog status message.
>>>> >
>>>> > I wonder if this might be a secondary effect. Just before
>>>> > these messages begin streaming into the log, the chunkserver
>>>> > suddenly balloons its address space from ~200KB to ~100GB.
>>>> > These two things are strongly correlated and happen in the
>>>> > same order in a repeatable manner.
>>>> >
>>>> > Once the backlog messages begin, no further IO completes as
>>>> > far as I can tell. The count of outstanding IOs
>>>> > monotonically increases. Also, the metaserver declares the
>>>> > chunkserver dead.
>>>> >
>>>> > I can take steps to help diagnose the problem. Please advise.
>>>> > Would it help if I replicate the problem again with
>>>> > chunkserver logging at DEBUG and then post the compressed
>>>> > logs somewhere?
>>>> >
>>>> > [...]
>>>> >> On Thu, Apr 16, 2009 at 12:27 AM, Andrew Purtell wrote:
>>>> >> >
>>>> >> > Hi,
>>>> >> >
>>>> >> > Like Ryan, I have been trying to run HBase on top of
>>>> >> > KFS. In my case I am running an SVN snapshot from
>>>> >> > yesterday. I have a minimal installation of KFS
>>>> >> > metaserver, chunkserver, and HBase master and
>>>> >> > regionserver all running on one test host with 4GB of
>>>> >> > RAM. Of course I do not expect more than minimal
>>>> >> > function. To apply some light load, I run the Heritrix
>>>> >> > crawler with 5 TOE threads which write on average
>>>> >> > 200 Kbit/sec of data into HBase, which flushes this
>>>> >> > incoming data in ~64MB increments and also runs
>>>> >> > occasional compaction cycles where the 64MB flush
>>>> >> > files will be compacted into ~256MB files.
>>>> >> >
>>>> >> > I find that for no obvious reason the chunkserver will
>>>> >> > suddenly grab ~100 GIGAbytes of address space and emit
>>>> >> > a steady stream of "(DiskManager.cc:392) Too many disk
>>>> >> > IOs (N)" to the log at INFO level, where N is a
>>>> >> > steadily increasing number. The host is under moderate
>>>> >> > load at the time -- KFS is busy -- but is not in swap
>>>> >> > and according to atop has some disk I/O and network
>>>> >> > bandwidth to spare.
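The "Too many disk IOs (N) outstanding...overloaded" message seen in both reports reads like a simple high-water-mark check in the chunkserver's disk layer; the counts in Josh's log start at 5001, which hints at a limit of roughly 5000 outstanding requests. The sketch below shows that kind of guard; the threshold, function name, and message formatting are assumptions for illustration, not the actual DiskManager.cc code.

    #include <cstdio>

    // Assumed high-water mark; the logs above begin complaining at 5001
    // outstanding IOs, which suggests a limit of roughly 5000.
    const int kMaxOutstandingIOs = 5000;

    // Hypothetical backlog check in the spirit of DiskManager.cc:392.
    // Returns true when the disk queue is considered overloaded.
    bool DiskBacklogged(int numOutstanding) {
        if (numOutstanding > kMaxOutstandingIOs) {
            std::printf("Too many disk IOs (%d) outstanding...overloaded\n",
                        numOutstanding);
            return true;  // caller should back off instead of queueing more
        }
        return false;
    }

With the bogus 2^63 - 1 byte read keeping the queue from draining, a check like this would fire on every subsequent request, which matches the steady stream of INFO messages and the monotonically increasing count described above.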
>>>> > [...]
>>>>
>>>> _______________________________________________
>>>> Kosmosfs-users mailing list
>>>> Kosmosfs-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/kosmosfs-users