Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of kevin.odell@cloudera.com
 designates 209.85.215.54 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <1360053550728-4037949.post@n3.nabble.com>
References: <1359981948821-4037887.post@n3.nabble.com>
	<CAGngS9cv3z4qAkqR9hxGW03zf52KNEf463YBWOpq6GbPiAmVQw@mail.gmail.com>
	<1360053550728-4037949.post@n3.nabble.com>
Date: Tue, 5 Feb 2013 09:20:42 -0500
Message-ID: 
 <CAGngS9c=JTvnYmSjE0U5i3t8EzjUj_EzWKOGgnosMZLqFVQhQQ@mail.gmail.com>
Subject: Re: MemStoreFlusher: region has too many store files (client timeout)
From: "Kevin O'dell" <kevin.odell@cloudera.com>
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=e89a8f22c6f530692004d4faeaff

--e89a8f22c6f530692004d4faeaff
Content-Type: text/plain; charset=ISO-8859-1

kzurek,

  Okay, that is a fair flush size(I have not seen larger than 256MB).  Do
you think you could send me a RS log from yesterday while you were seeing
this issue?  I would just like to thumb through it and make some tuning
recommendations.  I have a pretty good idea of what you need to do, if you
look at:

http://gbif.blogspot.com/2012/07/optimizing-writes-in-hbase.html The
article has some good ideas for write tuning.

On Tue, Feb 5, 2013 at 3:39 AM, kzurek <kzurek@proximetry.pl> wrote:

> Thanks for the reply, although I should clear some misunderstandings. In
> general, I do know general behavior and difference between minor and major
> compaction, as well as when minor compaction might become (could be
> promoted) to major compaction. I just wanted to verify influence of
> compaction (mostly major) on our cluster performance. Thus, I've created
> test where I'm putting data to only one region (total 71) by one single
> threaded client using build in caching mechanism (according to "HBase The
> Definitive Guide" book) and triggering major compaction by hand
> (HBaseAdmin). Although, after few tests I've noticed that major compaction
> (Large Compaction) is being triggered (cache flusher, recursive queue) so I
> left it as it was (not triggering it anymore). That brought me to this
> situation, where I'm putting data and after a while I'm getting timeouts on
> the client, in meanwhile I see that memstore is being flush which cant
> create new store file (cause there are to many of them) and which is
> frequently blocked by compaction process. I hope that this short
> description
> will bring closer look at the issue. In addition, here are some answers to
> your questions:
>
> 1. How often are you flushing?
>  I'm not triggering flushing by hand, but I've noticed that data is being
> flushed every 4s (275m) or 1m 30s-40s (1.5g).
> 2. How often are you force flushing from HLog rolls?
> Default settings are: blocksize=64 MB, rollsize=60.8 MB, enabled=true,
> optionallogflushinternal=1000ms. It seems that roll is made every hour.
> 3. What size are your flushes?
>  Depends, from 275m up to 1.5g. I've set my memstore flush size to 256m and
> memstore block multiplier to 6. Should I increase the flush size??
> 4. What does your region count look like as that can affect your flush
> size?
> Initial split is 37 regions on 6 RegionServers, but at the moment there are
> 71 regions.
>
>
>
> Kevin O'dell wrote
> > Kzurek,
> >
> > Just because you turn off time based major compactions, it does not mean
> > that you have turned major compaction off.  Compaction can still be
> > promoted to be Majors.  Also, the only real difference between a major
> and
> > minor compaction is one processes deletes.  You should really schedule at
> > least daily major compactions.  As for your blocking issue, there are
> > quite
> > a few things you would want to look at:
> >
> > How often are you flushing?
> >
> > How often are you force flushing from HLog rolls?
> >
> > What size are your flushes?
> >
> > What does your region count look like as that can affect your flush size?
> >
> > etc
> >
> > When I see HBase blocking constantly it is usually a sign that you need
> to
> > do some fine grain tuning.
> >
> > On Mon, Feb 4, 2013 at 7:45 AM, kzurek &lt;
>
> > kzurek@
>
> > &gt; wrote:
> >
> >> I'm facing some issues regarding to major compaction. I've disabled
> major
> >> compaction and it is not triggered manually, but when I'm loading data
> to
> >> selected region, I saw that major compaction queue is growing and it is
> >> being triggered ('Large Compaction' in logs) quite often (mainly due to
> >> cacheFlusher). Moreover, I've noticed that my clients app gets timeout
> >> while
> >> putting data into the cluster (happens when memory store flusher is
> >> trying
> >> to dump memory content to store file, but it cannot due to too many
> store
> >> files), also drop in data rate, which in this case is obvious, is
> >> noticeable.  For me, it looks like compaction process is not fast enough
> >> comparing to incoming rate of data or ... maybe something else?? and
> >> finally
> >> is blocking the update process.
> >>
> >> Basic information:
> >> HBase Version: 0.92.1, r1298924
> >> Hadoop Version: 1.0.3, r1335192
> >>
> >> Logs:
> >> 2013-02-01 15:43:14,627 DEBUG
> >> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large
> Compaction
> >> requested:
> >>
> >>
> regionName=test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.,
> >> storeName=data, fileCount=3, fileSize=478.3m (249.8m, 113.7m, 114.7m),
> >> priority=-3, time=1051078047102762; Because:
> >> regionserver60020.cacheFlusher;
> >> compaction_queue=(48:0), split_queue=0
> >> 2013-02-01 15:43:14,627 DEBUG
> org.apache.hadoop.hbase.regionserver.Store:
> >> 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact
> >> 2013-02-01 15:43:14,709 WARN org.apache.hadoop.ipc.HBaseServer:
> >> (responseTooSlow):
> >>
> >>
> {"processingtimems":61908,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@57a56081
> >> ),
> >> rpc version=1, client version=29,
> >> methodsFingerPrint=54742778","client":"192.168.1.68:49419
> >>
> ","starttimems":1359729732799,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
> >> 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> >> Server
> >> Responder, call
> multi(org.apache.hadoop.hbase.client.MultiAction@57a56081
> >> ),
> >> rpc version=1, client version=29, methodsFingerPrint=54742778 from
> >> 192.168.1.68:49419: output error
> >> 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> >> Server
> >> handler 3 on 60020 caught: java.nio.channels.ClosedChannelException
> >>         at
> >> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
> >>         at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
> >>         at
> >> org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1710)
> >>         at
> >>
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653)
> >>         at
> >>
> >>
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924)
> >>         at
> >>
> >>
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003)
> >>         at
> >>
> >>
> org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409)
> >>         at
> >>
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346)
> >>
> >> 2013-02-01 15:43:19,397 DEBUG
> >> org.apache.hadoop.hbase.regionserver.HRegion:
> >> Flush requested on
> >>
> >>
> test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.
> >> 2013-02-01 15:43:19,397 WARN
> >> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
> >>
> >>
> test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.
> >> has too many store files; delaying flush up to 90000ms
> >> 2013-02-01 15:43:19,397 DEBUG
> org.apache.hadoop.hbase.regionserver.Store:
> >> 3b710693d6314c2a987b07dd82451158 - data: no store files to compact
> >> 2013-02-01 15:43:19,397 DEBUG
> org.apache.hadoop.hbase.regionserver.Store:
> >> 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact
> >> 2013-02-01 15:43:55,693 INFO
> >> org.apache.hadoop.hbase.regionserver.HRegion:
> >> Blocking updates for 'IPC Server handler 10 on 60020' on region
> >>
> >>
> test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.:
> >> memstore size 1.5g is >= than blocking 1.5g size
> >> 2013-02-01 15:44:49,452 INFO
> >> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Waited 90055ms on
> a
> >> compaction to clean up 'too many store files'; waited long enough...
> >> proceeding with flush of
> >>
> >>
> test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.
> >> 2013-02-01 15:44:49,452 DEBUG
> >> org.apache.hadoop.hbase.regionserver.HRegion:
> >> Started memstore flush for
> >>
> >>
> test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.,
> >> current region memstore size 1.5g
> >> 2013-02-01 15:44:49,453 DEBUG
> >> org.apache.hadoop.hbase.regionserver.HRegion:
> >> Finished snapshotting
> >>
> >>
> test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.,
> >> commencing wait for mvcc, flushsize=1624094576
> >> 2013-02-01 15:44:50,014 DEBUG
> >> org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with
> >> CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false]
> >> [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false]
> >> [cacheEvictOnClose=false] [cacheCompressed=false]
> >> 2013-02-01 15:44:55,082 DEBUG
> org.apache.hadoop.hbase.regionserver.Store:
> >> Renaming flushed file at
> >>
> >>
> hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/.tmp/7ddee02376664ac387eb3c786c541ed0
> >> to
> >>
> >>
> hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0
> >> 2013-02-01 15:44:55,086 INFO org.apache.hadoop.hbase.regionserver.Store:
> >> Added
> >>
> >>
> hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0,
> >> entries=7157100, sequenceid=58642, memsize=1.5g, filesize=113.8m
> >> 2013-02-01 15:44:55,087 INFO
> >> org.apache.hadoop.hbase.regionserver.HRegion:
> >> Unblocking updates for region
> >>
> >>
> test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.
> >> 'IPC Server handler 10 on 60020'
> >> 2013-02-01 15:44:55,087 INFO
> >> org.apache.hadoop.hbase.regionserver.HRegion:
> >> Finished memstore flush of ~1.5g/1624094576, currentsize=0.0/0 for
> region
> >>
> >>
> test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.
> >> in 5635ms, sequenceid=58642, compaction requested=false
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://apache-hbase.679495.n3.nabble.com/MemStoreFlusher-region-has-too-many-store-files-client-timeout-tp4037887.html
> >> Sent from the HBase User mailing list archive at Nabble.com.
> >>
> >
> >
> >
> > --
> > Kevin O'Dell
> > Customer Operations Engineer, Cloudera
>
>
>
>
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/MemStoreFlusher-region-has-too-many-store-files-client-timeout-tp4037887p4037949.html
> Sent from the HBase User mailing list archive at Nabble.com.
>


-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

--e89a8f22c6f530692004d4faeaff--