Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A80E6E652 for ; Tue, 5 Feb 2013 14:21:24 +0000 (UTC) Received: (qmail 47236 invoked by uid 500); 5 Feb 2013 14:21:22 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 47104 invoked by uid 500); 5 Feb 2013 14:21:22 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 47072 invoked by uid 99); 5 Feb 2013 14:21:21 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Feb 2013 14:21:21 +0000 X-ASF-Spam-Status: No, hits=2.8 required=5.0 tests=HTML_MESSAGE,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_PASS,URI_HEX,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of kevin.odell@cloudera.com designates 209.85.215.54 as permitted sender) Received: from [209.85.215.54] (HELO mail-la0-f54.google.com) (209.85.215.54) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Feb 2013 14:21:04 +0000 Received: by mail-la0-f54.google.com with SMTP id gw10so200210lab.27 for ; Tue, 05 Feb 2013 06:20:43 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:content-type:x-gm-message-state; bh=lnbt7Tc4eLKeTJ6wEDD3OV0OvrOv1iFxQorFA97cUVY=; b=XtEQrypMdbMJuQu0ieZ5oEZaZx0BBIatkqH3zWcgtWzt33mg9NVrvVGEomsF+v5dJQ T/n+9MAPDWh3GLPCOrNSdM/XVoqGttv0Jg/kfGgaS8Ka5Jgs0qPSYJeFbK0108vgEE82 c4GI8Pvv/8iEN9SEZoXI/be7lMtRUEwhQ4QjoqibPLo1+uvHcsYXsnoJbAmhkMj4pZyg 17xRAqkZYPCGyzdCN3eDtOQq7B+vV6ur3xRrnEhlyk2bXmV1p08u/s6mQ7B6YAwFNOo1 p8gaG2PN4VgRjeuKUqo0Ah+4YGXVT+79mNp/M9r0LtdAWafQKNedsHtloHMUdr0e8GDt 9unQ== MIME-Version: 1.0 X-Received: by 10.152.144.138 with SMTP id sm10mr23032102lab.53.1360074042796; Tue, 05 Feb 2013 06:20:42 -0800 (PST) Received: by 10.112.76.129 with HTTP; Tue, 5 Feb 2013 06:20:42 -0800 (PST) In-Reply-To: <1360053550728-4037949.post@n3.nabble.com> References: <1359981948821-4037887.post@n3.nabble.com> <1360053550728-4037949.post@n3.nabble.com> Date: Tue, 5 Feb 2013 09:20:42 -0500 Message-ID: Subject: Re: MemStoreFlusher: region has too many store files (client timeout) From: "Kevin O'dell" To: user@hbase.apache.org Content-Type: multipart/alternative; boundary=e89a8f22c6f530692004d4faeaff X-Gm-Message-State: ALoCoQmcoh5FuUdSe9w9f5VYvES6P+W+sNeId4ZVwSMhxH8Mx1HTV+cx7slM1Jn+xbk2LAl36aN6 X-Virus-Checked: Checked by ClamAV on apache.org --e89a8f22c6f530692004d4faeaff Content-Type: text/plain; charset=ISO-8859-1 kzurek, Okay, that is a fair flush size(I have not seen larger than 256MB). Do you think you could send me a RS log from yesterday while you were seeing this issue? I would just like to thumb through it and make some tuning recommendations. I have a pretty good idea of what you need to do, if you look at: http://gbif.blogspot.com/2012/07/optimizing-writes-in-hbase.html The article has some good ideas for write tuning. On Tue, Feb 5, 2013 at 3:39 AM, kzurek wrote: > Thanks for the reply, although I should clear some misunderstandings. In > general, I do know general behavior and difference between minor and major > compaction, as well as when minor compaction might become (could be > promoted) to major compaction. I just wanted to verify influence of > compaction (mostly major) on our cluster performance. Thus, I've created > test where I'm putting data to only one region (total 71) by one single > threaded client using build in caching mechanism (according to "HBase The > Definitive Guide" book) and triggering major compaction by hand > (HBaseAdmin). Although, after few tests I've noticed that major compaction > (Large Compaction) is being triggered (cache flusher, recursive queue) so I > left it as it was (not triggering it anymore). That brought me to this > situation, where I'm putting data and after a while I'm getting timeouts on > the client, in meanwhile I see that memstore is being flush which cant > create new store file (cause there are to many of them) and which is > frequently blocked by compaction process. I hope that this short > description > will bring closer look at the issue. In addition, here are some answers to > your questions: > > 1. How often are you flushing? > I'm not triggering flushing by hand, but I've noticed that data is being > flushed every 4s (275m) or 1m 30s-40s (1.5g). > 2. How often are you force flushing from HLog rolls? > Default settings are: blocksize=64 MB, rollsize=60.8 MB, enabled=true, > optionallogflushinternal=1000ms. It seems that roll is made every hour. > 3. What size are your flushes? > Depends, from 275m up to 1.5g. I've set my memstore flush size to 256m and > memstore block multiplier to 6. Should I increase the flush size?? > 4. What does your region count look like as that can affect your flush > size? > Initial split is 37 regions on 6 RegionServers, but at the moment there are > 71 regions. > > > > Kevin O'dell wrote > > Kzurek, > > > > Just because you turn off time based major compactions, it does not mean > > that you have turned major compaction off. Compaction can still be > > promoted to be Majors. Also, the only real difference between a major > and > > minor compaction is one processes deletes. You should really schedule at > > least daily major compactions. As for your blocking issue, there are > > quite > > a few things you would want to look at: > > > > How often are you flushing? > > > > How often are you force flushing from HLog rolls? > > > > What size are your flushes? > > > > What does your region count look like as that can affect your flush size? > > > > etc > > > > When I see HBase blocking constantly it is usually a sign that you need > to > > do some fine grain tuning. > > > > On Mon, Feb 4, 2013 at 7:45 AM, kzurek < > > > kzurek@ > > > > wrote: > > > >> I'm facing some issues regarding to major compaction. I've disabled > major > >> compaction and it is not triggered manually, but when I'm loading data > to > >> selected region, I saw that major compaction queue is growing and it is > >> being triggered ('Large Compaction' in logs) quite often (mainly due to > >> cacheFlusher). Moreover, I've noticed that my clients app gets timeout > >> while > >> putting data into the cluster (happens when memory store flusher is > >> trying > >> to dump memory content to store file, but it cannot due to too many > store > >> files), also drop in data rate, which in this case is obvious, is > >> noticeable. For me, it looks like compaction process is not fast enough > >> comparing to incoming rate of data or ... maybe something else?? and > >> finally > >> is blocking the update process. > >> > >> Basic information: > >> HBase Version: 0.92.1, r1298924 > >> Hadoop Version: 1.0.3, r1335192 > >> > >> Logs: > >> 2013-02-01 15:43:14,627 DEBUG > >> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large > Compaction > >> requested: > >> > >> > regionName=test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., > >> storeName=data, fileCount=3, fileSize=478.3m (249.8m, 113.7m, 114.7m), > >> priority=-3, time=1051078047102762; Because: > >> regionserver60020.cacheFlusher; > >> compaction_queue=(48:0), split_queue=0 > >> 2013-02-01 15:43:14,627 DEBUG > org.apache.hadoop.hbase.regionserver.Store: > >> 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact > >> 2013-02-01 15:43:14,709 WARN org.apache.hadoop.ipc.HBaseServer: > >> (responseTooSlow): > >> > >> > {"processingtimems":61908,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@57a56081 > >> ), > >> rpc version=1, client version=29, > >> methodsFingerPrint=54742778","client":"192.168.1.68:49419 > >> > ","starttimems":1359729732799,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"} > >> 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC > >> Server > >> Responder, call > multi(org.apache.hadoop.hbase.client.MultiAction@57a56081 > >> ), > >> rpc version=1, client version=29, methodsFingerPrint=54742778 from > >> 192.168.1.68:49419: output error > >> 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC > >> Server > >> handler 3 on 60020 caught: java.nio.channels.ClosedChannelException > >> at > >> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133) > >> at > sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324) > >> at > >> org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1710) > >> at > >> > org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653) > >> at > >> > >> > org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924) > >> at > >> > >> > org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003) > >> at > >> > >> > org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409) > >> at > >> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346) > >> > >> 2013-02-01 15:43:19,397 DEBUG > >> org.apache.hadoop.hbase.regionserver.HRegion: > >> Flush requested on > >> > >> > test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. > >> 2013-02-01 15:43:19,397 WARN > >> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region > >> > >> > test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. > >> has too many store files; delaying flush up to 90000ms > >> 2013-02-01 15:43:19,397 DEBUG > org.apache.hadoop.hbase.regionserver.Store: > >> 3b710693d6314c2a987b07dd82451158 - data: no store files to compact > >> 2013-02-01 15:43:19,397 DEBUG > org.apache.hadoop.hbase.regionserver.Store: > >> 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact > >> 2013-02-01 15:43:55,693 INFO > >> org.apache.hadoop.hbase.regionserver.HRegion: > >> Blocking updates for 'IPC Server handler 10 on 60020' on region > >> > >> > test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.: > >> memstore size 1.5g is >= than blocking 1.5g size > >> 2013-02-01 15:44:49,452 INFO > >> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Waited 90055ms on > a > >> compaction to clean up 'too many store files'; waited long enough... > >> proceeding with flush of > >> > >> > test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. > >> 2013-02-01 15:44:49,452 DEBUG > >> org.apache.hadoop.hbase.regionserver.HRegion: > >> Started memstore flush for > >> > >> > test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., > >> current region memstore size 1.5g > >> 2013-02-01 15:44:49,453 DEBUG > >> org.apache.hadoop.hbase.regionserver.HRegion: > >> Finished snapshotting > >> > >> > test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., > >> commencing wait for mvcc, flushsize=1624094576 > >> 2013-02-01 15:44:50,014 DEBUG > >> org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with > >> CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] > >> [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] > >> [cacheEvictOnClose=false] [cacheCompressed=false] > >> 2013-02-01 15:44:55,082 DEBUG > org.apache.hadoop.hbase.regionserver.Store: > >> Renaming flushed file at > >> > >> > hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/.tmp/7ddee02376664ac387eb3c786c541ed0 > >> to > >> > >> > hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0 > >> 2013-02-01 15:44:55,086 INFO org.apache.hadoop.hbase.regionserver.Store: > >> Added > >> > >> > hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0, > >> entries=7157100, sequenceid=58642, memsize=1.5g, filesize=113.8m > >> 2013-02-01 15:44:55,087 INFO > >> org.apache.hadoop.hbase.regionserver.HRegion: > >> Unblocking updates for region > >> > >> > test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. > >> 'IPC Server handler 10 on 60020' > >> 2013-02-01 15:44:55,087 INFO > >> org.apache.hadoop.hbase.regionserver.HRegion: > >> Finished memstore flush of ~1.5g/1624094576, currentsize=0.0/0 for > region > >> > >> > test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. > >> in 5635ms, sequenceid=58642, compaction requested=false > >> > >> > >> > >> -- > >> View this message in context: > >> > http://apache-hbase.679495.n3.nabble.com/MemStoreFlusher-region-has-too-many-store-files-client-timeout-tp4037887.html > >> Sent from the HBase User mailing list archive at Nabble.com. > >> > > > > > > > > -- > > Kevin O'Dell > > Customer Operations Engineer, Cloudera > > > > > > -- > View this message in context: > http://apache-hbase.679495.n3.nabble.com/MemStoreFlusher-region-has-too-many-store-files-client-timeout-tp4037887p4037949.html > Sent from the HBase User mailing list archive at Nabble.com. > -- Kevin O'Dell Customer Operations Engineer, Cloudera --e89a8f22c6f530692004d4faeaff--