From: "Jean-Daniel Cryans" <jdcryans@gmail.com>
To: hbase-user@hadoop.apache.org
Date: Fri, 19 Dec 2008 15:46:18 -0500
Subject: Re: HBase behaviour at startup (compression)

I think we should cut a 0.18.2; it also contains a backport of 1046.

Jean-Adrien, regarding your problem, you also have to take
hbase.regionserver.thread.splitcompact.check.frequency into account. When
a region opens, as we know, a compaction is requested for it. The class
responsible for checking that is CompactSplitThread and, by default, it
starts a compaction every 20 seconds. So even if a compaction takes 0
seconds, you still lose 20 seconds for each and every one of them. In your
particular situation, given that you have 250 regions per region server,
it's easy to understand why startup takes so long.

To fix that, you could run a third region server on the machine that only
has a datanode. You could also lower the value of the config I wrote about
to maybe 10 seconds.

J-D
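To put numbers on it: with about 250 regions per region server and a
20-second check interval, the check frequency alone accounts for up to
roughly 250 x 20 s, i.e. about 83 minutes per region server; at 10 seconds
that drops to about 42 minutes. A minimal hbase-site.xml override along the
lines of the suggestion above might look like the following; the property
name is the one cited in this thread and the millisecond unit is an
assumption, so both should be checked against the 0.18 hbase-default.xml:

  <property>
    <name>hbase.regionserver.thread.splitcompact.check.frequency</name>
    <value>10000</value>
    <description>Assumed to be how often the region server checks for
      queued compactions/splits, in milliseconds. 10000 ms matches the
      10-second suggestion above; the default discussed here is 20
      seconds (20000 ms).</description>
  </property>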
On Fri, Dec 19, 2008 at 3:05 PM, stack wrote:

> Jean-Adrien wrote:
>
>> Hello,
>>
>> Andrew and St.Ack, thanks for your answers, and excuse the confusion
>> between compression and compaction...
>>
>> I reviewed the concept of major/minor compaction in the wiki and I
>> looked at both JIRA cases, HBASE-938 and HBASE-1062. Since I'm running
>> HBase version 0.18.0, I certainly have the HBASE-938 problem. If I
>> understand the problem well, it is that at startup all opened regions
>> that need compaction do a major compaction, since the timestamp of the
>> latest major compaction is not stored anywhere; the (in-memory) counter
>> is reset to the startup time, and the next major compaction will take
>> place (with the default config) one day later.
>>
>
> I say in HBASE-938 that a major compaction runs on every restart, but I
> was incorrect. Later in the issue, having studied the code, I recant
> (the 'last' major compaction timestamp is that of the oldest file in
> the filesystem).
>
> Later in HBASE-938, we home in on the fact that even in the case where
> the last compaction was a major compaction, if the major compaction
> interval elapses we'd run a new major compaction. Essentially we'd
> rewrite data in HBase on a period (as you 'prove' later in this message
> with your replication check).
>
> Can you tell what is running on restart? Is it a major compaction? Or
> add logs of the startup to an issue and I'll take a look. In 0.18.x,
> there is the below if it is a 'major':
>
>   LOG.debug("Major compaction triggered on store: " + this.storeNameStr +
>     ". Time since last major compaction: " +
>     ((System.currentTimeMillis() - lowTimestamp)/1000) + " seconds");
>
> The thing I'm not clear on is why all the compacting on restart. Why is
> a 'major' compaction triggered if we're looking at the timestamp of the
> oldest file in the filesystem? Perhaps you can add some debug emissions
> to figure it out, Jean-Adrien?
> ...
>
>> Here is where my problem during major compaction may be:
>> I think (I'm not sure, I have to find a better tool to monitor my
>> network) that with my light configuration (see above for details) the
>> problem is this: even if the compaction process is quick (for example,
>> a single modification in a cell leads to a major compaction rewriting
>> the whole file), my regionservers run on the same machines as the
>> datanodes, so they communicate directly (fast) when a RS asks a DN to
>> store a mapfile. The datanode will then place replicas of the blocks on
>> the two other datanodes over the slow 100 Mbit/s network. At HBase
>> startup time, if Hadoop asks the network to transfer about 200 Gb, the
>> bandwidth might be saturated. The lease expires and the RSs shut
>> themselves down. That could also explain the problem of the max
>> Xcievers limit sometimes being reached in the datanodes, which we
>> discussed in a previous post.
>>
>
> Above sounds plausible.
>
> Should we cut a 0.18.2 with HBASE-938 backported (it includes other good
> fixes too -- HBASE-998, etc.)?
>
> St.Ack
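For illustration, here is a minimal, self-contained sketch of the check
described above, where the "last major compaction" time is taken to be the
timestamp of the oldest store file and a major compaction becomes due once
the (default one-day) interval has elapsed. It is an illustration only, not
the actual 0.18 Store code; the class and method names are made up, while
storeNameStr and lowTimestamp echo the LOG.debug line quoted in the message.

  // Hypothetical illustration of the major-compaction-age check discussed
  // in this thread; not HBase source code.
  public class MajorCompactionCheck {

    // Default major compaction interval mentioned in the thread: one day.
    static final long MAJOR_COMPACTION_INTERVAL_MS = 24L * 60 * 60 * 1000;

    static boolean isMajorCompactionDue(String storeNameStr,
        long oldestFileTimestamp) {
      long lowTimestamp = oldestFileTimestamp; // oldest file in the filesystem
      long now = System.currentTimeMillis();
      if (lowTimestamp > 0L && lowTimestamp < (now - MAJOR_COMPACTION_INTERVAL_MS)) {
        // Same message as the snippet quoted above, printed instead of logged.
        System.out.println("Major compaction triggered on store: " + storeNameStr +
            ". Time since last major compaction: " +
            ((now - lowTimestamp) / 1000) + " seconds");
        return true;
      }
      return false;
    }

    public static void main(String[] args) {
      // A store whose oldest file is two days old is due for a major
      // compaction; one whose oldest file is an hour old is not.
      long now = System.currentTimeMillis();
      System.out.println(isMajorCompactionDue("info",
          now - 2L * 24 * 60 * 60 * 1000)); // true
      System.out.println(isMajorCompactionDue("info",
          now - 60L * 60 * 1000));          // false
    }
  }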