Date: Tue, 9 Mar 2010 10:08:28 +0800
Subject: Re: regionserver loads but never unload?
From: steven zhuang
To: hbase-user@hadoop.apache.org

and a store file can be really big, if it is from a region with only one
really big row. :)

On Tue, Mar 9, 2010 at 9:14 AM, Ryan Rawson wrote:
> Hi,
>
> Sorry, N should really be "K" - the number of store files being
> compacted. So it is not dependent on the size of your data set.
>
> -ryan
>
> On Mon, Mar 8, 2010 at 5:12 PM, steven zhuang wrote:
> > about the math you did, I think in my case the "N" is really big, my largest
> > cell will not exceed 100 Bytes.
> > I am sure the regionserver crashed when it did a major compaction, before I
> > split the big row into smaller ones.
> >
> >
> > On Tue, Mar 9, 2010 at 8:53 AM, Ryan Rawson wrote:
> >
> >> Hi,
> >>
> >> HBase does not load the "entire region" into RAM during region load.
> >> What it does is load indexes from the storage files. These indexes
> >> are typically a meg or two per region. After a region is loaded,
> >> there are no follow-up loads. During the course of answering queries,
> >> as blocks from the files are needed, they are loaded into the block
> >> cache. At this point they persist until the LRU mechanism decides to
> >> evict them.
> >>
> >> During compactions, we do a heap sorted merge of multiple HFiles into
> >> one. The largest memory use during this would be the greater of
> >> N*block size (default=64k) or 2*(Largest Cell Size). Thus if one
> >> of your cells is 400MB then we would require at least 800MB to compact
> >> such a file.
> >>
> >> -ryan
> >>
> >> On Mon, Mar 8, 2010 at 4:47 PM, steven zhuang wrote:
> >> > thanks, J.D.
> >> > that's already done after I noticed there are some really huge rows. Now
> >> > the updates and writes can be done smoothly; I kept every row at 300K cells
> >> > or fewer.
> >> >
> >> > I am still not clear about how HBase manages regions. What I'm most
> >> > curious about is whether it loads a whole region into memory when there is
> >> > some read/write/compaction related to the region. I'm checking the code,
> >> > but it would really help if I could get an answer from you guys.
> >> >
> >> >
> >> > On Tue, Mar 9, 2010 at 2:10 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >> >
> >> >> You should consider modeling your rows so that they are smaller than
> >> >> 1.5GB; the sweet spot for HBase is more like a few KBs per row.
> >> >> Else you end up with only 1 row per region, which is totally inefficient
> >> >> for obvious reasons once you understand how HBase manages them.
> >> >>
> >> >> The length is the size of the file in bytes.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Sat, Mar 6, 2010 at 9:25 PM, steven zhuang wrote:
> >> >> > thanks, J.D.
> >> >> >
> >> >> > I think I know why the regionserver takes so much memory now:
> >> >> > there are some really big rows in my table, 1.2-1.5 GB in size. It
> >> >> > seems that the regionserver sometimes tries to load the whole region
> >> >> > into memory. I don't know when this happens; maybe when it does a major
> >> >> > compaction, or reassigns the region to another regionserver, or when
> >> >> > it's asked to open/online a region?
> >> >> >
> >> >> > Your question is answered inline.
> >> >> >
> >> >> > On Sat, Mar 6, 2010 at 2:15 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >> >> >
> >> >> >> On Thu, Mar 4, 2010 at 7:19 PM, steven zhuang wrote:
> >> >> >> > thanks, J.D.
> >> >> >> >
> >> >> >> > I am still not sure about the second question. From the log I
> >> >> >> > can see lines like:
> >> >> >> > org.apache.hadoop.hbase.regionserver.Store: loaded
> >> >> >> > /user/ccenterq/hbase/XXX/1702600912/queries/1289015788537930719,
> >> >> >> > isReference=false, sequence id=1389720128, length=175533391,
> >> >> >> > majorCompaction=true
> >> >> >> > (this is the region data, not the index, right?)
> >> >> >> > I do have some really big regions, with millions of columns in
> >> >> >> > one column family, but isn't this length a little too big?
> >> >> >>
> >> >> >> The index and the metadata of the files of that Store in that region
> >> >> >> were loaded here.
> >> >> >>
> >> >> >> >
> >> >> >> > About the third one, I am actually not very clear on how memory is
> >> >> >> > used in HBase. If it's only the few KBs for holding region info, it
> >> >> >> > won't be released, right?
> >> >> >>
> >> >> >> I don't understand your question. Try an example?
> >> >> >
> >> >> >
> >> >> > Sorry for not being clear. I am actually asking which part of the region
> >> >> > has a length of "175533391" in the following line; I think the
> >> >> > index/meta-data info for a region won't take so much memory.
> >> >> >
> >> >> > > org.apache.hadoop.hbase.regionserver.Store: loaded
> >> >> > > /user/ccenterq/hbase/XXX/1702600912/queries/1289015788537930719,
> >> >> > > isReference=false, sequence id=1389720128, length=175533391,
> >> >> > > majorCompaction=true
> >> >> > > (this is the region data, not the index, right?)
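Ryan's compaction explanation in the thread can be made concrete with a small sketch. This is a hypothetical model, not HBase code: each store file is represented as an already-sorted List<String> of keys, compaction is a heap-based k-way merge (a PriorityQueue holding one cursor per input file), and estimatePeakBytes applies Ryan's rule of thumb with N read as K, the number of store files being compacted, per his follow-up. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not HBase code: store files are modelled as sorted
// lists of keys, and a compaction is a heap-based k-way merge of them.
final class CompactionSketch {

    // One cursor per input file. In this model a cursor stands in for the
    // single block a real reader would keep loaded per file.
    private static final class Cursor {
        final Iterator<String> it;
        String head;

        Cursor(Iterator<String> it) {
            this.it = it;
            this.head = it.next(); // only constructed for non-empty inputs
        }

        boolean advance() {
            if (it.hasNext()) {
                head = it.next();
                return true;
            }
            return false;
        }
    }

    // Merges K sorted inputs into one sorted output. The heap never holds
    // more than K cursors, so the merge's working set grows with K.
    static List<String> merge(List<List<String>> storeFiles) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> file : storeFiles) {
            if (!file.isEmpty()) {
                heap.add(new Cursor(file.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll(); // smallest current key across all files
            out.add(smallest.head);
            if (smallest.advance()) {
                heap.add(smallest); // re-insert with its next key
            }
        }
        return out;
    }

    // Rule of thumb from the thread, with N read as K (files being compacted):
    // peak memory is roughly max(K * blockSize, 2 * largestCell).
    static long estimatePeakBytes(int k, long blockSize, long largestCell) {
        return Math.max((long) k * blockSize, 2L * largestCell);
    }
}
```

With K = 3 files, 64 KB blocks and a 400 MB cell, the estimate comes out at 800 MB, matching Ryan's example; with the 100-byte cells steven mentions, it drops to 3 * 64 KB = 192 KB, consistent with Ryan's point that it does not depend on the size of the data set.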
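Ryan also describes what happens after a region is open: only the store-file indexes are loaded up front, and data blocks are pulled into the block cache on demand and stay there until LRU eviction. Below is a minimal Java sketch of a byte-bounded LRU cache under that description, assuming blocks are identified by a string key (for example, file name plus offset) and that capacity is counted in block bytes. The class and its methods are hypothetical and are not HBase's own block cache implementation.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an LRU block cache; not HBase's implementation.
// Keys identify a block (e.g. "file:offset"); values are the block bytes.
final class LruBlockCacheSketch {
    private final long capacityBytes;
    private long usedBytes = 0;

    // accessOrder = true: iteration runs from least to most recently accessed.
    private final LinkedHashMap<String, byte[]> blocks =
        new LinkedHashMap<>(16, 0.75f, true);

    LruBlockCacheSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Returns the cached block or null on a miss; a hit marks it most recently used.
    byte[] get(String blockKey) {
        return blocks.get(blockKey);
    }

    // Caches a block, then evicts least recently used blocks until usage
    // is back under capacity (a block larger than the cache is not retained).
    void put(String blockKey, byte[] block) {
        byte[] previous = blocks.put(blockKey, block);
        if (previous != null) {
            usedBytes -= previous.length;
        }
        usedBytes += block.length;
        Iterator<Map.Entry<String, byte[]>> it = blocks.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    // Membership check that does not change the access order.
    boolean contains(String blockKey) {
        return blocks.containsKey(blockKey);
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

For example, with a 100-byte capacity, caching 40-byte blocks "a" and "b", reading "a", and then caching "c" evicts "b", the least recently used block. A LinkedHashMap in access order keeps the eviction candidate at the head of the iteration, so no separate recency list is needed.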