Date: Thu, 24 Dec 2009 08:17:33 -0800
Message-ID: <4aa34eb70912240817o92461av5fe66821dec2918@mail.gmail.com>
In-Reply-To: <5D66A842901F8E41815AF6D27A28EC490A8E1ABDAB@Mail-Ab02.rmg-ny.com>
Subject: Re: Smaller Region Size?
From: Dhruba Borthakur
To: hbase-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=000e0cd5c8c2d1873b047b7bc6d3

--000e0cd5c8c2d1873b047b7bc6d3
Content-Type: text/plain; charset=ISO-8859-1

Hi folks,

Is it necessary to keep the clocks synchronized on all HBase region servers and the master? I would appreciate it a lot if somebody could explain whether the HBase architecture depends on this.

thanks,
dhruba

On Wed, Dec 23, 2009 at 9:57 AM, Mark Vigeant wrote:

> The clocks are all running in sync, though I am not using NTP shamefully. I should.
>
> And no, I listed the errors backwards, that's not how they showed up in the log, sorry, heh. I don't think they run backwards.
>
> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org]
> Sent: Wednesday, December 23, 2009 12:47 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Smaller Region Size?
>
> How do you have clocks set up on your systems Mark? Are you using NTP to keep them sane? Am I correct that they are sometimes running backward?
>
> - Andy
>
> ----- Original Message ----
> > From: Mark Vigeant
> > To: "hbase-user@hadoop.apache.org"
> > Sent: Wed, December 23, 2009 9:09:04 AM
> > Subject: RE: Smaller Region Size?
> >
> > > The biggest legitimate reason to run smaller region size is if your
> > > data set is small (lets say 400mb) but highly accessed, so you want a
> > > good spread of regions across your cluster.
> >
> > That's exactly it, my input dataset was 500MB total (~1,000,000 rows) and it was getting stored as just one region on one regionserver.
> >
> > In response to St. Ack, I don't think my regions are performing too many splits: the regionserver logs mainly consist of the occasional ZooKeeper Connection error and these two repeatedly:
> >
> > 2009-12-22 15:21:50,415 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=6.556961MB (6875472), Free=792.61804MB (831120240), Max=799.175MB (837995712), Counts: Blocks=0, Access=25755, Hit=0, Miss=25755, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, Evicted/Run=NaN
> >
> > 2009-12-22 15:20:35,073 DEBUG org.apache.hadoop.hbase.regionserver.Store: Skipping major compaction of Message because one (major) compacted file only and elapsedTime 339624149ms is < ttl=9223372036854775807
> >
> > You're suggesting the performance would be improved if the dataset was larger? What are other parameters that can be fine-tuned to optimize based off data size?
> >
> > Thanks
> > -Mark
> > -----Original Message-----
> > From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> > Sent: Tuesday, December 22, 2009 11:28 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Smaller Region Size?
> >
> > The biggest legitimate reason to run smaller region size is if your data set is small (lets say 400mb) but highly accessed, so you want a good spread of regions across your cluster.
> >
> > Another is to run a larger region if you are having a huge table and you want to keep absolute region count low. I am not 100% sold on this yet.
> >
> > I have a patch that can keep performance high during a highly split table, by using parallel puts. This has been proven to keep aggregate performance really high, and I hope it will make 0.20.3.
> >
> > On Tue, Dec 22, 2009 at 2:31 PM, stack wrote:
> > > On Tue, Dec 22, 2009 at 8:57 AM, Mark Vigeant wrote:
> > >
> > >> J-D,
> > >>
> > >> I noticed that performance for uploading data into tables got a lot better as I lowered the max file size -- but up until a certain point, where the performance began slowing down again.
> > >>
> > > Tell us more. What kinda size changes did you make? How many regions were created? Is the slow down because table is splitting all the time? If you study regionserver logs, can you make out what the regionservers are spending their times doing?
> > >
> > >> Is there a rule of thumb/formula/notion to rely on when setting this parameter for optimal performance? Thanks!
> > >>
> > > We have most experience running defaults. Generally folks go up from the default size because they want to host more data in about same number or regions. Going down from the default I've not seen much of.
> > >
> > > St.Ack
> > >
> >
> > This email message and any attachments are for the sole use of the intended recipients and may contain proprietary and/or confidential information which may be privileged or otherwise protected from disclosure. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not an intended recipient, please contact the sender by reply email and destroy the original message and any copies of the message as well as any attachments to the original message.
>

-- 
Connect to me at http://www.facebook.com/dhruba

--000e0cd5c8c2d1873b047b7bc6d3--