hbase-user mailing list archives

From Dhruba Borthakur <dhr...@gmail.com>
Subject Re: Smaller Region Size?
Date Thu, 24 Dec 2009 16:17:33 GMT
Hi folks,

Is it necessary to keep the clocks synchronized on all HBase region
servers and the master? I would appreciate it a lot if somebody could
explain whether the HBase architecture depends on this.

thanks,
dhruba
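
For context on why the question matters, here is a minimal sketch (a toy model, not HBase code) of a store in which each write is stamped with the writing server's local clock and a read returns the version with the highest timestamp. HBase orders cell versions by timestamp, and when the client does not supply one the region server assigns it from its own clock. The class and function names below are hypothetical and only illustrate that rule.

```python
# Toy model (not HBase code): versions of a cell are ordered by timestamp,
# and a read returns the value with the highest timestamp.

class ToyCell:
    def __init__(self):
        self.versions = []  # list of (timestamp_ms, value)

    def put(self, timestamp_ms, value):
        self.versions.append((timestamp_ms, value))

    def get_latest(self):
        # Highest timestamp wins, regardless of arrival order.
        return max(self.versions, key=lambda v: v[0])[1]


def server_clock(true_time_ms, skew_ms):
    """Hypothetical server clock: true time plus a fixed skew."""
    return true_time_ms + skew_ms


cell = ToyCell()
# First write is stamped by a server whose clock runs 5 s fast.
cell.put(server_clock(1_000_000, skew_ms=5_000), "old")
# One second later (real time) a second write is stamped by a different
# server, assumed here to have an accurate clock.
cell.put(server_clock(1_001_000, skew_ms=0), "new")

print(cell.get_latest())  # the earlier write still has the higher timestamp
```

In this toy model the later write is hidden behind the earlier one because of the 5 s skew, which is the kind of effect unsynchronized clocks could cause when writes to the same cell are stamped by different servers.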


On Wed, Dec 23, 2009 at 9:57 AM, Mark Vigeant
<mark.vigeant@riskmetrics.com> wrote:

> The clocks are all running in sync, though, shamefully, I am not using NTP.
> I should.
>
> And no, I listed the errors backwards, that's not how they showed up in the
> log, sorry, heh. I don't think they run backwards.
>
> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org]
> Sent: Wednesday, December 23, 2009 12:47 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Smaller Region Size?
>
> How do you have clocks set up on your systems Mark? Are you using NTP to
> keep them sane? Am I correct that they are sometimes running backward?
>
>
>   - Andy
>
>
>
> ----- Original Message ----
> > From: Mark Vigeant <mark.vigeant@riskmetrics.com>
> > To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
> > Sent: Wed, December 23, 2009 9:09:04 AM
> > Subject: RE: Smaller Region Size?
> >
> > > The biggest legitimate reason to run smaller region size is if your
> > > data set is small (lets say 400mb) but highly accessed, so you want a
> > > good spread of regions across your cluster.
> >
> > That's exactly it, my input dataset was 500MB total (~1,000,000 rows) and
> > it was getting stored as just one region on one regionserver.
> >
> > In response to St. Ack, I don't think my regions are performing too many
> > splits: the regionserver logs mainly consist of the occasional ZooKeeper
> > Connection error and these two repeatedly:
> >
> > 2009-12-22 15:21:50,415 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> > Total=6.556961MB (6875472), Free=792.61804MB (831120240),
> > Max=799.175MB (837995712), Counts: Blocks=0, Access=25755, Hit=0,
> > Miss=25755, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%,
> > Miss Ratio=100.0%, Evicted/Run=NaN
> >
> > 2009-12-22 15:20:35,073 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> > Skipping major compaction of Message because one (major) compacted file
> > only and elapsedTime 339624149ms is < ttl=9223372036854775807
> >
> > You're suggesting the performance would be improved if the dataset was
> > larger? What are other parameters that can be fine-tuned to optimize
> > based off data size?
> >
> > Thanks
> > -Mark
> > -----Original Message-----
> > From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> > Sent: Tuesday, December 22, 2009 11:28 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Smaller Region Size?
> >
> > The biggest legitimate reason to run smaller region size is if your
> > data set is small (lets say 400mb) but highly accessed, so you want a
> > good spread of regions across your cluster.
> >
> > Another is to run a larger region if you are having a huge table and
> > you want to keep absolute region count low. I am not 100% sold on this
> > yet.
> >
> > I have a patch that can keep performance high during a highly split
> > table, by using parallel puts. This has been proven to keep aggregate
> > performance really high, and I hope it will make 0.20.3.
> >
> > On Tue, Dec 22, 2009 at 2:31 PM, stack wrote:
> > > On Tue, Dec 22, 2009 at 8:57 AM, Mark Vigeant
> > > wrote:
> > >
> > >> J-D,
> > >>
> > >> I noticed that performance for uploading data into tables got a lot
> > >> better as I lowered the max file size -- but up until a certain point,
> > >> where the performance began slowing down again.
> > >>
> > >>
> > > Tell us more.  What kinda size changes did you make?  How many regions
> > > were created?  Is the slow down because table is splitting all the
> > > time?  If you study regionserver logs, can you make out what the
> > > regionservers are spending their times doing?
> > >
> > >
> > >
> > >> Is there a rule of thumb/formula/notion to rely on when setting this
> > >> parameter for optimal performance? Thanks!
> > >>
> > >>
> > > We have most experience running defaults.  Generally folks go up from
> > > the default size because they want to host more data in about the same
> > > number of regions.  Going down from the default I've not seen much of.
> > >
> > > St.Ack
> > >
> >
> > This email message and any attachments are for the sole use of the
> > intended recipients and may contain proprietary and/or confidential
> > information which may be privileged or otherwise protected from
> > disclosure. Any unauthorized review, use, disclosure or distribution is
> > prohibited. If you are not an intended recipient, please contact the
> > sender by reply email and destroy the original message and any copies of
> > the message as well as any attachments to the original message.
>
>
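
The region-size discussion quoted above can be made concrete with some rough arithmetic. The sketch below assumes a region splits once it grows past the configured maximum file size (the `hbase.hregion.max.filesize` property), so a table ends up with at least dataset size divided by that maximum regions, and possibly up to about twice as many, since a split yields two roughly half-size daughters. The function names are hypothetical, and the estimate ignores compression, data not yet flushed to disk, and multiple column families.

```python
import math

MB = 1024 * 1024


def estimated_regions(dataset_bytes, max_filesize_bytes):
    """Rough lower estimate of region count once splitting has settled."""
    return max(1, math.ceil(dataset_bytes / max_filesize_bytes))


def max_filesize_for(dataset_bytes, target_regions):
    """Max file size (bytes) that yields about target_regions regions."""
    return dataset_bytes // target_regions


dataset = 500 * MB  # the dataset size mentioned in the thread
for size_mb in (256, 64, 16):
    regions = estimated_regions(dataset, size_mb * MB)
    print(size_mb, "MB max file size ->", regions, "regions (rough)")

# Working backwards: what setting would spread the data over ~10 regions?
print(max_filesize_for(dataset, 10) // MB, "MB for about 10 regions")
```

This only gives a starting point; as noted in the thread, lowering the size helps up to a point, after which performance dropped again.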



-- 
Connect to me at http://www.facebook.com/dhruba
