hbase-user mailing list archives

From Himanshu Vashishtha <hv.cs...@gmail.com>
Subject Re: Uneven write request to regions
Date Wed, 20 Nov 2013 01:05:58 GMT
Re: "The 32 limit makes HBase go into
stress mode, and dump all involving regions contains in those 32 WAL Files."

Pardon, I haven't read all your data points/details thoroughly, but the
above statement is not true. Rather, when the limit is hit, HBase looks at
the oldest WAL file and flushes those regions which would free that WAL file.

But I agree that in general with this kind of workload, we should handle
WAL files more intelligently and free up those WAL files which don't have
any dependency (that is, all their entries are already flushed) when
archiving. We do that in trunk but not in any released version, though.
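
To make the difference concrete, here is a minimal sketch in Java. It is a
toy model, not HBase's actual code: each WAL file is reduced to the set of
regions that still have unflushed edits in it, and the class and method names
(WalFlushSketch, regionsToFreeOldest, archivable) are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: each WAL file is reduced to the set of regions that still
// have unflushed edits in it. All names are hypothetical, not HBase APIs.
class WalFlushSketch {

    // Regions that must be flushed so the oldest WAL file (index 0) can be freed.
    static Set<String> regionsToFreeOldest(List<Set<String>> wals) {
        return wals.isEmpty() ? new TreeSet<String>() : new TreeSet<String>(wals.get(0));
    }

    // Flushing a region persists its edits, so no WAL file depends on it any more.
    static void flush(List<Set<String>> wals, String region) {
        for (Set<String> wal : wals) {
            wal.remove(region);
        }
    }

    // Indexes of WAL files with no remaining dependency (all entries flushed).
    static List<Integer> archivable(List<Set<String>> wals) {
        List<Integer> free = new ArrayList<Integer>();
        for (int i = 0; i < wals.size(); i++) {
            if (wals.get(i).isEmpty()) {
                free.add(i);
            }
        }
        return free;
    }

    // Scenario from the thread: one slow region has an edit in the first file,
    // one fast region has edits in every file (oldest file first).
    static List<Set<String>> exampleWals() {
        List<Set<String>> wals = new ArrayList<Set<String>>();
        wals.add(new TreeSet<String>(Arrays.asList("fast", "slow")));
        wals.add(new TreeSet<String>(Arrays.asList("fast")));
        wals.add(new TreeSet<String>(Arrays.asList("fast")));
        return wals;
    }

    public static void main(String[] args) {
        List<Set<String>> wals = exampleWals();
        flush(wals, "fast"); // the fast region reaches its flush size on its own
        System.out.println("flush to free oldest: " + regionsToFreeOldest(wals));
        System.out.println("archivable WAL indexes: " + archivable(wals));
    }
}
```

In this scenario only the slow region has to be flushed to free the oldest
file, and the two later files have no remaining dependency, so they are the
kind of files that could be archived without waiting for the oldest one.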



On Sat, Nov 16, 2013 at 11:16 AM, Asaf Mesika <asaf.mesika@gmail.com> wrote:

> First I forgot to mention that <customerId> in our case is
> MD5(<customerId>).
> In our case, we have so much data flowing in that we end up having a
> region per <customerId><bucket> pretty quickly, and even that is split
> into different regions by specific date duration (timestamp).
>
> We're not witnessing a hotspot issue. I built some scripts in Java and awk,
> and saw that 66% of our customers use more than 1 RS.
>
> We have two main serious issues: primary and secondary.
>
> Our primary issue is the slow-region vs. fast-region problem. First, recall
> that a region represents, as I detailed before, a specific
> <customerId><bucket>. Some customers get 50x more data than other
> customers in a specific time frame (2 hrs - 1 day). So in one RS, we have
> regions getting 10 write requests per hour vs. 50k write requests per hour.
> So the region mapped to the slow-filling customer id doesn't get to the
> 256MB flush limit and hence isn't flushed, while the regions mapped to the
> fast-filling customer id flush very quickly since they are filling
> very quickly.
> Let's say the 1st WAL file contains the put of a slow-filling customerId.
> The fast-filling customerId fills up the rest of that file. After 20-30
> seconds, the file gets rolled, and another file fills up with the fast-filling
> customerId. After a while, we get to 32 WAL files. The 1st file wasn't
> deleted since its region wasn't flushed. The 32 limit makes HBase go into
> stress mode, and dump all involving regions contains in those 32 WAL Files.
> In our case, we saw that it flushes 111 regions. Lots of the store files
> are 3k-3mb in size, so our compaction queue starts filling up with those
> store files, which need to be compacted.
> At the end of the road, the RS dies.
>
> Our secondary issue is that of empty regions - we get to a situation where
> a region is mapped to a specific <customerId>, <bucket>, and date range
> (1/7 - 3/7). Then, when we are in August (our TTL is set to 30 days), those
> regions become empty and will never get filled again.
> We assume this somehow wreaks havoc in the load balancer, and also MSLAB
> probably steals 1-2 GB of memory for those empty regions.
>
> Thanks!
>
>
>
> On Sat, Nov 16, 2013 at 7:25 PM, Mike Axiak <mike@axiak.net> wrote:
>
> > Hi,
> >
> > One new key pattern that we're starting to use is a salt based on a
> > shard.
> > For example, let's take your key:
> >
> >   <customerId><bucket><timestampInMs><uniqueId>
> >
> > Consider a shard between 0 and 15 inclusive. We determine this with:
> >
> >  <shard> = abs(hash32(uniqueId) % 16)
> >
> > We can then define a salt to be based on customerId and the shard:
> >
> >  <salt> = hash32(<shard><customerId>)
> >
> > So then the new key becomes:
> >
> >  <salt><customerId><timestampInMs><uniqueId>
> >
> > This will distribute the data for a given customer across the N shards
> > that you pick, while having a deterministic function for a given row key
> > (so long as the # of shards you pick is fixed, otherwise you can migrate
> > the data). Placing the bucket after the customerId doesn't help distribute
> > the single customer's data at all. Furthermore, by using a separate hash
> > (instead of just <shard><customerId>), you're guaranteeing that new data
> > will appear in a somewhat random location (i.e., solving the problem of
> > adding a bunch of new data for a new customer).
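
The key scheme above can be sketched in Java roughly as follows. Several
things here are assumptions and not part of the original description: CRC32
stands in for the unspecified hash32, the shard is zero-padded to two digits
before hashing, and the key is rendered as a readable string with "|"
separators (a real row key would more likely be a byte array). The class name
SaltedKeySketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch of <salt><customerId><timestampInMs><uniqueId>.
// CRC32 is an assumed stand-in for the unspecified hash32.
class SaltedKeySketch {
    static final int NUM_SHARDS = 16; // shards 0..15, must stay fixed

    // 32-bit hash of a string (assumption: CRC32 over its UTF-8 bytes).
    static int hash32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue();
    }

    // <shard> = hash32(uniqueId) mod 16; floorMod keeps the result in 0..15
    // even when the hash is negative.
    static int shard(String uniqueId) {
        return Math.floorMod(hash32(uniqueId), NUM_SHARDS);
    }

    // <salt> = hash32(<shard><customerId>)
    static int salt(String customerId, String uniqueId) {
        return hash32(String.format("%02d%s", shard(uniqueId), customerId));
    }

    // Readable form of the row key; the "|" separators are for illustration only.
    static String rowKey(String customerId, long timestampMs, String uniqueId) {
        return String.format("%08x|%s|%013d|%s",
                salt(customerId, uniqueId), customerId, timestampMs, uniqueId);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("cust42", 1384563360000L, "evt-1"));
        System.out.println(rowKey("cust42", 1384563360000L, "evt-2"));
    }
}
```

Because the salt depends only on the shard and the customerId, a given
customer's rows land under at most 16 distinct key prefixes, and a reader
that knows the customerId can enumerate those prefixes to scan them.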
> >
> > I have a key simulation script in python that I can start tweaking and
> > share with people if they'd like.
> >
> > Hope this helps,
> > Mike
> >
> >
> > On Sat, Nov 16, 2013 at 1:16 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > bq. all regions of that customer
> > >
> > > Since the rowkey starts with <customerId>, any single customer would only
> > > span a few regions (normally 1 region), right?
> > >
> > >
> > > On Fri, Nov 15, 2013 at 9:56 PM, Asaf Mesika <asaf.mesika@gmail.com>
> > > wrote:
> > >
> > > > But when you read, you have to approach all regions of that customer,
> > > > instead of pinpointing just one which contains that hour you want for
> > > > example.
> > > >
> > > > On Friday, November 15, 2013, Ted Yu wrote:
> > > >
> > > > > bq. you must have your customerId, timestamp in the rowkey since you
> > > > > query on it
> > > > >
> > > > > Have you looked at this API in Scan ?
> > > > >
> > > > >   public Scan setTimeRange(long minStamp, long maxStamp)
> > > > >
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > On Fri, Nov 15, 2013 at 1:28 PM, Asaf Mesika <asaf.mesika@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > The problem is that I do know my rowkey design, and it follows
> > > > > > people's best practice, but it generates a really bad use case
> > > > > > which I can't seem to figure out how to solve yet.
> > > > > >
> > > > > > The rowkey as I said earlier is:
> > > > > > <customerId><bucket><timestampInMs><uniqueId>
> > > > > > So when, for example, you have 1000 customers, and bucket ranges
> > > > > > from 1 to 16, you eventually end up with:
> > > > > > * 30k regions - What happens, as I presume: you start with one
> > > > > > region hosting ALL customers, which is just one. As you pour in
> > > > > > more customers and more data, the region splitting kicks in. So,
> > > > > > after a while, you get to a situation in which most regions host
> > > > > > a specific customerId, bucket and time duration. For example:
> > > > > > customer #10001, bucket 6, 01/07/2013 00:00 - 02/07/2013 17:00.
> > > > > > * Empty regions - the first really bad consequence of what I
> > > > > > described before is that when the time duration is over, no data
> > > > > > will ever be written to this region. And worse - when the TTL you
> > > > > > set (let's say 1 month) is over and it's 03/08/2013, this region
> > > > > > becomes empty!
> > > > > >
> > > > > > The thing is that you must have your customerId, timestamp in the
> > > > > > rowkey since you query on it, but when you do, you will essentially
> > > > > > get regions which will not get any more writes to them, and after
> > > > > > TTL become zombie regions :)
> > > > > >
> > > > > > The second bad part of this rowkey design is that some customers
> > > > > > will have significantly less traffic than other customers, thus in
> > > > > > essence their regions will get written at a very slow rate compared
> > > > > > with the high traffic customers. When this happens on the same RS -
> > > > > > bam: the slow region's Puts cause the WAL queue to get bigger over
> > > > > > time, since its region never gets to Max Region Size (256MB in our
> > > > > > case), thus never gets flushed, thus stays in the 1st WAL file.
> > > > > > Until when? Until we hit the max number of log files permitted
> > > > > > (32), and then regions are forcibly flushed. When this happens, we
> > > > > > get about 100 regions with 3k-3mb store files. You can imagine what
> > > > > > happens next.
> > > > > >
> > > > > > The weirdest thing here is that this rowkey design is very common -
> > > > > > nothing fancy here, so in essence this phenomenon should have
> > > > > > happened to a lot of people - but for some reason, I don't see that
> > > > > > much writing about it.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Asaf
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 15, 2013 at 3:51 AM, Jia Wang <ramon@appannie.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Then the case is simple, as I said "check your row key design,
> > > > > > > you can find the start and end row key for each region, from
> > > > > > > which you can know why your request with a specific row key
> > > > > > > doesn't hit a specified region"
> > > > > > >
> > > > > > > Cheers
> > > > > > > Ramon
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Nov 14, 2013 at 8:47 PM, Asaf Mesika <asaf.mesika@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > It's from the same table.
> > > > > > > > The thing is that some <customerId> simply have less data
> > > > > > > > saved in HBase, while others have x50 (max) data.
> > > > > > > > I'm trying to check how people designed their rowkey around
> > > > > > > > it, or had other out-of-the-box solutions for it.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Nov 14, 2013 at 12:06 PM, Jia Wang <ramon@appannie.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi
> > > > > > > > >
> > > > > > > > > Are the regions from the same table? If so, check your row
> > > > > > > > > key design, you can find the start and end row key for
> > > > > > > > > each region, from which you can know why your request with
> > > > > > > > > a specific row key doesn't hit a specified region.
> > > > > > > > >
> > > > > > > > > If the regions are for different tables, you may consider
> > > > > > > > > combining some cold regions for some tables.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Ramon
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Nov 14, 2013 at 4:59 PM, Asaf Mesika <
