hbase-user mailing list archives

From Ted Yu <yuzhihong@gmail.com>
Subject Re: Table.get(List<Get>) overwhelms several RSs
Date Wed, 25 Feb 2015 18:33:33 GMT
bq. The 4000 keys are likely contiguous and therefore probably represent
entire regions

In that case you can convert the multi-gets to a Scan with a proper batch
size and start/stop rows.
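
A minimal sketch of that conversion against the 0.94-era client (htable,
startRow, stopRow, and process() below are placeholders; the caching and
batch values are only examples):

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    // One bounded Scan instead of ~4000 contiguous Gets.
    Scan scan = new Scan(startRow, stopRow); // byte[] covering the key range
    scan.setCaching(100); // rows fetched per RPC; bounds client memory
    scan.setBatch(100);   // cells returned per Result, for very wide rows
    ResultScanner scanner = htable.getScanner(scan);
    try {
      for (Result result : scanner) {
        process(result); // consume each row as it streams in
      }
    } finally {
      scanner.close();
    }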

Cheers

On Wed, Feb 25, 2015 at 10:16 AM, Ted Tuttle <ted@mentacapital.com> wrote:

> Heaps are 16G w/ hfile.block.cache.size = 0.5
>
>
>
> Machines have 32G onboard and we used to run w/ 24G heaps but reduced them
> to lower GC times.
>
>
>
> Not sure which regions were hot, and I don't want to repeat the
> experiment and take down my cluster again :)
>
>
>
> What I know:
>
>
>
> 1) The request was about 4000 gets.
>
> 2) The 4000 keys are likely contiguous and therefore probably represent
> entire regions
>
> 3) Once we batched the gets (so as not to kill the cluster), the result
> was >10G of data in the client. We blew the heap there :(
>
> 4) Our regions are 10G (hbase.hregion.max.filesize = 10737418240; config
> sketch below)
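>
> For reference, an hbase-site.xml sketch with the two values quoted
> above (an excerpt, not our full config):
>
>     <property>
>       <name>hfile.block.cache.size</name>
>       <value>0.5</value> <!-- half of the 16G heap for block cache -->
>     </property>
>     <property>
>       <name>hbase.hregion.max.filesize</name>
>       <value>10737418240</value> <!-- 10G regions -->
>     </property>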
>
>
>
> Distributing these keys via salting is not in our best interest, as we
> typically do these types of timeseries queries (though only recently at
> this scale).
>
>
>
> I think I understand the failure mode; I am just surprised that a
> greedy client can kill the cluster and that we are required to batch our
> gets in order to protect the cluster.
>
>
>
> *From:* Nick Dimiduk [mailto:ndimiduk@gmail.com]
> *Sent:* Wednesday, February 25, 2015 9:40 AM
> *To:* hbase-user
> *Cc:* Ted Yu; Development
> *Subject:* Re: Table.get(List<Get>) overwhelms several RSs
>
>
>
> How large is your region server heap? What's your setting
> for hfile.block.cache.size? Can you identify which region is being burned
> up (i.e., is it META)?
>
>
>
> It is possible for a hot region to act as a "death pill" that roams around
> the cluster. We see this with the meta region with poorly-behaved clients.
>
>
>
> -n
>
>
>
> On Wed, Feb 25, 2015 at 8:38 AM, Ted Tuttle <ted@mentacapital.com> wrote:
>
> Hard to say how balanced the table is.
>
> We have a mixed requirement where we want some locality for timeseries
> queries against "clusters" of information.  However, the "clusters" in a
> table should be well distributed if the dataset is large enough.
>
> The query in question killed 5 RSs so I am inferring either:
>
> 1) the table was spread across these 5 RSs
> 2) the query moved around on the cluster as RSs failed
>
> Perhaps you could tell me if #2 is possible.
>
> We are running v0.94.9
>
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Wednesday, February 25, 2015 7:24 AM
> To: user@hbase.apache.org
> Cc: Development
> Subject: Re: Table.get(List<Get>) overwhelms several RSs
>
> Was the underlying table balanced (meaning its regions spread evenly
> across region servers)?
>
> What release of HBase are you using?
>
> Cheers
>
> On Wed, Feb 25, 2015 at 7:08 AM, Ted Tuttle <ted@mentacapital.com> wrote:
> Hello-
>
> In the last week we had multiple times where we lost 5 of 8 RSs in the
> space of a few minutes because of slow GCs.
>
> We traced this back to a client calling Table.get(List<Get> gets) with a
> collection containing ~4000 individual gets.
>
> We've worked around this by limiting the number of Gets we send in a
> single call to Table.get(List<Get>).
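>
> Roughly what that workaround looks like (batchedGet and the batch size
> are ours, not a recommendation; HTable.get(List<Get>) returns Result[]
> in 0.94):
>
>     // imports: org.apache.hadoop.hbase.client.{HTable, Get, Result},
>     //          java.util.List, java.io.IOException
>     static void batchedGet(HTable table, List<Get> gets, int batchSize)
>         throws IOException {
>       for (int i = 0; i < gets.size(); i += batchSize) {
>         List<Get> chunk =
>             gets.subList(i, Math.min(i + batchSize, gets.size()));
>         Result[] results = table.get(chunk);
>         // consume results before fetching the next chunk so the
>         // client never holds more than one batch in memory
>       }
>     }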
>
> Is there some configuration parameter that we are missing here?
> Thanks,
> Ted
>
>
>
