hbase-issues mailing list archives

From "Nick Dimiduk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11682) Explain hotspotting
Date Wed, 06 Aug 2014 03:52:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087207#comment-14087207 ]

Nick Dimiduk commented on HBASE-11682:
--------------------------------------

bq. HBase also attempts to store rows near each other in the same region, on the same region
server.

This sentence doesn't help much. A region is a contiguous sequence of rows that are physically
hosted as a unit. Rows on either side of a region boundary are lexicographically near each
other but belong to different regions, so there is no guarantee that they are hosted on the
same region server.
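
To make the boundary point concrete, here is a minimal sketch in Python (not HBase code).
The split points and the helper name region_for are made up for the example; the only
assumption is that each region covers a half-open range [start key, next start key).

```python
from bisect import bisect_right

# Hypothetical split points: region i holds rows in [START_KEYS[i], START_KEYS[i + 1]).
START_KEYS = [b"", b"g", b"n", b"t"]


def region_for(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    # bisect_right counts the start keys that are <= row_key; the last of
    # those start keys belongs to the region that owns the row.
    return bisect_right(START_KEYS, row_key) - 1


# b"fzzz" and b"g" are neighbours in sort order but sit on opposite
# sides of a split point, so they belong to different regions.
left = region_for(b"fzzz")
right = region_for(b"g")
```

Nothing in the sketch ties region 0 and region 1 to the same server, which is why "near each
other" only holds for rows inside one region.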

bq. However, poorly designed row keys can lead to <firstterm>hotspotting</firstterm>.

This is where schema/rowkey design and access patterns go hand in hand: whether a given key
design hotspots depends on how clients actually read and write it.

bq. Hotspotting occurs when nearly all the rows being written to HBase are written to the
same region, because their row keys are contiguous or very similar.

I'd say "Hotspotting occurs when too much client traffic is directed at a single region. This
can be from reads, writes, or both. The traffic overwhelms the single machine responsible
for hosting that region, causing performance degradation and potentially leading to region
unavailability. This can also have adverse effects on other regions hosted by the same region
server as that host is unable to service the requested load."
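
A rough illustration of that definition, again as a Python sketch and not HBase code. The
region layout, the helper names, and the starting timestamp are all assumptions picked for
the example. It counts requests per region without caring whether they are reads or writes.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical layout: five regions split on the leading character of the key.
START_KEYS = [b"", b"2", b"4", b"6", b"8"]


def region_for(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(START_KEYS, row_key) - 1


def load_per_region(row_keys) -> Counter:
    """Count how many requests (reads or writes) each region receives."""
    return Counter(region_for(key) for key in row_keys)


# Monotonically increasing keys: zero-padded epoch seconds, arbitrary start value.
sequential_keys = [b"%010d" % (1407297132 + i) for i in range(1000)]
load = load_per_region(sequential_keys)
```

Every key in sequential_keys starts with the same leading digit, so the whole request stream
is counted against a single region while the other four sit idle.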

bq. but in the bigger picture, data is being written to multiple regions across the cluster
...

Again, not limited to writes.

bq. One technique is to salt the row keys

Is the term "salt" explained?
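
If it isn't, a short example might help. The sketch below is Python and not HBase code. It
assumes "salt" means a short prefix prepended to the row key so that adjacent keys sort into
different parts of the key space. Deriving the prefix from a hash of the key is one possible
choice, made here so a reader can recompute it; it is not necessarily what the patch
describes. NUM_BUCKETS and salted_key are hypothetical names.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical; often chosen to match the number of regions


def salted_key(row_key: bytes) -> bytes:
    """Prepend a two-digit bucket prefix derived from a checksum of the key."""
    bucket = zlib.crc32(row_key) % NUM_BUCKETS
    return b"%02d-" % bucket + row_key


# The same logical key always maps to the same salted key, so a point
# lookup can rebuild the stored key without extra bookkeeping.
stored = salted_key(b"1407297132")
```

With a prefix like this, keys that were adjacent before salting can end up under different
prefixes, and a table pre-split on those prefixes would spread them across regions.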

bq. However, using totally random row keys would remove any benefit of HBase's row-sorting
algorithm and cause very poor performance, as each get or scan would need to query all regions.

You're assuming a sequential access pattern here. Random rowkeys can be okay for random read
access patterns, since the load is spread all over the cluster. I've seen completely random
access patterns cause other issues, such as poor block cache performance, but that's a slight
tangent.
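
A sketch of the difference, in Python and not HBase code. It assumes a hash-derived prefix
as the "random" part of the key, and models the table as a dict plus a sorted list of stored
keys. All names here (salted_key, point_get, range_scan, NUM_BUCKETS) are hypothetical.

```python
import heapq
import zlib
from bisect import bisect_left

NUM_BUCKETS = 4  # hypothetical number of key prefixes


def salted_key(row_key: bytes) -> bytes:
    """Prepend a two-digit bucket prefix derived from a checksum of the key."""
    return b"%02d-" % (zlib.crc32(row_key) % NUM_BUCKETS) + row_key


def point_get(table: dict, row_key: bytes):
    """Random read: one lookup, because the prefix can be recomputed."""
    return table.get(salted_key(row_key))


def range_scan(sorted_keys: list, start: bytes, stop: bytes) -> list:
    """Sequential read of [start, stop): one sub-scan per bucket, then a merge."""
    per_bucket = []
    for bucket in range(NUM_BUCKETS):
        prefix = b"%02d-" % bucket
        lo = bisect_left(sorted_keys, prefix + start)
        hi = bisect_left(sorted_keys, prefix + stop)
        # Strip the prefix so the merged result is in logical key order.
        per_bucket.append([key[len(prefix):] for key in sorted_keys[lo:hi]])
    return list(heapq.merge(*per_bucket))


keys = [b"row%03d" % i for i in range(100)]
table = {salted_key(key): key for key in keys}
sorted_keys = sorted(table)
```

The point get touches one key, so random reads stay cheap. The range scan has to visit every
bucket, which is the cost the quoted sentence describes, and it only matters when the access
pattern is actually sequential.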

> Explain hotspotting
> -------------------
>
>                 Key: HBASE-11682
>                 URL: https://issues.apache.org/jira/browse/HBASE-11682
>             Project: HBase
>          Issue Type: Task
>          Components: documentation
>            Reporter: Misty Stanley-Jones
>            Assignee: Misty Stanley-Jones
>         Attachments: HBASE-11682.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
