hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Designing Row Key
Date Wed, 07 Mar 2012 14:17:12 GMT

Hi there-

You probably also want to see this section in the RefGuide on schema
design...

http://hbase.apache.org/book.html#rowkey.design
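
The salting approach described below (prefixing a reversed timestamp with a value derived from the metric) can be sketched roughly like this. This is purely illustrative: the single-byte salt, the 16-bucket count, and hashing the metric name are assumptions for the sketch, not OpenTSDB's actual encoding (OpenTSDB uses assigned metric UIDs, not a hash).

```java
import java.nio.ByteBuffer;

public class SaltedKey {
    // Assumed bucket count for this sketch; not an HBase or OpenTSDB constant.
    static final int NUM_SALTS = 16;

    // Row key layout: [1-byte salt][8-byte reversed timestamp].
    // The salt spreads writes for different metrics across key ranges;
    // the reversed timestamp makes the newest rows sort first within a bucket.
    public static byte[] rowKey(String metric, long epochMillis) {
        byte salt = (byte) Math.floorMod(metric.hashCode(), NUM_SALTS);
        long reversed = Long.MAX_VALUE - epochMillis;
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(salt)
                .putLong(reversed)
                .array();
    }
}
```

A scan for one metric then reduces to a scan within the single bucket that metric maps to, using its salt byte as the row prefix.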

... as well as this for region-RS assignment (and failover)...

http://hbase.apache.org/book.html#regions.arch
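
On the question of one hot salt value spanning multiple regions: regions split automatically as they grow, and the master's balancer reassigns them across region servers, so a single salt prefix can indeed end up served by several regions. You can also pre-split the table on the salt prefix so the buckets start out in separate regions. A minimal sketch of computing those split keys (the 16-bucket count is an assumption matching the sketch above):

```java
public class SaltSplits {
    // One split point per salt value; the first region implicitly
    // covers all keys below salt 0x01.
    public static byte[][] splitKeys(int buckets) {
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splits[i - 1] = new byte[] { (byte) i };
        }
        return splits;
    }
    // These keys would be handed to the standard admin call, e.g.
    // HBaseAdmin.createTable(tableDescriptor, splitKeys(16)),
    // so each salt bucket begins life in its own region.
}
```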

re:  "recommended minimum number of nodes?"

The RefGuide comments on this as well...
http://hbase.apache.org/book.html#arch.overview


Good luck!


On 3/7/12 5:23 AM, "Phil Evans" <philhelm.evans@googlemail.com> wrote:

>Dear All,
>
>We're currently designing a row key for our schema, and this has raised a
>number of questions we've struggled to find definitive answers to. We
>think we understand what goes on, and hoped someone on the list would be
>able to help clarify!
>
>Ultimately, the data we are storing is time series data, and we understand
>the issues that can arise from having a reverse-order timestamp in the
>left-most part of the key. However, from what I've read, the solution used
>by the OpenTSDB project of prefixing the reverse-order date with some sort
>of salted value (the metric type) would work well for us, too.
>
>   - Due to the shape of the data we are storing, it is quite likely that
>   a handful of those salted values (perhaps 3 or 4 of them) will have
>   significantly more rows stored against them than the others. Could this
>   result in a particular node getting full? From the impression I've got
>   from *HBase - The Definitive Guide*, it appears that it's possible for
>   regions to get moved between nodes. Is that correct, and does this
>   happen automatically? Is it possible for one of those metric
>   types/salted values to be stored over a number of different regions, to
>   stop a particular node from being nailed?
>   - Secondly, from a data recovery point of view, our assumption is that,
>   should a node fail, we're covered because the data is partially
>   replicated to multiple nodes (by HDFS), and therefore the regions
>   previously served by the failed node can be reconstructed and made
>   available via a different node. Is that a correct assumption? For
>   development purposes we are currently running with three nodes. Is that
>   sufficient? Is there a recommended minimum number of nodes?
>
>Thanks for taking the time to read my email, and apologies if some of
>these questions are a bit basic!
>
>Looking forward to your response,
>Cheers,
>
>Phil.


