hbase-user mailing list archives

From "Ramkrishna.S.Vasudevan" <ramkrishna.vasude...@huawei.com>
Subject RE: Designing Row Key
Date Wed, 07 Mar 2012 10:56:34 GMT
Comments inline.

> -----Original Message-----
> From: philip.evans@gmail.com [mailto:philip.evans@gmail.com] On Behalf Of Phil Evans
> Sent: Wednesday, March 07, 2012 3:53 PM
> To: user@hbase.apache.org
> Subject: Designing Row Key
> Dear All,
>
> We're currently designing a row key for our schema, and this has raised
> a number of questions which we've struggled to find definitive answers
> to, but we think we understand what goes on and hoped someone on the
> list would be able to help clarify!
> Ultimately, the data we are storing is time series data, and we
> understand the issues that can arise from having the reverse-order
> timestamp in the left-most part of the key. However, from what I've
> read, the solution used by the OpenTSDB project of prefixing the
> reverse-order date with some sort of salted value (the metric type)
> would work well for us, too.
>    - Due to the shape of the data we are storing, it is quite likely
>    that a handful of those salted values (perhaps 3 or 4 of them) will
>    have significantly more rows stored against them than the others.
>    Could this result in a particular node getting full? From the
>    impression I've got from *HBase - The Definitive Guide*, it appears
>    that regions can be moved between nodes. Is that correct, and does
>    this happen automatically? Is it possible for one of those metric
>    types/salted values to be stored across a number of different
>    regions, to stop a particular node from being overloaded?
[Ram] Yes, HBase does this automatically. If you use the 0.90 version of
HBase, the RS will take care of balancing, but it may not match the load
you actually see, particularly because your use case is time based.
In 0.92 we have a provision for plugging in a balancer based on how we
want the regions to be distributed.
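To make the salting idea concrete, here is a rough sketch of building a salted row key. The class and method names, the bucket count, and the exact key layout are my own illustrative assumptions, not OpenTSDB's actual key format:

```java
import java.nio.charset.StandardCharsets;

public class SaltedKey {
    // Hypothetical number of salt buckets; typically chosen as a small
    // multiple of the number of region servers.
    static final int BUCKETS = 8;

    // Builds a row key laid out as: [1-byte salt][metric bytes][8-byte
    // reverse timestamp]. Layout and names are illustrative only.
    static byte[] buildKey(String metricId, long timestampMillis) {
        byte salt = (byte) Math.floorMod(metricId.hashCode(), BUCKETS);
        long reverseTs = Long.MAX_VALUE - timestampMillis;
        byte[] metric = metricId.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[1 + metric.length + 8];
        key[0] = salt;
        System.arraycopy(metric, 0, key, 1, metric.length);
        for (int i = 0; i < 8; i++) {
            // Big-endian encoding so later timestamps sort earlier
            // when HBase compares the keys as raw bytes.
            key[1 + metric.length + i] = (byte) (reverseTs >>> (56 - 8 * i));
        }
        return key;
    }

    public static void main(String[] args) {
        byte[] k1 = buildKey("sys.cpu.user", 1331117794000L);
        byte[] k2 = buildKey("sys.cpu.user", 1331117795000L);
        // Same metric always lands in the same salt bucket, so a scan
        // for one metric only needs to touch one bucket's key range.
        System.out.println("same bucket: " + (k1[0] == k2[0]));
    }
}
```

Note the trade-off: deriving the salt from the metric id keeps a single metric scannable in one range, but a metric that dominates the write volume still concentrates in one bucket; salting on a hash of metric plus a coarse time component spreads such hot metrics at the cost of multi-range scans.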
>    - Secondly, from a data recovery point of view, our assumption is
>    that, should a node fail, we're covered because the data is
>    partially replicated to multiple nodes (by HDFS), and therefore the
>    regions previously served by the failed node can be reconstructed
>    and made available via a different node. Is that a correct
>    assumption?
[Ram] Yes, automatic failover will be done. The data (here, the regions)
on one RS will automatically be taken over by another RS.
>    For development purposes we are currently running with three nodes.
>    Is that sufficient? Is there a recommended minimum number of nodes?
[Ram] For development, three nodes is OK.
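One reason three nodes is a sensible floor: HDFS defaults to a replication factor of 3 (`dfs.replication` in hdfs-site.xml), so three DataNodes is the smallest cluster that can hold every block at the default replication. A minimal, purely illustrative fragment:

```xml
<property>
  <!-- Default is 3; with fewer DataNodes than this,
       blocks remain under-replicated. -->
  <name>dfs.replication</name>
  <value>3</value>
</property>
```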

Hope my replies are useful to you.
> Thanks for taking the time to read my email, and apologies if some of
> these questions are a bit basic!
> Looking forward to your response,
> Cheers,
> Phil.
