hbase-user mailing list archives

From Dan Han <dannahan2...@gmail.com>
Subject Re: Distribution of regions to servers
Date Wed, 26 Sep 2012 15:39:31 GMT
  Thanks for your swift responses, Ramkrishna and Anoop. Below I explain
what we are doing now.

   We are trying to explore a systematic way to design an appropriate data
schema for various applications in HBase. So we first designed several data
schemas for each dataset and evaluated them with the same queries. The
queries are designed from the requirements, such as selecting the data
that matches an expression, or finding the difference between two
snapshots. The queries were processed with user-level coprocessors.

   In our experiments, we found that under some data schemas the queries
sometimes returned no results because of connection timeouts and region
server (RS) crashes. We observed that in these cases the queried data were
concentrated in a few regions hosted on a few region servers. We think the
failures might be caused by the excessive workload on these few region
servers and by poor load balancing. To the best of our knowledge, this
situation can be avoided, or at least mitigated, by distributing the
regions more evenly across the region servers.
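To make the "concentrated load" observation concrete, here is a minimal Java sketch that sums per-region request counts by hosting region server and reports the busiest server's load relative to the mean. It does not use any HBase API; `RegionLoadSkew`, `RegionStat`, and the request counts are hypothetical inputs (for example, per-region request counts collected over some time window). A value near 1.0 means the queried regions are spread evenly; a large value corresponds to the situation described above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not an HBase API): measures how unevenly query load is
 * spread across region servers, given per-region request counts.
 */
final class RegionLoadSkew {

    /** A region, the server hosting it, and the requests it served in some window. */
    record RegionStat(String region, String server, long requests) {}

    /** Sums the request counts of all regions hosted on each server. */
    static Map<String, Long> loadPerServer(List<RegionStat> stats) {
        Map<String, Long> load = new HashMap<>();
        for (RegionStat s : stats) {
            load.merge(s.server(), s.requests(), Long::sum);
        }
        return load;
    }

    /**
     * Returns max server load divided by mean server load (1.0 = perfectly even).
     * allServers lists every server in the cluster, so idle servers count toward
     * the mean with zero load; stats for servers not in allServers are ignored.
     */
    static double skew(List<RegionStat> stats, List<String> allServers) {
        if (allServers.isEmpty()) {
            throw new IllegalArgumentException("allServers must not be empty");
        }
        Map<String, Long> load = loadPerServer(stats);
        long total = 0;
        long max = 0;
        for (String server : allServers) {
            long l = load.getOrDefault(server, 0L);
            total += l;
            max = Math.max(max, l);
        }
        if (total == 0) {
            return 1.0; // no traffic at all: treat as balanced
        }
        double mean = (double) total / allServers.size();
        return max / mean;
    }
}
```

For example, with four hypothetical servers where the regions on rs1 serve 960 of 1,000 requests, `skew` returns 3.84 (960 against a mean of 250).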

  Therefore, we have been thinking of adding a monitoring and management
component between the client and the servers, which would schedule the
queries/jobs from the client side and redistribute regions dynamically
according to the current workload of each region server, the incoming
queries, and data locality.
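One possible placement rule for such a component, sketched in Java under stated assumptions: each region's expected load (for instance, estimated from the queued queries) and each server's current load are given as inputs, and regions are placed heaviest-first on whichever server is currently least loaded, a greedy "longest processing time first" heuristic. `GreedyRegionPlacer`, `RegionLoad`, and `plan` are hypothetical names, the code calls no HBase API, and data locality is not modeled; actually applying a plan would still go through HBase's own region move operation (e.g. the `move` command in the HBase shell).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch of a load-aware placement step (greedy "longest
 * processing time first"). Data locality is deliberately left out.
 */
final class GreedyRegionPlacer {

    /** Expected load of a region, e.g. estimated from queued queries (assumed input). */
    record RegionLoad(String region, long expectedLoad) {}

    /** Mutable per-server load used while building the plan. */
    private static final class ServerSlot {
        final String server;
        long load;

        ServerSlot(String server, long load) {
            this.server = server;
            this.load = load;
        }
    }

    /**
     * Returns a region -> server plan. baseLoad holds each server's current load
     * (assumed to be measured elsewhere); servers missing from it start at zero.
     */
    static Map<String, String> plan(List<RegionLoad> regions, List<String> servers,
                                    Map<String, Long> baseLoad) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers");
        }
        // Min-heap on load; ties broken by server name so the plan is deterministic.
        PriorityQueue<ServerSlot> heap = new PriorityQueue<>(
            Comparator.comparingLong((ServerSlot s) -> s.load)
                      .thenComparing((ServerSlot s) -> s.server));
        for (String s : servers) {
            heap.add(new ServerSlot(s, baseLoad.getOrDefault(s, 0L)));
        }
        // Heaviest regions first.
        List<RegionLoad> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingLong(RegionLoad::expectedLoad).reversed());

        Map<String, String> assignment = new HashMap<>();
        for (RegionLoad r : sorted) {
            ServerSlot least = heap.poll(); // currently least-loaded server
            assignment.put(r.region(), least.server);
            least.load += r.expectedLoad();
            heap.add(least);                // reinsert with its updated load
        }
        return assignment;
    }
}
```

With hypothetical regions a=50, b=40, c=30, d=20 and two idle servers, the plan puts a and d on rs1 and b and c on rs2, giving each server a load of 70. Placing the heaviest regions first keeps one hot region from being stacked onto a server that already absorbed several smaller ones; a real component would also have to weigh the cost of moving regions, such as lost data locality, against the expected gain.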

  Does it make sense? Just my two cents. Any comments?

Best Wishes
Dan Han

On Tue, Sep 25, 2012 at 10:44 PM, Anoop Sam John <anoopsj@huawei.com> wrote:

> Hi
> Can you share more details, please? What work are you doing within the CPs?
> -Anoop-
> ________________________________________
> From: Dan Han [dannahan2008@gmail.com]
> Sent: Wednesday, September 26, 2012 5:55 AM
> To: user@hbase.apache.org
> Subject: Distribution of regions to servers
> Hi all,
>    I am doing some experiments on HBase with coprocessors. I found that
> coprocessor performance is strongly affected by the distribution of the
> regions. I am interested in digging deeper into this problem to see if I
> can do something about it.
>   The only related discussion I could find is at the following link:
> http://search-hadoop.com/m/Vjhgj1lqw7Y1/hbase+distribution+region&subj=distribution+of+regions+to+servers
> Is there any further discussion or ongoing work on this? If so, could
> someone point me to it?
> Thanks in advance.
> Best Wishes
> Dan Han
