hbase-user mailing list archives

From Ninad Raut <hbase.user.ni...@gmail.com>
Subject Re: Keeping Compute Nodes separate from the region server node -- pros and cons
Date Fri, 15 May 2009 06:30:05 GMT
Hi Andy,
Thanks for the tip.
I have an EC2 cluster with 6 nodes, each a server-grade large instance. I
have the mapred and regionserver daemons running on all the nodes. Our deployment
will not go beyond 20 clusters in the near future. Which of the two scenarios
you mentioned would you suggest?

On Thu, May 14, 2009 at 10:44 PM, Andrew Purtell <apurtell@apache.org> wrote:

> Hi Ninad,
> I think the answer depends on the anticipated scale of the deployment.
> For small clusters (up to a few racks, ~40 servers per rack) I don't think
> there is any significant performance hit from separating storage and
> computation. Presumably all servers will share the same large GigE switch --
> or maybe a redundant L2 pair via bonded interfaces for failover -- or a few
> of them stacked with high-speed interconnects. This would relieve the
> storage nodes of RAM and CPU burden related to the computational tasks as
> you are thinking, providing more headroom in exchange for a quite modest
> performance penalty. (However, if your computation load is high and
> therefore the nodes are overburdened and are not stable, there is no
> alternative...) In the future this consideration might change if DFS clients
> are given some capability to find blocks on local disk via some optimized
> I/O path.
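
A back-of-the-envelope sketch of the trade-off described above (the disk and
NIC figures are illustrative assumptions, not numbers from this thread):
separating storage from compute caps each compute node's read throughput at
its network interface rather than at its local disks.

```python
# Illustrative arithmetic only -- spindle count and per-device throughput
# below are assumed values, not measurements from this thread.
GIGE_MB_S = 125        # ~1 Gbit/s expressed in MB/s
DISKS_PER_NODE = 4     # assumed spindle count per storage node
DISK_MB_S = 80         # assumed sequential read rate per spindle, MB/s

# Colocated compute: map tasks can read from local disks in parallel.
local_cap = DISKS_PER_NODE * DISK_MB_S     # aggregate local read ceiling

# Separated compute: every DFS read crosses the network, so a compute
# node's read rate is bounded by its single GigE interface.
remote_cap = min(local_cap, GIGE_MB_S)

print(local_cap, remote_cap)
```

On a non-oversubscribed GigE fabric this gap is the "modest penalty" in
question; it widens sharply once inter-rack uplinks are oversubscribed.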
> In a large cluster there might well be significant performance impact. In a
> common deployment scenario, there are rack-local switched fabrics and
> another switched fabric for uplinks from the racks. So, a rack would have a
> switched GigE backplane or similar, but inter-rack connections might be
> single GigE uplinks, a ~40-to-1 reduction in capacity in the worst case; or
> maybe 10 GigE uplinks, a ~10-to-1 reduction. Therefore it would be desirable to
> distribute the computation into the racks where the data is located. When a
> region is deployed to a region server the underlying blocks on DFS are not
> immediately migrated, but after a compaction -- a rewrite -- the
> underlying blocks will be available on rack-local datanodes, according to
> my understanding of how DFS places replicas upon write. So, after a split,
> daughter regions will have their blocks appropriately located in a timely
> manner. For the rest, I wonder if it would be beneficial to schedule major
> compaction more frequently than the 24-hour default for datacenter-scale
> deployments -- something like every 8 hours -- and also to trigger a major
> compaction on important tables after cluster (re)init.
> Region deployment in a steady-state system should see relatively little
> churn, so this will have the effect of optimizing block placement for region
> store access.
> Submitted for your consideration,
>    - Andy
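
Andy's compaction-scheduling suggestion maps to a single configuration knob.
As a sketch: the interval lives in `hbase-site.xml` under
`hbase.hregion.majorcompaction` (milliseconds), so an 8-hour cycle would be:

```xml
<!-- hbase-site.xml: run major compactions every 8 hours instead of the
     24-hour default. 8 h * 3600 s * 1000 ms = 28,800,000 ms. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>28800000</value>
</property>
```

The one-off compaction after cluster (re)init can be triggered from the HBase
shell with `major_compact 'mytable'` (the table name here is hypothetical).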
> ________________________________
> From: Ninad Raut <hbase.user.ninad@gmail.com>
> To: hbase-user <hbase-user@hadoop.apache.org>
> Cc: Ranjit Nair <ranjit.nair@germinait.com>
> Sent: Thursday, May 14, 2009 2:56:04 AM
> Subject: Keeping Compute Nodes separate from the region server node -- pros
> and cons
> Hi,
> I want to get a design perspective here on the advantages of separating
> region servers from compute nodes (the nodes that run mapreduce tasks).
> Will separating datanodes from compute nodes reduce the load on the servers
> and avoid swapping problems?
> Will this separation make map reduce tasks less efficient, since we are
> giving up data locality?
> Regards,
> Ninad
