hbase-dev mailing list archives

From Atif Khan <atif_ijaz_k...@hotmail.com>
Subject Re: Shared HDFS for HBase and MapReduce
Date Wed, 06 Jun 2012 18:15:41 GMT
Thanks to all who replied, especially Vladimir and Mathias!!!

So if I understand this correctly, there is a physical resource contention
problem, given that both MR and HBase are resource hungry.  Therefore, when
end-user SLAs are in place, performance guarantees may be compromised if
HBase and MR share the same HDFS cluster (and other resources).

Following Mathias's suggestion, on a production HDFS cluster we could
throttle/limit the MR activity so that it has minimal impact on HBase's
(realtime) performance.

So far so good.

Now my BIG question is about the BIG Data itself (no pun intended).  If I
create two HDFS clusters (one for MR and one for HBase), and HBase acts as
both the data source and sink for the MR jobs, would I not be forced to move
LARGE amounts of data between the two HDFS clusters?  Given the size of the
data, this could potentially congest the internal network on which the two
independent HDFS clusters are deployed.
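
To put rough numbers on this concern, here is a minimal back-of-envelope
sketch.  The function name, the data volume, the link speed and the
utilization fraction are all hypothetical placeholders, not measurements
from any real cluster; it only shows how copy time scales with data size
and available bandwidth.

```python
def transfer_hours(data_tb: float, link_gbit_s: float, utilization: float = 0.5) -> float:
    """Estimate hours needed to copy data_tb terabytes between two clusters.

    data_tb      -- data volume in terabytes (1 TB = 1e12 bytes), assumed value
    link_gbit_s  -- nominal inter-cluster link speed in Gbit/s, assumed value
    utilization  -- fraction of the link the copy is allowed to use (0 < u <= 1)
    """
    if data_tb < 0 or link_gbit_s <= 0 or not (0 < utilization <= 1):
        raise ValueError("data_tb must be >= 0, link_gbit_s > 0, 0 < utilization <= 1")
    total_bits = data_tb * 1e12 * 8                     # terabytes -> bits
    usable_bits_per_s = link_gbit_s * 1e9 * utilization  # share of the link we may use
    return total_bits / usable_bits_per_s / 3600        # seconds -> hours


if __name__ == "__main__":
    # Hypothetical example: 10 TB over a 10 Gbit/s link, half of which is
    # reserved for the copy.  That is 16000 seconds, i.e. about 4.4 hours.
    print(f"{transfer_hours(10, 10, 0.5):.1f} hours")
```

The estimate ignores protocol overhead, disk throughput and replication
traffic, so real copies would likely take longer; it is only meant to show
why the volume of data moved between the two clusters matters.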


View this message in context: http://apache-hbase.679495.n3.nabble.com/Shared-HDFS-for-HBase-and-MapReduce-tp4018856p4018878.html
Sent from the HBase - Developer mailing list archive at Nabble.com.