hbase-user mailing list archives

From jingguo yao <yaojing...@gmail.com>
Subject Use single cluster or two clusters for log analysis and HBase?
Date Tue, 29 Nov 2011 07:37:06 GMT
I want to set up Hadoop clusters. There are two workloads. One is log
analysis, which uses MapReduce to process big log files in HDFS. The
other is HBase, which serves random table queries.

I have two choices for setting up my Hadoop clusters. One is to use a
single Hadoop cluster shared by log analysis and HBase. Its
advantages are:

1 There is only one Hadoop cluster which I need to manage.
2 Both MapReduce and HBase can use this big cluster, which has more
  storage and more computing power.

Its disadvantages:

1 Running MapReduce jobs may slow down the random HBase table
  queries.

The other choice is to use two clusters. Cluster A is for log analysis.
Cluster B is for HBase. Its advantages are:

1 There is no interference between log analysis and HBase table
  queries.

Its disadvantages:

1. There are two Hadoop clusters which need to be managed.
2. Log analysis and HBase queries can each use only a smaller Hadoop
   cluster, which has less storage and less computing power.

I don't know which choice is better. Can anybody give me some advice
on this? Thanks.

