hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Advice sought for mixed hardware installation
Date Thu, 14 Oct 2010 16:57:35 GMT
That's a lot of information to digest, Tim, so bear with me if I miss
some details :)

a) isn't really good: the big nodes have a lot of computational power
AND spindles, so leaving them like that is a waste, and there's 0
locality (MR to HBase, HBase to HDFS)

b) sounds weird; I would need more time to think about it

and let me propose

c) the 10 small nodes with HDFS and HBase, the 3 big nodes with HDFS and MR.

 - My main concern in this setup is giving HBase some processing power
and lots of RAM. In this case you can give 6GB to the RSs, 1GB to the
DN, and 1GB to the OS (caching, etc); a rough config sketch follows
this list.
 - On the 3 big nodes, set up MR so that it uses as many tasks as those
machines can support (do they have hyper-threading? if so you can even
use more than 8 tasks). At the same time, the tasks can enjoy a full
1GB of heap each.
 - On locality, HBase will be collocated with the DNs, which is great
in many ways and better than collocating HBase with MR, since that's
not always useful (on a batch import job, for example, the tasks may
hit different regions at the same time and you cannot predict that...
so they still go over the network).
 - One other thing on locality: MR tasks do write intermediate data to
HDFS, so having them collocated with DNs will help.
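
To make those heap and slot numbers concrete, here is a rough sketch of
what that could look like in the CDH3-style config files; the exact
values and the 8/4 slot split are just starting points, tune them to
your hardware:

  # hbase-env.sh on the 10 small nodes (HBASE_HEAPSIZE is in MB)
  export HBASE_HEAPSIZE=6000

  # hadoop-env.sh on the same nodes (keep the DataNode around 1GB)
  export HADOOP_DATANODE_OPTS="-Xmx1024m $HADOOP_DATANODE_OPTS"

  <!-- mapred-site.xml on the 3 big nodes: fill the cores, 1GB per task -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>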

Regarding the master/NN/ZK, since it's a very small cluster I would
use one of the small nodes to collocate all 3 of them (this means you
will only have 9 RSs). You don't really need an ensemble, unless you're
planning to share that ZK setup with other apps.
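
A minimal sketch of that, assuming the chosen small node is called
master-01 (a made-up hostname):

  # hbase-env.sh: let HBase start and stop the single ZK instance itself
  export HBASE_MANAGES_ZK=true

  <!-- hbase-site.xml on every node: point at the lone ZK host -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master-01</value>
  </property>

And leave that host out of conf/regionservers so it only runs the NN,
JT, master and ZK.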

In any case, you should test all setups.

J-D

On Thu, Oct 14, 2010 at 4:51 AM, Tim Robertson
<timrobertson100@gmail.com> wrote:
> Hi all,
>
> We are about to setup a new installation using the following machines,
> and CDH3 beta 3:
>
> - 10 nodes of single quad core, 8GB memory, 2x500GB SATA
> - 3 nodes of dual quad core, 24GB memory, 6x250GB SATA
>
> We are finding our feet, and will blog tests, metrics, etc. as we go,
> but our initial usage patterns will be:
>
> - initial load of 250 million records to HBase
> - data harvesters pushing 300-600 inserts or updates per second
> (under 1KB per record) to TABLE_1 in HBase
> - MR job processing changed content in TABLE_1 into TABLE_2 on an
> (e.g.) 6 hourly cron job (potentially using co-processors in the
> future)
> - MR job processing changed content in TABLE_2 into TABLE_3 on an
> (e.g.) 6 hourly cron job (potentially using co-processors in the
> future)
> - MR jobs building Lucene, SOLR, PostGIS (Hive + Sqoop) indexes on a
> 6, 12 or 24 hourly cron job, either by
>  a) bulk export from HBase to .txt and then Hive or custom MR processing
>  b) Hive or custom MR processing straight from the HBase tables as the input format
> - MR jobs building analytical counts (e.g. 4-way "group bys" in SQL
> using Hive) on a 6, 12 or 24 hourly cron, either by
>  a) bulk export from HBase to .txt and then Hive / custom MR processing
>  b) Hive or custom MR processing straight from the HBase tables
>
> To give an idea, at the moment on the 10 node cluster Hive against
> .txt files does a full scan in 3-4 minutes (our live system is MySQL
> and we export to .txt for Hive).
>
> I see we have 2 options, but I am inexperienced and seek any guidance:
>
> a) run HDFS across all 13 nodes, MR on the 10 small nodes, region
> servers on the 3 big nodes
>  - MR will never benefit from data locality when using HBase (? I think)
> b) run 2 completely separate clusters
>  clu1: 10 nodes, HDFS, MR
>  clu2: 3 nodes, HDFS, MR, RegionServer
>
> With option b) we would do 6-hourly exports from clu2 -> clu1 and
> really keep the processing load off the HBase cluster.
>
> We are prepared to run both, benchmark and provide metrics, but I
> wonder if someone has some advice beforehand.
>
> We are anticipating:
> - NN, 2nd NN, JT on 3 of the 10 smaller nodes
> - HBase master on 1 of the 3 big nodes
> - 1 ZK daemon on 1 of the 3 big nodes (or should we go for an ensemble
> of 3, with one on each)
>
> Thanks for any help anyone can provide,
>
> Tim
> (- and Lars F.)
>
