hadoop-mapreduce-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Guidelines for production cluster
Date Wed, 28 Nov 2012 22:09:19 GMT
Hello list,

     Although many similar discussions have taken place here, I would still
appreciate your guidance. Until now I have worked only on small or
mid-sized clusters, but this time the situation is a bit different. I have to
collect a large amount of legacy data stored over the last few decades. This
data is on tape drives, and I have to retrieve it from there and store it in
my cluster. The total size could be close to 24 petabytes (including replication).
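A rough back-of-the-envelope sizing sketch for a figure like this. It assumes the default HDFS replication factor of 3 (so 24 PB raw holds about 8 PB of unique data). The per-node hardware (12 disks of 3 TB each) and the 25% of disk held back for non-HDFS use such as MapReduce intermediate output are hypothetical values chosen only for illustration, not recommendations.

```python
import math


def unique_data_pb(total_raw_pb, replication=3):
    """Unique (pre-replication) data held by a raw HDFS footprint, in PB."""
    return total_raw_pb / replication


def datanodes_needed(total_raw_pb, disks_per_node, disk_tb, headroom=0.25):
    """Estimate how many DataNodes are needed for a raw footprint.

    total_raw_pb: storage needed including replication, in PB.
    headroom: fraction of each node's disk reserved for non-HDFS use
              (assumed value, e.g. intermediate MapReduce output).
    """
    usable_tb_per_node = disks_per_node * disk_tb * (1 - headroom)
    total_raw_tb = total_raw_pb * 1000  # decimal units: 1 PB = 1000 TB
    return math.ceil(total_raw_tb / usable_tb_per_node)


if __name__ == "__main__":
    # Hypothetical node: 12 x 3 TB disks, 25% headroom -> 27 TB usable per node.
    print(unique_data_pb(24))            # unique data in PB
    print(datanodes_needed(24, 12, 3))   # DataNodes for 24 PB raw
```

Changing the hypothetical disk count, disk size or headroom shifts the node count substantially, which is why per-node hardware is worth settling early.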

To kick this off, I need some help with questions such as: what would be the
optimal configuration for my NN+JT, DN+TT+RS, HMaster and ZK machines?

Keeping this configuration in mind, how many slave nodes and ZK peers should I
have?
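For the ZK peer count, one fixed constraint is quorum arithmetic: a ZooKeeper ensemble stays available only while a strict majority of its servers is up. The sketch below shows only that arithmetic (the function names are illustrative); it does not say how many peers a cluster of this size needs.

```python
def zk_quorum_size(ensemble_size):
    """Smallest strict majority of a ZooKeeper ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def zk_fault_tolerance(ensemble_size):
    """How many servers can fail while a majority quorum survives."""
    return ensemble_size - zk_quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(3, 8):
        print(n, zk_quorum_size(n), zk_fault_tolerance(n))
```

An even-sized ensemble tolerates no more failures than the odd size just below it (4 servers tolerate 1 failure, the same as 3), which is why ensembles are usually odd-sized.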

What is the optimal network configuration for a cluster of this size?

Which kind of disks would be more efficient?

Please do share some guidance, as I would like to have expert comments
before moving ahead. Many thanks.

Regards,
    Mohammad Tariq
