hadoop-common-user mailing list archives

From Gaurav Sharma <gaurav.gs.sha...@gmail.com>
Subject Re: Guidelines for production cluster
Date Thu, 29 Nov 2012 00:10:55 GMT
So, before you can get any useful suggestions, you will have to answer a few core questions:

1. do you know if there exist patterns in this data?
2. will the data be read and how?
3. does there exist a hot subset of the data - both read/write?
4. what makes you think hdfs is a good option?
5. how much do you intend to pay per TB?
6. say you do build the system, how do you plan to keep lights on?
7. forgot to ask - is the data textual or binary?

Those are just the basic questions. Are you going to be building and running the system all
by yourself?

On Nov 28, 2012, at 14:09, Mohammad Tariq <dontariq@gmail.com> wrote:

> Hello list,
>      Although a lot of similar discussions have been done here, I still seek some of
your able guidance. Till now I have worked only on small or mid-sized clusters. But this time
situation is a bit different. I have to cpollect a lot of legacy data, stored over last few
decades. This data is on tape drives and I have to collect it from there and store in my cluster.
The size could go somewhere near 24 Petabytes (inclusive of replication).
> Now, I need some help to kick this off, like what could be the optimal config for my
NN+JT, DN+TT+RS, HMaster, and ZK machines?
> What should be the number of slave and ZK peer nodes, keeping this config in mind?
> What is the optimal network config for a cluster of this size?
> Which kind of disks would be more efficient?
> Please do provide me some guidance as I want to have some expert comments before moving
ahead. Many thanks.
> Regards,
>     Mohammad Tariq
