hadoop-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: Guidelines for production cluster
Date Thu, 29 Nov 2012 10:19:48 GMT
Hello Gaurav,

    Thank you so much for your reply. Please find my comments inline
below:

1. do you know if there exist patterns in this data?
>> Yes, the entire file is divided into fixed-length data blocks (but there
is no separator between two blocks).

2. will the data be read and how?
>> Yes, data has to be read. To be honest, we are still not sure how to do
that.

3. does there exist a hot subset of the data - both read/write?
>> No, all of the data is equally important and will be read together.

4. what makes you think hdfs is a good option?
>> Distributed architecture, Flexibility to read any kind of data,
Parallelism, Native MR integration, Cost, Fault tolerance, High
throughput etc.

5. how much do you intend to pay per TB?
>> I have to discuss it with my superiors (Will let you know soon).

6. say you do build the system, how do you plan to keep lights on?
>> I am sorry, I did not quite get this. I mean I'll do whatever it takes to
keep everything moving. I have some experience with small clusters, and I
have a small team with me which is available 24x7.

7. forgot to ask - is the data textual or binary?
>> Data is binary.
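
A note on answers 1 and 7: because the data is binary and made of fixed-length blocks with no separator, record boundaries can only be recovered from byte offsets. Below is a minimal sketch in Python of that idea, assuming the block length is known in advance. The function names and the 4-byte block size in the usage note are hypothetical and chosen only for illustration; nothing here is from the thread or from the Hadoop API.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_fixed_blocks(stream: BinaryIO, block_size: int) -> Iterator[bytes]:
    """Yield consecutive fixed-length blocks from a buffered binary stream."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    while True:
        block = stream.read(block_size)
        if not block:
            return  # clean end of stream
        if len(block) != block_size:
            # No separators exist, so a short tail cannot be resynchronised.
            raise ValueError("trailing partial block of %d bytes" % len(block))
        yield block


def aligned_range(start: int, length: int, block_size: int) -> Tuple[int, int]:
    """Map an arbitrary byte range onto whole blocks.

    A reader given the range [start, start + length) owns every block that
    *begins* inside it, so neighbouring ranges never share or skip a block.
    Returns (first_byte_to_read, end_byte_exclusive).
    """
    first = -(-start // block_size) * block_size           # round start up
    end = -(-(start + length) // block_size) * block_size  # round end up
    return first, end
```

For example, with a hypothetical 4-byte block, list(iter_fixed_blocks(io.BytesIO(b"abcdefgh"), 4)) gives two blocks, and aligned_range(10, 10, 4) tells a parallel reader to process bytes 12 up to 20. Rounding both ends up is one way to let independent readers work on one file without a separator to scan for.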

No, I will not be doing it all by myself. As I said, I have a team with me,
but being new to Hadoop I would welcome help from any source.

Many thanks.

Regards,
    Mohammad Tariq



On Thu, Nov 29, 2012 at 5:40 AM, Gaurav Sharma
<gaurav.gs.sharma@gmail.com> wrote:

> So, before getting any suggestions, you will have to explain a few core things:
>
> 1. do you know if there exist patterns in this data?
> 2. will the data be read and how?
> 3. does there exist a hot subset of the data - both read/write?
> 4. what makes you think hdfs is a good option?
> 5. how much do you intend to pay per TB?
> 6. say you do build the system, how do you plan to keep lights on?
> 7. forgot to ask - is the data textual or binary?
>
> Those are just the basic questions. Are you going to be building and
> running the system all by yourself?
>
>
> On Nov 28, 2012, at 14:09, Mohammad Tariq <dontariq@gmail.com> wrote:
>
> > Hello list,
> >
> >      Although a lot of similar discussions have been done here, I still
> seek some of your able guidance. Till now I have worked only on small or
> mid-sized clusters. But this time the situation is a bit different. I have to
> collect a lot of legacy data, stored over the last few decades. This data is
> on tape drives and I have to collect it from there and store it in my cluster.
> The size could go somewhere near 24 Petabytes (inclusive of replication).
> >
> > Now, I need some help to kick this off, like what could be the optimal
> config for my NN+JT, DN+TT+RS,  HMaster, ZK machines?
> >
> > What should be the number of slaves and ZK peer nodes, keeping this config
> in mind?
> >
> > What is the optimal network config for a cluster of this size?
> >
> > Which kind of disks would be more efficient?
> >
> > Please do provide me some guidance as I want to have some expert
> comments before moving ahead. Many thanks.
> >
> > Regards,
> >     Mohammad Tariq
> >
>
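
For the sizing questions in the original post, the arithmetic can be sketched roughly as follows. Only the 24 PB figure (inclusive of replication) comes from the thread. The replication factor of 3 (the HDFS default), the 12 x 3 TB disks per node, the 25% of each node's disk held back for non-HDFS use, and the decimal 1 PB = 1000 TB conversion are all assumptions picked purely for illustration, not recommendations.

```python
import math

TB_PER_PB = 1000  # decimal units, an assumption


def logical_data_pb(total_raw_pb: float, replication: int) -> float:
    """Unreplicated data volume implied by a raw, replicated total."""
    return total_raw_pb / replication


def datanodes_needed(total_raw_pb: float, disks_per_node: int,
                     tb_per_disk: float, reserved_fraction: float) -> int:
    """Nodes required to hold the raw total, rounding up to whole nodes."""
    usable_tb_per_node = disks_per_node * tb_per_disk * (1 - reserved_fraction)
    return math.ceil(total_raw_pb * TB_PER_PB / usable_tb_per_node)


if __name__ == "__main__":
    # 24 PB is from the thread; every other number is an assumed example.
    print(logical_data_pb(24, 3))
    print(datanodes_needed(24, disks_per_node=12, tb_per_disk=3,
                           reserved_fraction=0.25))
```

Under these assumed figures the 24 PB corresponds to about 8 PB of unreplicated data and roughly 889 DataNodes. Changing the disk size or the reserve shifts the node count substantially, which is why the per-TB budget in question 5 matters.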
