hadoop-user mailing list archives

From Gaurav Sharma <gaurav.gs.sha...@gmail.com>
Subject Re: Guidelines for production cluster
Date Thu, 29 Nov 2012 23:07:48 GMT
The 7th question should have come first, since it obviates the need for
some of the other six. If the data is binary, MR is of little use anyway.
I didn't understand, and find it hard to believe, what you say here:
"No, entire data is equally important and will be read together."

Other than that, an 8th question:
8. how much read latency can the system tolerate?

and a 9th:
9. what is the usable size of a unit of data being read? it being binary,
does the entire stream have to be read to make sense of it for the
application, or are parts of the binary usable on their own?
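
On question 9: the reply quoted below says the file is divided into
fixed-length blocks with no separator between them. If that holds, a single
block can be located by offset alone, without reading the whole stream. A
minimal sketch of that idea, assuming a record size known out of band (the
thread never states one) and a buffered, seekable stream; the function names
are illustrative only:

```python
import io
from typing import BinaryIO, Iterator


def iter_records(stream: BinaryIO, record_size: int) -> Iterator[bytes]:
    """Yield consecutive fixed-length records from a buffered binary stream."""
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    while True:
        chunk = stream.read(record_size)
        if not chunk:
            return  # clean end of stream
        if len(chunk) < record_size:
            # No separators exist, so a short tail can only mean truncation.
            raise ValueError(
                f"truncated record: got {len(chunk)} of {record_size} bytes"
            )
        yield chunk


def read_record(stream: BinaryIO, record_size: int, index: int) -> bytes:
    """Random access: fetch record number `index` (0-based) by seeking."""
    if index < 0:
        raise IndexError("record index must be non-negative")
    stream.seek(index * record_size)
    chunk = stream.read(record_size)
    if len(chunk) != record_size:
        raise IndexError(f"record {index} is past the end of the stream")
    return chunk


if __name__ == "__main__":
    # Hypothetical data: three 4-byte records back to back, no separators.
    data = io.BytesIO(bytes(range(12)))
    for record in iter_records(data, record_size=4):
        print(record.hex())
    print(read_record(data, record_size=4, index=2).hex())
```

Whether one such record is meaningful to the application by itself is exactly
what the question asks; the sketch only shows that the layout permits it.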


If you can get away with some read-latency, take a look at one of the
commercial erasure coding solutions out there (like Cleversafe) or just
code one yourself. Also, see: https://issues.apache.org/jira/browse/HDFS-503
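
To make the trade-off concrete, here is a minimal sketch of the simplest
erasure code, a single XOR parity block per stripe. It is only an
illustration: it is not the scheme used by Cleversafe or the one discussed in
HDFS-503, and a production system would more likely use a code that survives
several losses (Reed-Solomon, for example). All names and the (k, m)
parameters are assumptions made for the example.

```python
from functools import reduce
from typing import List, Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode(data_blocks: Sequence[bytes]) -> List[bytes]:
    """Return the stripe: the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe: Sequence[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing block (marked None) from the survivors."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks equals the lost one.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data with k data and m parity blocks."""
    return (k + m) / k


if __name__ == "__main__":
    stripe = encode([b"abcd", b"efgh", b"ijkl"])
    damaged = list(stripe)
    damaged[1] = None  # simulate a lost block
    assert recover(damaged) == stripe
    print(storage_overhead(3, 1))   # this sketch: 3 data + 1 parity
    print(storage_overhead(10, 4))  # a hypothetical (10, 4) layout
```

Rebuilding a lost block means reading every surviving block in the stripe,
which is where the extra read latency comes from. In exchange, the overhead
is well below that of plain replication, where n copies cost n bytes of raw
storage per byte of data.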

hth


On Thu, Nov 29, 2012 at 2:19 AM, Mohammad Tariq <dontariq@gmail.com> wrote:

> Hello Gaurav,
>
>     Thank you so much for your reply. Please find my comments embedded
> below :
>
> 1. do you know if there exist patterns in this data?
> >> Yes, entire file is divided into data blocks of fixed length (But there
> is no separator between 2 blocks).
>
> 2. will the data be read and how?
> >> Yes, data has to be read. To be honest, we are still not sure how to do
> that.
>
> 3. does there exist a hot subset of the data - both read/write?
> >> No, entire data is equally important and will be read together.
>
> 4. what makes you think hdfs is a good option?
> >> Distributed architecture, Flexibility to read any kind of data,
> Parallelism, Native MR integration, Cost, Fault tolerance, High
> throughput etc.
>
> 5. how much do you intend to pay per TB?
> >> I have to discuss it with my superiors (Will let you know soon).
>
> 6. say you do build the system, how do you plan to keep lights on?
> >> I am sorry, I did not get this. I mean I'll do whatever it takes to keep
> everything moving. I have some experience with small clusters. And I have
> got a small team with me which is ready 24*7.
>
> 7. forgot to ask - is the data textual or binary?
> >> Data is binary.
>
> No, I would require some help. I have a team with me as I have said. But
> being new to Hadoop I would need some help from whatever source it is.
>
> Many thanks.
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Thu, Nov 29, 2012 at 5:40 AM, Gaurav Sharma <gaurav.gs.sharma@gmail.com
> > wrote:
>
>> So, before getting any suggestions, you will have to explain a few core
>> things:
>>
>> 1. do you know if there exist patterns in this data?
>> 2. will the data be read and how?
>> 3. does there exist a hot subset of the data - both read/write?
>> 4. what makes you think hdfs is a good option?
>> 5. how much do you intend to pay per TB?
>> 6. say you do build the system, how do you plan to keep lights on?
>> 7. forgot to ask - is the data textual or binary?
>>
>> Those are just the basic questions. Are you going to be building and
>> running the system all by yourself?
>>
>>
>> On Nov 28, 2012, at 14:09, Mohammad Tariq <dontariq@gmail.com> wrote:
>>
>> > Hello list,
>> >
>> >      Although a lot of similar discussions have been done here, I still
>> seek some of your able guidance. Till now I have worked only on small or
>> mid-sized clusters. But this time situation is a bit different. I have to
>> collect a lot of legacy data, stored over the last few decades. This data is
>> on tape drives and I have to collect it from there and store in my cluster.
>> The size could go somewhere near 24 Petabytes (inclusive of replication).
>> >
>> > Now, I need some help to kick this off, like what could be the optimal
>> config for my NN+JT, DN+TT+RS,  HMaster, ZK machines?
>> >
>> > What should be the no. of slaves and ZK peers nodes keeping this config
>> in mind?
>> >
>> > What is the optimal network config for a cluster of this size?
>> >
>> > Which kind of disks would be more efficient?
>> >
>> > Please do provide me some guidance as I want to have some expert
>> comments before moving ahead. Many thanks.
>> >
>> > Regards,
>> >     Mohammad Tariq
>> >
>>
>
>
