hive-user mailing list archives

From Bejoy Ks <>
Subject Re: Difference between Apache Hadoop and Cloudera Hadoop
Date Thu, 03 Nov 2011 16:30:54 GMT
Hi Ravindra
Adding the cdh-user group in cc. I have also copied you directly, as you may not be registered with the CDH groups.

I'm sharing my experience with CDH (this is purely a personal recommendation).
CDH source code comes basically from the Apache SVN itself, but it is not a mirror of the Apache releases.
A CDH release corresponds to a certain (usually the latest) Apache release with a good number
of patches on top. Most of these patches are available in Hadoop SVN but may not be
part of the current Apache Hadoop release.

The major advantages I saw with CDH are:
- Cloudera provides a tool, SCM, that more or less automatically sets up a Hadoop cluster for
- Cloudera bundles the Hadoop-related projects, which are pretty easy to install on any standard
Linux box.
- Cloudera ensures that the CDH release and the Hadoop projects available for that release
are compatible (for example, you don't have to go through the hassle of finding the HBase release
compatible with your Hadoop release, or of integrating the related projects yourself).
- A good number of large enterprises use CDH with Cloudera support (Cloudera provides
various support packages).
- Since large enterprises depend on CDH, that in turn speaks to how well CDH is tested,
given how large the impact of a bug would be. (In short, CDH is well tested.)
- With Cloudera support, you get help and suggestions from Cloudera's expert Hadoop engineers
in fine-tuning your Hadoop platform, tools, applications, etc.
- When you build an end-to-end enterprise solution with Hadoop, you can even get advice from them
on best practices at the code level. (You get the same from the Hadoop user
groups as well, but here there is a dedicated, timeline-based commitment when you are a
Cloudera customer.)
- If you don't have strong Hadoop resources in house, you may have a tough time handling
failures on your cluster, fine-tuning your cluster, updating your cluster, optimizing your
applications, etc. Cloudera engineers can shed light on almost all critical issues and help
get them resolved under stringent SLAs.

None of these points says that the Apache releases are not great. They are definitely the best and the backbone
of Hadoop, and they are well tested too. But when expert Hadoop resources are not available
in house, you can face a lot of unexpected hurdles that you may need to handle in a time-bound
manner, and that is where you need Hadoop consultants.

You'd definitely get more valid points directly from the Cloudera engineers. (Some official

Hope it helps!

Thank you.



From: "Agarwal, Ravindra (ASG)" <>
Sent: Thursday, November 3, 2011 8:57 PM
Subject: Difference between Apache Hadoop and Cloudera Hadoop

(a) I would like to know what the key differences are between Hadoop distributed by Apache
and Hadoop distributed by Cloudera (i.e. CDH).
(b) When should one go for Apache's distribution of Hadoop, and when for Cloudera's (CDH)?
Confidential: This electronic message and all contents contain information from Syntel, Inc.
which may be privileged, confidential or otherwise protected from disclosure. The information
is intended to be for the addressee only. If you are not the addressee, any disclosure, copy,
distribution or use of the contents of this message is prohibited. If you have received this
electronic message in error, please notify the sender immediately and destroy the original
message and all copies.