Date: Mon, 16 Sep 2013 15:10:56 -0700
Subject: Re: Cloudera Vs Hortonworks Vs MapR
From: Xuri Nagarin <secsubs@gmail.com>
To: user@hadoop.apache.org

So I will try to answer the OP's question as best I can, sticking to facts without deviating too much into opinions. Disclaimer: I am not an employee of any of these vendors or of any of their partners.

Context is important: my team's use case was general data exploration of semi-structured log data, and we had no existing data-warehouse-type use cases. Also, ours is a small cluster (fewer than 30 nodes). In terms of ops/maintenance, we have only one person. I point this out because lots of Hadoop shops have a dedicated team for each area: OS administration, Hadoop administration, and Hadoop development. And they are very mature in terms of their compute use cases. To my mind, these aspects can significantly affect your vendor choice.

MapR: My team simply did not consider them because of all the proprietary code in there. We are trying to move away from a monolithic proprietary product, and one of the criteria we set was: if we decided to move away from the chosen Hadoop vendor, could we easily unlock our data?

Hortonworks: The distro uses HDFS 1.x with MRv2. All open source. Cluster management is via Ambari. Compared to Cloudera's CM, Ambari has very rudimentary features. But keep in mind that Ambari is only a year old, whereas CM has already been under development for several years.
This was a major selection factor for us, because Ambari lacked the automation and feature set that CM offers for a single administrator/developer to maintain the cluster easily. Also, during the trial period, Hortonworks' packaging format/structure apparently kept changing, which made it a bit difficult to deploy and administer centrally.

Cloudera: The distro uses HDFS 2.x with MRv1. All open source except cluster management, which is via their proprietary Cloudera Manager tool. It is free to use, but without certain features such as auditing and cluster replication; a few more features may be restricted to the Enterprise/licensed version. It offers many more features than Ambari. In terms of cluster administration, I found CM much easier to work with than Ambari. Pretty much every aspect, from deploying new nodes to configuration and troubleshooting, is much more refined than in Ambari.

During the selection process, what I found was that both vendors are very aggressive in their pitch, so much so that each pushes some FUD about the competition.

HW uses HDFS 1.x + MRv2, while CDH uses HDFS 2.x + MRv1. HW claimed that Cloudera's distro is so heavily patched, and so far off course from the core Apache trunk, that it can cause severe data corruption issues. Yes, Cloudera has some 1500+ patches over Apache's Hadoop distro, but (1) they aren't private patches; you can pull the list and verify that yourself, just as I did. (2) In our testing, and in talking to other Cloudera customers, I couldn't find any data corruption issues. It is true that HDFS 2.x is still in beta, but so is the MRv2 that HW uses. I think both are stable and work well, depending on what you need, but each vendor uses that point to create FUD.

HW also claimed that Impala, a new SQL engine that Cloudera is including in their distro, is proprietary. Not true. The software is open source.
But if you want support for Impala, Cloudera will charge you separately, per node, for Impala, on top of what they charge per node for Hadoop support.

In my experience, both products have plenty of issues when it comes to the compute engines (Hive, Pig, etc.) and their cluster management software. HDFS seems to be solid in both distros. So I wouldn't call either of them trouble-free, and neither is at the maturity level of other popular enterprise products such as Oracle. That said, keep in mind that both vendors/products are used successfully by several customers, so again, it is more a question of what fits your needs.

In the end, we chose to go with Cloudera, mostly because of a more positive experience, compared to HW, with CM in terms of administration/operations and with their pre-sales team. Again, that said, another team that we work closely with chose HW for their cluster. I use both vendors' clusters at work, and neither has any significant issues.

On Sat, Sep 14, 2013 at 12:37 PM, Chris Mattmann <mattmann@apache.org> wrote:
> Here's the deal: folks can post questions to the list that aren't
> abusive, and simply asking what the difference is between different vendor
> implementations (downstream) of Apache Hadoop is not an inflammatory
> or abusive question.
>
> Stick to the facts. Discuss it here. Why should the Apache Hadoop
> PMC push off potentially useful questions that may have upstream
> implications for the Apache Hadoop core, and let all the innovation
> occur downstream?
>
> Have the conversations here if you'd like. I wouldn't turn anyone
> away.
>
> My 2c.
>
> Cheers,
> Chris
>
> -----Original Message-----
> From: Shahab Yunus <shahab.yunus@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Friday, September 13, 2013 10:48 AM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Re: Cloudera Vs Hortonworks Vs MapR
>
> >I think this is a bad idea, because:
> >
> >1- Many of the participants here are employees of these very companies
> >that are under discussion.
> >This puts these respective employees in a very
> >difficult position. It is very hard to come up with a correct response.
> >Comments can be misconstrued easily.
> >2- Also, when we talk about vendor distributions of the software, it is
> >no longer purely about open source. Now companies, with their related
> >corporate legal baggage, also get into the mix.
> >3- The discussion would cover not only positive things about each vendor
> >but also negatives, and the latter type of discussion can get
> >unpleasant very easily.
> >
> >4- Somebody mentioned that this is a very lightly moderated platform and
> >that this discussion should therefore be allowed. I think this is one of the
> >reasons it should not be, because people can say things casually,
> >without much thought, or without taking
> >care of the context or the possible interpretations, and get into trouble.
> >5- The risk here is not only that serious repercussions can occur (which
> >very well can); the greater risk is that it can cause misunderstanding
> >between individuals, industries and companies.
> >6- People here often reply quickly just to resolve or help with the
> >'technical' issue. Now they will have to take care how they frame the
> >response. Re: 4
> >
> >I know some will feel that I have created a highly exaggerated scenario
> >above, but what I am trying to say is that it is a slippery slope. If we
> >allow this, then this can go anywhere.
> >
> >By the way, I do not work for any of these vendors.
> >
> >More importantly, I am not saying that this discussion should not be had;
> >I am just saying that this is the wrong forum.
> >
> >Just my 2 cents (or... this was rather a dollar).
> >
> >Regards,
> >Shahab
> >
> >On Fri, Sep 13, 2013 at 1:50 AM, Chris Mattmann
> ><mattmann@apache.org> wrote:
> >
> >Errr, what's wrong with discussing these types of issues on list?
> >
> >Nothing public here, and as long as it's kept to facts, this should
> >not be a problem, and Apache is a fine place to have such discussions.
> >
> >My 2c.
> >
> >-----Original Message-----
> >From: Xuri Nagarin <secsubs@gmail.com>
> >Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> >Date: Thursday, September 12, 2013 4:39 PM
> >To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> >Subject: Re: Cloudera Vs Hortonworks Vs MapR
> >
> >>I understand it can be a contentious issue, especially given that a lot of
> >>contributors to this list work for one or the other vendor or have some
> >>stake in any kind of evaluation. But I see no reason why users should
> >>not be able to compare notes
> >>and share experiences. Over time, genuine pain points, issues, or
> >>claims will bubble up, and that should only help the community. Sure, there
> >>will be a few flame wars, but this already isn't a very tightly moderated
> >>list.
> >>
> >>On Thu, Sep 12, 2013 at 11:14 AM, Aaron Eng
> >><aeng@maprtech.com> wrote:
> >>
> >>Raj,
> >>
> >>As others noted, this is not a great place for this discussion. I'd
> >>suggest contacting the vendors you are interested in, as I'm sure we'd all
> >>be happy to provide you with more details.
> >>
> >>I don't know about the others, but for MapR, just send an email to
> >>sales@mapr.com and I'm sure someone will get back
> >>to you with more information.
> >>
> >>Best Regards,
> >>Aaron Eng
> >>
> >>On Thu, Sep 12, 2013 at 10:19 AM, Hadoop Raj <hadoopraj@yahoo.com> wrote:
> >>
> >>Hi,
> >>
> >>We are trying to evaluate different implementations of Hadoop for our big
> >>data enterprise project.
> >>
> >>Can the forum members advise on the advantages and disadvantages
> >>of each implementation, i.e. Cloudera vs Hortonworks vs MapR?
> >>
> >>Thanks in advance.
> >>
> >>Regards,
> >>Raj