hadoop-common-user mailing list archives

From Chris Embree <cemb...@gmail.com>
Subject Re: Cloudera Vs Hortonworks Vs MapR
Date Tue, 17 Sep 2013 01:17:53 GMT
Our evaluation was similar, except that we did not consider the "management"
tools any vendor provided, as those are just as much lock-in as any proprietary
tool.  What if I want to trade vendors?  Do I have to re-tool to use their mgmt?
Nope, we wrote our own.

Being in a large enterprise, we went with the "perceived" more stable
platform.  Draw your own conclusions.


On Mon, Sep 16, 2013 at 6:10 PM, Xuri Nagarin <secsubs@gmail.com> wrote:

> So I will try to answer the OP's question as best I can, without deviating
> too much into opinions and sticking to facts. Disclaimer: I am not an
> employee of either vendor or of any of their partners.
>
> Context is important: my team's use case was general data exploration of
> semi-structured log data, and we had no existing data-warehouse-style use
> cases. Also, ours is a small cluster (fewer than 30 nodes). In terms of
> ops/maintenance, we only have one person. I point this out because lots of
> Hadoop shops have a dedicated team for each of OS administration, Hadoop
> administration, and Hadoop development, and they are very mature in terms
> of their compute use cases. To my mind, these aspects can significantly
> impact your vendor choice.
>
> MapR: My team simply did not consider them because of all the proprietary
> code in there. We are trying to move away from a monolithic proprietary
> product, and one of the criteria we set was: if we decided to move away
> from the chosen Hadoop vendor, could we easily unlock our data?
> Hortonworks: Distro uses HDFS 1.x with MRv2. All open source. Cluster
> management is via Ambari. Compared to Cloudera's CM, Ambari has very
> rudimentary features. But keep in mind that Ambari is only a year old,
> whereas CM has been under development for several years. This was a major
> selection factor for us, because Ambari lacked the automation and feature
> set that CM offers to let a single administrator/developer easily maintain
> the cluster. Also, during the trial period, Hortonworks' packaging
> format/structure apparently kept changing, which made things a bit
> difficult to deploy and administer centrally.
>
> Cloudera: Distro uses HDFS 2.x with MRv1. All open source except cluster
> management, which is via their proprietary Cloudera Manager tool. CM is
> free to use, minus certain features such as auditing and cluster
> replication; a few more features may be restricted to the
> Enterprise/licensed version. It offers many more features than Ambari. In
> terms of cluster administration, I found CM much easier to work with than
> Ambari. Pretty much every aspect, from deploying new nodes to configuration
> and troubleshooting, is much more refined than in Ambari.
>
> During the selection process, what I found was that both vendors are very
> aggressive in their pitch, so much so that each pushes some FUD about the
> competition.
>
> HW uses HDFS 1.x + MRv2, while CDH uses HDFS 2.x + MRv1. HW claimed that
> Cloudera's distro is patched so heavily, and so far off course from the
> core Apache trunk, that it can cause severe data corruption issues. Yes,
> Cloudera has some 1500+ patches over Apache's Hadoop distro, but (1) they
> aren't private patches; you can pull the list and verify that yourself,
> just as I did. (2) In our testing and in talking to other Cloudera
> customers, I couldn't find any issues with data corruption. It is true that
> HDFS 2.x is still in beta, but so is the MRv2 that HW uses. I think both are
> stable and work well, depending on what you need, but each vendor uses that
> point to create FUD.
>
> HW also claimed that Impala, a new SQL engine Cloudera is including in its
> distro, is proprietary. Not true: the software is open source. But if you
> want support for Impala, Cloudera will charge you separately per node for
> Impala, over and above what they charge per node for Hadoop support.
>
> In my experience, both products have plenty of issues when it comes to
> compute engines (Hive, Pig, etc.) and their cluster management software.
> HDFS seems to be solid in both distros. So I wouldn't call either of them
> trouble-free, and neither is at the maturity level of other popular
> enterprise products like, say, Oracle. That said, keep in mind that both
> vendors/products are used successfully by several customers, so again, it
> is more a question of what fits your needs.
>
> In the end, we chose to go with Cloudera, mostly because of a more positive
> experience with CM in terms of administration/operations, and with their
> pre-sales team, compared to HW. Again, that said, another team that we
> work closely with chose HW for their cluster. I use both vendors' clusters
> at work, and neither has any significant issues.
>
> On Sat, Sep 14, 2013 at 12:37 PM, Chris Mattmann <mattmann@apache.org> wrote:
>
>> Here's the deal: folks can post questions to the list that aren't
>> abusive, and simply asking what the difference is between the various
>> vendor implementations (downstream) of Apache Hadoop is not an
>> inflammatory or abusive question.
>>
>> Stick to the facts. Discuss it here. Why should the Apache Hadoop
>> PMC push away potentially useful questions that may have upstream
>> implications for the Apache Hadoop core, and let all the innovation
>> occur downstream?
>>
>> Have the conversations here if you'd like. I wouldn't turn anyone
>> away.
>>
>> My 2c.
>>
>> Cheers,
>> Chris
>>
>> -----Original Message-----
>>
>> From: Shahab Yunus <shahab.yunus@gmail.com>
>> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>> Date: Friday, September 13, 2013 10:48 AM
>> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>> Subject: Re: Cloudera Vs Hortonworks Vs MapR
>>
>> >In my opinion, it is a bad idea, because:
>> >
>> >
>> >1- Many of the participants here are employees of the very companies
>> >under discussion. This puts these employees in a very difficult
>> >position. It is very hard to come up with a correct response, and
>> >comments can easily be misconstrued.
>> >2- Also, when we talk about vendor distributions of the software, it is
>> >no longer purely about open source. Now companies, with their related
>> >corporate legal baggage, also get into the mix.
>> >3- The discussion would cover not only positive things about each vendor
>> >but negative ones as well, and that type of discussion can get
>> >unpleasant very easily.
>> >
>> >4- Somebody mentioned that this is a very lightly moderated platform and
>> >that this discussion should therefore be allowed. I think that is one of
>> >the reasons it should not be, because people can say things casually,
>> >without much thought or without taking care of the context or the
>> >possible interpretations, and get in trouble.
>> >5- The risk here is not only that serious repercussions can occur (which
>> >they very well can), but the greater risk is that it can cause
>> >misunderstanding between individuals, industries, and companies.
>> >6- People here often reply quickly just to resolve or help with the
>> >'technical' issue. Now they will have to be careful about how they frame
>> >the response. Re: 4
>> >
>> >
>> >I know some will feel that I have created a highly exaggerated scenario
>> >above, but what I am trying to say is that, it is a slippery slope. If we
>> >allow this then this can go anywhere.
>> >
>> >
>> >By the way, I do not work for any of these vendors.
>> >
>> >
>> >More importantly, I am not saying that this discussion should not be had;
>> >I am just saying that this is the wrong forum.
>> >
>> >
>> >Just my 2 cents (or rather, this was a dollar).
>> >
>> >
>> >Regards,
>> >Shahab
>> >
>> >
>> >
>> >
>> >On Fri, Sep 13, 2013 at 1:50 AM, Chris Mattmann
>> ><mattmann@apache.org> wrote:
>> >
>> >Errr, what's wrong with discussing these types of issues on list?
>> >
>> >Nothing public here, and as long as it's kept to facts, this should
>> >not be a problem and Apache is a fine place to have such discussions.
>> >
>> >My 2c.
>> >
>> >-----Original Message-----
>> >From: Xuri Nagarin <secsubs@gmail.com>
>> >Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>> >Date: Thursday, September 12, 2013 4:39 PM
>> >To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>> >Subject: Re: Cloudera Vs Hortonworks Vs MapR
>> >
>> >>I understand it can be a contentious issue, especially given that a lot
>> >>of contributors to this list work for one vendor or another or have some
>> >>stake in any kind of evaluation. But I see no reason why users should
>> >>not be able to compare notes and share experiences. Over time, genuine
>> >>pain points, issues, or claims will bubble up, and that should only help
>> >>the community. Sure, there will be a few flame wars, but this already
>> >>isn't a very tightly moderated list.
>> >>
>> >>On Thu, Sep 12, 2013 at 11:14 AM, Aaron Eng
>> >><aeng@maprtech.com> wrote:
>> >>
>> >>Raj,
>> >>
>> >>
>> >>As others noted, this is not a great place for this discussion.  I'd
>> >>suggest contacting the vendors you are interested in, as I'm sure we'd
>> >>all be happy to provide you with more details.
>> >>
>> >>
>> >>I don't know about the others, but for MapR, just send an email to
>> >>sales@mapr.com and I'm sure someone will get back to you with more
>> >>information.
>> >>
>> >>
>> >>Best Regards,
>> >>Aaron Eng
>> >>
>> >>
>> >>
>> >>On Thu, Sep 12, 2013 at 10:19 AM, Hadoop Raj <hadoopraj@yahoo.com> wrote:
>> >>
>> >>
>> >>Hi,
>> >>
>> >>We are trying to evaluate different implementations of Hadoop for our
>> >>big data enterprise project.
>> >>
>> >>Can forum members advise on the advantages and disadvantages of each
>> >>implementation, i.e. Cloudera vs. Hortonworks vs. MapR?
>> >>
>> >>Thanks in advance.
>> >>
>> >>Regards,
>> >>Raj
>> >>
>
