hadoop-mapreduce-user mailing list archives

From "Martin, Nick" <NiMar...@pssd.com>
Subject RE: Hortonworks HDP 2 sandbox or Cloudera CDH Distribution
Date Thu, 13 Mar 2014 14:17:16 GMT
Hi Andy,

Generally speaking, the folks participating on this list avoid questions of distribution preference.
There are, perhaps obviously, both minor and significant differences in distributions that
you should research and evaluate to find the best fit for your organization's strategy. Asking
the members of this list to publicly advocate one distribution over another is outside the
scope of our collective purpose here, in my opinion. Upon thorough review of the topic history
of this list you'll doubtless find the questions and responses are almost always distribution
agnostic, which is how things should be with a community like this.

No matter which distribution you choose, said distribution will assuredly have ample documentation
regarding cluster configuration readily available via a quick search from your web browser.
Further, the two distributions you mention below also have several methods by which you can
ask their experts specific questions related to configuring their solutions in your environment
(forums, separate lists, Google groups, etc.).

From: ados1984@gmail.com [mailto:ados1984@gmail.com]
Sent: Thursday, March 13, 2014 9:58 AM
To: user
Subject: Hortonworks HDP 2 sandbox or Cloudera CDH Distribution

Hello Team,

I am initiating a POC to see the value of having Hadoop in our architecture. After discussing
my current scenario with experts here, I think it would be better for me to start with a sandbox
version rather than an actual distribution, from a POC point of view.

My query here is how to decide which sandbox version to use, Hortonworks or Cloudera. My goal
is to get started as soon as possible and not spend most of my time on the configuration part of the

Also, from the online research that I have done, it appears that Cloudera Impala is more efficient
and provides near-real-time ad hoc query capabilities. Based on that, I am thinking of going
with the Cloudera sandbox distribution, and I wanted to hear the experts' opinions before moving
in that direction.

Also, if I am going with the sandbox approach, what kind of cluster configuration can I have,
meaning how many slave and master nodes will the sandbox support?

Pardon my questions if they sound too basic.

Thanks again, Andy.
