bigtop-user mailing list archives

From Steven Núñez <steven.nu...@illation.com>
Subject Re: CentOS Out of Box Install Summary
Date Fri, 22 Nov 2013 03:38:32 GMT
I think focusing on a single-node installation is probably the best bet at the moment. Jay
gave some sound advice for practical usage: start small and build from there, but given that
Hadoop and its ecosystem are still in the formative stages, there are going to be a lot of
people who want to kick the tires and explore the components.

Having a few well-tested recipes, the first being a single-node set-up, would be ideal. It's
probably easier to start with a well-configured single-node installation and expand from there
than to try to sort out both component configuration and the distributed aspects at the same time.

The Cloudera website has some installation instructions for Installing CDH4 on a Single Linux
Node in Pseudo-distributed Mode<http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3.html>
that might be useful as a guide. At the end there’s a section Components That Require Additional
Configuration<http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_4.html>
that, while not terribly helpful for the set-up itself, at least provides some pointers so that
the reader knows it's not supposed to work out of the box.

Sean, in an earlier message you wrote:
Not sure what to tell you here. I regularly set up pseudo-distributed Hadoop installations
in minutes with little more than "yum install hadoop-conf-pseudo", "sudo service hadoop-hdfs-namenode
init" and a reboot. If you're using a bunch of other services on a fully-distributed cluster
and you're completely new to this, I would expect it to take hours or days to get everything running.
Bigtop also maintains puppet code that will configure everything with a pretty good default
configuration and have your cluster working pretty much out-of-the-box. Maybe this is a good
option for you?
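
If I've understood that correctly, the full sequence on CentOS would look something like this
(a sketch only; it assumes the Bigtop yum repository is already configured, and the daemon
names are the ones shipped by the Hadoop packages of this era):

# Pseudo-distributed Hadoop in a few commands (sketch).
sudo yum install -y hadoop-conf-pseudo
sudo service hadoop-hdfs-namenode init    # formats HDFS on first use
sudo reboot
# Instead of rebooting, the daemons can also be started by hand, e.g.:
#   for svc in hadoop-hdfs-namenode hadoop-hdfs-datanode \
#              hadoop-yarn-resourcemanager hadoop-yarn-nodemanager; do
#       sudo service "$svc" start
#   done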

Two questions:

  *   Those commands are the same as in the Cloudera documentation. Are those components also
in the BigTop repository? I'm aware of some of the yum search commands (I'm a FreeBSD user
myself — where's my Hadoop distribution? Just kidding.); is there a good way to explore or
browse the repository to see what's in BigTop? (A sketch follows this list.)
  *   Where would I find the puppet code, and how would I run it? If this is a good route,
perhaps documentation is all that's needed. (A sketch of this also follows the list.)
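
For reference, here is what I have pieced together so far for both questions. The repo id
"bigtop" and the puppet paths are guesses based on the 0.7-era source layout, so check
/etc/yum.repos.d/ and bigtop-deploy/puppet/README for the real names; corrections welcome.

# List everything the BigTop yum repository provides (repo id assumed).
yum --disablerepo='*' --enablerepo='bigtop' list available

# Run the BigTop puppet recipes from a source checkout.
git clone https://github.com/apache/bigtop.git && cd bigtop
# Describe the desired deployment in a key,value CSV; the expected keys
# are documented in bigtop-deploy/puppet/README.
vi bigtop-deploy/puppet/config/site.csv
sudo puppet apply -d \
    --modulepath=bigtop-deploy/puppet/modules \
    bigtop-deploy/puppet/manifests/site.pp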

From: Sean Mackrory <mackrorysd@gmail.com>
Reply-To: "user@bigtop.apache.org" <user@bigtop.apache.org>
Date: Thursday, 21 November 2013 23:48
To: "user@bigtop.apache.org" <user@bigtop.apache.org>
Subject: Re: CentOS Out of Box Install Summary

One key point is that the components that are running out of the box are mostly running in
a single-node configuration or with an embedded database as a backend. Practically all of
these systems will require some manual configuration before they are production-ready. Neither
packages nor puppet can solve that entirely - we would really need something that can orchestrate
the different roles in the cluster when bringing up the services. Even then, I suspect such
a system would require some manual input regarding what you want, because there are so many
different ways you might want to deploy all this.

- Hadoop zkfc: This is for high-availability in HDFS. I don't know the specifics but I would
not expect this to be running out-of-the-box.
- I don't have a ton of experience with the other Hadoop daemons but I know the NodeManager
usually works for me. I'd be curious to know what problem you ran into here.
- We could probably make a "hbase-conf-pseudo" package that installs a working single-node
configuration, but again, it would rarely be used that way in practice. I thought that by
default the master operated in "stand-alone" mode, and that by enabling "distributed mode" in
the configuration you could then run a region server on the same node. See
http://hbase.apache.org/book/standalone_dist.html (a configuration sketch follows this list).
- The Hive Metastore needs an external RDBMS to be configured. Some services come with a default
"embedded" database, but these are never suitable for production and usually cause more trouble
than they are worth, IMHO. I love the sound of "everything working out of the box", but I
think this is one case where we need to help the user understand what external infrastructure
is required to make the system work properly (a metastore sketch also follows this list).
- Not familiar with Spark, but I believe we stopped shipping Scala embedded in Spark and a
user would need to have it installed beforehand, just like with Java? I'm probably wrong here
- just a hint.
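
In case it helps, here is the kind of HBase change I mean. This is only a sketch: the property
name comes from the HBase book, but the NameNode port and the config location are assumptions I
have not verified against this install, and in practice you would merge the properties into the
existing file rather than overwrite it.

# Switch HBase from standalone to distributed mode so a region server
# can run next to the master on the same node.
sudo tee /etc/hbase/conf/hbase-site.xml > /dev/null <<'EOF'
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
</configuration>
EOF
sudo service hbase-master restart
sudo service hbase-regionserver start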
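
Likewise for the Hive Metastore, a sketch of pointing it at an external MySQL instead of the
embedded default. The property names are standard Hive; the database name, user, and password
are placeholders, and the MySQL JDBC driver jar still has to be on Hive's classpath.

# Point the Hive Metastore at an external MySQL database (sketch; merge
# these properties into the existing /etc/hive/conf/hive-site.xml).
sudo tee /etc/hive/conf/hive-site.xml > /dev/null <<'EOF'
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
</configuration>
EOF
sudo service hive-metastore restart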

Thanks for sharing your emails with the list. As Jay Vyas mentioned - a lot of the contributors
can get busy at times but it would be great to start collecting this information into a better
"User Manual".


On Wed, Nov 20, 2013 at 6:32 PM, Steven Núñez <steven.nunez@illation.com> wrote:
Gents,

Below is a summary of the results of an out-of-the-box CentOS/EC2 BigTop 0.7.0 install. It
lists all the components I need for the project I'm writing about. What would be useful
somewhere on the wiki is a list of known issues with links to possible resolutions. This
could be as easy as taking this list and adding a third column, 'workaround', pointing to a
page on how to fix each item. It could also be used as a QA page of sorts, on the assumption
that all of the components are supposed to work out of the box (it looks like some of the
init.d scripts aren't quite right either, judging by the errors below).

Cheers,
- SteveN

Hadoop datanode is running                                 [  OK  ]
Hadoop journalnode is running                              [  OK  ]
Hadoop namenode is running                                 [  OK  ]
Hadoop secondarynamenode is running                        [  OK  ]
Hadoop zkfc is dead and pid file exists                    [FAILED]
Hadoop httpfs is running                                   [  OK  ]
Hadoop historyserver is dead and pid file exists           [FAILED]
Hadoop nodemanager is dead and pid file exists             [FAILED]
Hadoop proxyserver is dead and pid file exists             [FAILED]
Hadoop resourcemanager is running                          [  OK  ]
hald (pid  1041) is running...
HBase master daemon is dead and pid file exists            [FAILED]
hbase-regionserver is not running.
HBase rest daemon is running                               [  OK  ]
HBase thrift daemon is running                             [  OK  ]
HCatalog server is running                                 [  OK  ]
Hive Metastore is dead and pid file exists                 [FAILED]
Hive Server is running                                     [  OK  ]
Hive Server2 is dead and pid file exists                   [FAILED]
not running but /var/run/oozie/oozie.pid exists.
Spark master is not running                                [FAILED]
Spark worker is not running                                [FAILED]
spice-vdagentd is stopped
Sqoop Server is running                                    [  OK  ]
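
For anyone who wants to reproduce the sweep above: it is just the status action of every
init script on the box, which is also why unrelated daemons like hald and spice-vdagentd
show up in the listing.

# Run the status action of every SysV init script (CentOS 6).
for svc in /etc/init.d/*; do
    sudo "$svc" status
done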


