hadoop-common-user mailing list archives

From Manu S <manupk...@gmail.com>
Subject Best practice to setup Sqoop,Pig and Hive for a hadoop cluster ?
Date Thu, 15 Mar 2012 12:04:25 GMT
Greetings All !!!

I am using Cloudera CDH3 for our Hadoop deployment. We have 7 nodes, of which 5
are used for a fully distributed cluster, 1 for pseudo-distributed & 1 as

Fully distributed cluster: HDFS, MapReduce & HBase cluster
Pseudo distributed mode: All

I have read that we can install Pig, Hive & Sqoop on a client node, with no
need to install them on the cluster. What exactly is a client node? Can I use
my management-node as a client?

What is the best practice to install Pig, Hive, & Sqoop?
For the fully distributed cluster, do we need to install Pig, Hive & Sqoop
on each node?

MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL database
into HDFS, Hive or Pig, so can we make use of MySQL databases residing on
another node?

Thanks & Regards
Manu S
SI Engineer - OpenSource & HPC
Wipro Infotech
Mob: +91 8861302855                Skype: manuspkd
