hadoop-hdfs-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: Best practice to setup Sqoop,Pig and Hive for a hadoop cluster ?
Date Thu, 15 Mar 2012 12:57:10 GMT
Hi Manu,

Please find my responses inline.

>I had read about we can install Pig, hive & Sqoop on the client node, no
need to install it in cluster. What is the client node actually? Can I use
my management-node as a client?

On larger clusters we have a separate node that sits outside the Hadoop
cluster, and these tools (Pig, Hive and Sqoop) are installed there. User
programs are triggered from this node. This is the node referred to as the
client node, edge node, etc. For your cluster, the management node and the
client node can be the same.

>What is the best practice to install Pig, Hive, & Sqoop?

Install them on a client node.

>For the fully distributed cluster do we need to install Pig, Hive, & Sqoop
>in each nodes?

No, they can be on a client node or on any one of the cluster nodes.

>Mysql is needed for Hive as a metastore and sqoop can import mysql database
to HDFS or hive or pig, so can we make use of mysql DB's residing on
another node?

Regarding your first point, a Sqoop import serves a different purpose: getting
data from an RDBMS into HDFS. The metastore, on the other hand, is used by
Hive when framing the MapReduce jobs that correspond to your Hive query.
Sqoop can't help you much here.

I'd recommend keeping the Hive metastore DB on the same node where Hive is
installed, because executing Hive queries requires frequent metadata lookups,
especially when your tables have a large number of partitions.
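
As a rough sketch of what a local MySQL metastore looks like in
hive-site.xml: the property names below are the standard JDO connection
settings, but the database name (metastore), the user (hiveuser) and the
password are placeholders I've assumed for illustration, so adjust them to
your setup. The MySQL JDBC driver jar also needs to be on Hive's classpath.

```xml
<!-- hive-site.xml fragment: metastore backed by MySQL on the same node as Hive. -->
<!-- Database name, user and password are illustrative placeholders. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>changeme</value>
</property>
```

If the MySQL server has to live on another node, only the host in the
connection URL changes; the lookups then go over the network.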

Regards
Bejoy.K.S

On Thu, Mar 15, 2012 at 5:34 PM, Manu S <manupkd87@gmail.com> wrote:

> Greetings All !!!
>
> I am using Cloudera CDH3 for Hadoop deployment. We have 7 nodes, in which 5
> are used for a fully distributed cluster, 1 for pseudo-distributed & 1 as
> management-node.
>
> Fully distributed cluster: HDFS, Mapreduce & Hbase cluster
> Pseudo distributed mode: All
>
> I had read about we can install Pig, hive & Sqoop on the client node, no
> need to install it in cluster. What is the client node actually? Can I use
> my management-node as a client?
>
> What is the best practice to install Pig, Hive, & Sqoop?
> For the fully distributed cluster do we need to install Pig, Hive, & Sqoop
> in each nodes?
>
> Mysql is needed for Hive as a metastore and sqoop can import mysql database
> to HDFS or hive or pig, so can we make use of mysql DB's residing on
> another node?
>
> --
> Thanks & Regards
> ----
> Manu S
> SI Engineer - OpenSource & HPC
> Wipro Infotech
> Mob: +91 8861302855                Skype: manuspkd
> www.opensourcetalk.co.in
>
