hadoop-hdfs-user mailing list archives

From Manu S <manupk...@gmail.com>
Subject Re: Best practice to setup Sqoop,Pig and Hive for a hadoop cluster ?
Date Thu, 15 Mar 2012 15:59:08 GMT
Thanks a lot all :-)
On Mar 15, 2012 7:03 PM, "Marcos Ortiz" <mlortiz@uci.cu> wrote:
>
>
>
> On 03/15/2012 09:22 AM, Manu S wrote:
>>
>> Thanks a lot Bijoy, that makes sense :)
>>
>> Suppose I have a MySQL database on some other node (not in the Hadoop
>> cluster). Can I import its tables into my HDFS using Sqoop?
>
> Yes, this is the main purpose of Sqoop.
> The Cloudera site has the complete documentation for it:
>
> Sqoop User Guide
> http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html
>
> Sqoop installation
> https://ccp.cloudera.com/display/CDHDOC/Sqoop+Installation
>
> Sqoop for MySQL
> http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_mysql
>
> Sqoop site on GitHub
> http://github.com/cloudera/sqoop
>
> Cloudera blog posts related to Sqoop
> http://www.cloudera.com/blog/category/sqoop/
>
>
> Best wishes
>
>
>>
>>
>> On Thu, Mar 15, 2012 at 6:27 PM, Bejoy Ks <bejoy.hadoop@gmail.com> wrote:
>>>
>>> Hi Manu
>>>      Please find my responses inline
>>>
>>> > I had read that we can install Pig, Hive & Sqoop on the client node,
>>> > with no need to install them in the cluster. What is the client node
>>> > actually? Can I use my management-node as a client?
>>>
>>> On larger clusters there is a separate node that sits outside the Hadoop
>>> cluster, and these tools are installed there. User programs are
>>> triggered from this node. This is the node referred to as the client
>>> node, edge node, etc. For your cluster, the management node and the
>>> client node can be the same.
>>>
>>> >What is the best practice to install Pig, Hive, & Sqoop?
>>>
>>> On a client node
>>>
>>> > For the fully distributed cluster do we need to install Pig, Hive &
>>> > Sqoop on each node?
>>>
>>> No, they can be on a client node or on any one of the nodes.
>>>
>>> > MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL
>>> > database to HDFS, Hive or Pig, so can we make use of MySQL DBs
>>> > residing on another node?
>>>
>>> Regarding your first point, a Sqoop import serves a different purpose:
>>> getting data from an RDBMS into HDFS. The metastore, on the other hand,
>>> is used by Hive in framing the MapReduce jobs corresponding to your Hive
>>> query. Sqoop can't help you much there.
>>> I recommend keeping Hive's metastore DB on the same node where Hive is
>>> installed, because executing Hive queries requires frequent metadata
>>> lookups, especially when your table has a large number of partitions.
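
As a rough illustration of the metastore setup recommended above, the sketch below generates the JDBC connection properties that Hive reads from hive-site.xml when its metastore lives in MySQL. The database name, user and password are hypothetical placeholders. The host is set to localhost only to reflect the advice of keeping the metastore DB on the Hive node.

```python
import xml.etree.ElementTree as ET


def metastore_config(host, database, user, password):
    """Return a hive-site.xml fragment pointing the Hive metastore at MySQL."""
    props = {
        "javax.jdo.option.ConnectionURL":
            f"jdbc:mysql://{host}/{database}?createDatabaseIfNotExist=true",
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Database name and credentials are assumptions made for illustration only.
xml_text = metastore_config("localhost", "metastore", "hiveuser", "changeme")
print(xml_text)
```

Pointing the URL at another host would also work in principle, at the cost of a network round trip for each metadata lookup.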
>>>
>>> Regards
>>> Bejoy.K.S
>>>
>>> On Thu, Mar 15, 2012 at 5:34 PM, Manu S <manupkd87@gmail.com> wrote:
>>>
>>> > Greetings All !!!
>>> >
>>> > I am using Cloudera CDH3 for Hadoop deployment. We have 7 nodes, of
>>> > which 5 are used for a fully distributed cluster, 1 for
>>> > pseudo-distributed mode & 1 as a management-node.
>>> >
>>> > Fully distributed cluster: HDFS, MapReduce & HBase cluster
>>> > Pseudo distributed mode: All
>>> >
>>> > I had read that we can install Pig, Hive & Sqoop on the client node,
>>> > with no need to install them in the cluster. What is the client node
>>> > actually? Can I use my management-node as a client?
>>> >
>>> > What is the best practice to install Pig, Hive, & Sqoop?
>>> > For the fully distributed cluster do we need to install Pig, Hive &
>>> > Sqoop on each node?
>>> >
>>> > MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL
>>> > database to HDFS, Hive or Pig, so can we make use of MySQL DBs
>>> > residing on another node?
>>> >
>>> > --
>>> > Thanks & Regards
>>> > ----
>>> > Manu S
>>> > SI Engineer - OpenSource & HPC
>>> > Wipro Infotech
>>> > Mob: +91 8861302855                Skype: manuspkd
>>> > www.opensourcetalk.co.in
>>> >
>>
>
> --
> Marcos Luis Ortíz Valmaseda
>  Sr. Software Engineer (UCI)
>  http://marcosluis2186.posterous.com
>  http://postgresql.uci.cu/blog/38
>
>
>
