hive-user mailing list archives

From Omkar Joshi <Omkar.Jo...@lntinfotech.com>
Subject RE: Import from MySQL to Hive using Sqoop
Date Thu, 27 Jun 2013 06:25:39 GMT
Hi Nitin,

Thanks for the inputs - will try out those.

Regards,
Omkar Joshi

From: Nitin Pawar [mailto:nitinpawar432@gmail.com]
Sent: Thursday, June 27, 2013 11:48 AM
To: user@hive.apache.org
Subject: Re: Import from MySQL to Hive using Sqoop


Disclaimer: I am not a Sqoop guru, so these are just suggestions.

The Sqoop documentation says:
"Sqoop job to import data for Hive into a particular partition by specifying the --hive-partition-key
and --hive-partition-value arguments"

I have not tried these myself, and I am not sure whether they work with dynamic partitioning.
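For reference, a rough sketch of such a single-partition import. This is untested, and the JDBC URL, credentials, and table/column names below are invented placeholders, not from this thread:

```shell
# Dry-run sketch (echoed, not executed, since it needs a live MySQL and
# Hadoop cluster). All connection details and names are placeholders.
CMD="sqoop import \
  --connect jdbc:mysql://dbhost/traveldb \
  --username dbuser -P \
  --table bookings \
  --where 'departure_date = 20130115' \
  --hive-import \
  --hive-table bookings_part \
  --hive-partition-key departure_date \
  --hive-partition-value 20130115"
echo "$CMD"
```

The --where clause restricts the pull to the one day whose rows belong in that partition; drop the echo wrapper to actually run it.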

Also, I am not sure whether you have looked at incremental imports, which would save you from
importing the old data again and again.
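A sketch of what an incremental import might look like, assuming the departure-date column can serve as the check column; again, the connection details and names are placeholders and this is untested:

```shell
# Dry-run sketch of an incremental import: only rows with
# departure_date greater than --last-value are pulled, so old data
# is not re-imported. All names below are placeholders.
CMD="sqoop import \
  --connect jdbc:mysql://dbhost/traveldb \
  --username dbuser -P \
  --table bookings \
  --incremental append \
  --check-column departure_date \
  --last-value 20130626 \
  --target-dir /datastore/bookings"
echo "$CMD"
```

Sqoop prints the new last value at the end of each run, which you would feed into the next run's --last-value.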

Could you also post the same question to the Sqoop user group?

To answer your questions:

For (2), I have already given the options to use above.

For (3), as long as you are importing just one date's data and your partition key is that
date column, you can write into a directory such as hdfs://blah/datastore/table/partitioncolumn=value/
and then register that partition with Hive in one more step.

This is essentially what option 2 implements: it imports data into a single partition for
a given value.
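That registration step would be an ALTER TABLE ... ADD PARTITION pointing at the imported directory; sketched below as a dry run, with placeholder table, column, and path names:

```shell
# Sketch of registering an already-imported HDFS directory as a Hive
# partition. Table, column, and path are placeholders.
HQL="ALTER TABLE bookings_part
ADD PARTITION (departure_date=20130115)
LOCATION '/datastore/table/departure_date=20130115'"
echo hive -e "$HQL"   # echoed as a dry run; remove echo to execute
```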


On Thu, Jun 27, 2013 at 9:43 AM, Omkar Joshi <Omkar.Joshi@lntinfotech.com> wrote:
Hi,

I have to import more than 400 million rows from a MySQL table (having a composite primary key)
into a PARTITIONED Hive table via Sqoop. The table has data for two years, with a departure-date
column ranging from 20120605 to 20140605 and thousands of records for one day. I need
to partition the data based on the departure date.

The versions:

Apache Hadoop - 1.0.4
Apache Hive - 0.9.0
Apache Sqoop - sqoop-1.4.2.bin__hadoop-1.0.0

To my knowledge, there are three approaches:

1.    MySQL -> Non-partitioned Hive table -> INSERT from the non-partitioned Hive table
into the partitioned Hive table

This is the painful one that I'm currently following.

2.    MySQL -> Partitioned Hive table

I read that support for this was added in later(?) versions of Hive and Sqoop, but I was unable
to find an example.

3.    MySQL -> Non-partitioned Hive table -> ALTER the non-partitioned Hive table to add
PARTITIONs

The syntax requires partitions to be specified as key-value pairs - not feasible with millions
of records, where one cannot enumerate all the partition key-value pairs.
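For context, the INSERT step in approach 1 can use Hive dynamic partitioning, so the partition values come from the data rather than being listed by hand; a rough sketch, with placeholder table and column names:

```shell
# Dry-run sketch of a dynamic-partition INSERT from a staging table into
# the partitioned table. Table and column names are placeholders; the
# partition column must come last in the SELECT list.
HQL="SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE bookings_part PARTITION (departure_date)
SELECT col1, col2, departure_date FROM bookings_staging;"
echo hive -e "$HQL"   # echoed as a dry run
```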

Can anyone provide inputs for approaches 2 and 3?

Regards,
Omkar Joshi





--
Nitin Pawar
