hive-user mailing list archives

From Omkar Joshi <>
Subject Import from MySQL to Hive using Sqoop
Date Thu, 27 Jun 2013 04:13:40 GMT

I have to import more than 400 million rows from a MySQL table (which has a composite primary key)
into a PARTITIONED Hive table via Sqoop. The table holds two years of data, with a departure-date
column ranging from 20120605 to 20140605 and thousands of records per day. I need to partition
the data by departure date.

The versions:

Apache Hadoop - 1.0.4
Apache Hive   - 0.9.0
Apache Sqoop  - sqoop-1.4.2.bin__hadoop-1.0.0

As far as I know, there are three approaches:

1.    MySQL -> non-partitioned Hive table -> INSERT from the non-partitioned Hive table
into the partitioned Hive table

This is the painful approach I'm currently following.
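For what it's worth, the INSERT step in this approach doesn't require enumerating partitions by hand: Hive 0.9 supports dynamic partition inserts, which derive the partition value from the data being selected. A minimal sketch, assuming hypothetical table names flights_staging (non-partitioned, as loaded by Sqoop) and flights (partitioned by departure_date) and illustrative column names:

```sql
-- Dynamic partitioning is off by default; nonstrict mode allows
-- all partition columns to be dynamic
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
-- Raise the partition limits above the ~730 daily partitions two years imply
SET hive.exec.max.dynamic.partitions = 2000;
SET hive.exec.max.dynamic.partitions.pernode = 2000;

-- The partition column is named in PARTITION() with no value;
-- its values come from the last column of the SELECT list,
-- creating one partition per distinct departure_date
INSERT OVERWRITE TABLE flights PARTITION (departure_date)
SELECT col1, col2, col3, departure_date
FROM flights_staging;
```

This runs as a single job and creates every needed partition in one pass, so no per-date statements are required.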

2.    MySQL -> Partitioned Hive table

I have read that support for this was added in later versions of Hive and Sqoop, but I was
unable to find an example.
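Sqoop 1.4.x does expose --hive-partition-key and --hive-partition-value for Hive imports, but the value is a single static literal, so one invocation loads exactly one partition. A sketch of what one day's import might look like (connection details, table, and column names are placeholders, not from the original post):

```shell
sqoop import \
  --connect jdbc:mysql://dbhost/flightsdb \
  --username dbuser -P \
  --table flights \
  --where "departure_date = 20120605" \
  --hive-import \
  --hive-table flights \
  --hive-partition-key departure_date \
  --hive-partition-value 20120605
```

For two years of data this means on the order of 730 invocations (e.g. driven from a shell loop over dates), which is one reason the staging-table-plus-INSERT route is often preferred despite the extra step.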

3.    MySQL -> non-partitioned Hive table -> ALTER the non-partitioned Hive table to add
partitions

The syntax requires specifying partitions as key-value pairs, which is not feasible with
millions of records where one cannot enumerate all the partition key-value pairs.
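For completeness, the ALTER syntax in question looks like the following; each statement names one literal partition value (table and location are illustrative), so one statement is needed per distinct partition value rather than per row:

```sql
ALTER TABLE flights ADD
  PARTITION (departure_date = '20120605')
  LOCATION '/user/hive/warehouse/flights/departure_date=20120605';
```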

Can anyone provide inputs for approaches 2 and 3?

Omkar Joshi

