hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Loading Sybase to hive using sqoop
Date Wed, 24 Aug 2016 21:08:53 GMT
Hm. Watching paint dry :)

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 24 August 2016 at 22:07, Rahul Channe <drahulc@googlemail.com> wrote:

> We are running Hive on MR.
>
>
> On Wednesday, August 24, 2016, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
>> Sybase IQ uses jconn4.jar (the jConnect JDBC driver) for the connection.
>> This is how I use Spark to get IQ data into a Hive table. You can specify
>> partitioning in Sqoop as well.
>>
>> I started out using Sqoop to populate Hive tables but switched to Spark.
>>
>> Also, are you running Hive on the MapReduce engine?
>>
>>   import org.apache.spark.sql.hive.HiveContext
>>
>>   // Spark 1.x: sc is the SparkContext (predefined in spark-shell)
>>   val hiveContext = new HiveContext(sc)
>>
>>   val dbURL = "jdbc:sybase:Tds:rhes564:21000/SYB_IQ"
>>   val dbUserName = "loader"
>>   val dbPassword = "xxxxxxxx"
>>
>>   // Numeric column used to split the read; "ID" is an assumed example
>>   val partitionColumnName = "ID"
>>   val lowerBoundValue = "1"
>>   val upperBoundValue = "100000000"
>>   // Number of parallel JDBC queries against IQ (read partitions),
>>   // not the number of partitions in the Hive table
>>   val numPartitionsValue = "100"
>>
>>   // Get data from the IQ table
>>   val d = hiveContext.read.format("jdbc").options(
>>     Map("url" -> dbURL,
>>         "dbtable" -> "dummy",
>>         "partitionColumn" -> partitionColumnName,
>>         "lowerBound" -> lowerBoundValue,
>>         "upperBound" -> upperBoundValue,
>>         "numPartitions" -> numPartitionsValue,
>>         "user" -> dbUserName,
>>         "password" -> dbPassword)).load()
>>   // Register it as a temp table
>>   d.registerTempTable("tmp")
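To see what `lowerBound`, `upperBound` and `numPartitions` actually do, here is a simplified model of the stride logic Spark 1.x uses to turn them into one WHERE clause per parallel JDBC query. It is a sketch of the behaviour, not Spark's own code, and the column name `ID` is an assumption. Note that the bounds only set the stride; they do not filter rows, since the first range is open below (and also picks up NULLs) and the last range is open above.

```scala
// Simplified model (not Spark's source) of how a partitioned JDBC read is
// split into numPartitions WHERE clauses over a numeric column.
object JdbcStrides {
  def whereClauses(column: String, lower: Long, upper: Long,
                   numPartitions: Int): Seq[String] =
    if (numPartitions <= 1) Seq("1 = 1") // single unfiltered query
    else {
      // Divide before subtracting to avoid overflow; small round-off is accepted
      val stride = upper / numPartitions - lower / numPartitions
      (0 until numPartitions).map { i =>
        val lo = lower + i * stride
        val hi = lo + stride
        if (i == 0) s"$column < $hi OR $column IS NULL"      // open below
        else if (i == numPartitions - 1) s"$column >= $lo"  // open above
        else s"$column >= $lo AND $column < $hi"
      }
    }

  def main(args: Array[String]): Unit =
    whereClauses("ID", 1L, 100000000L, 100).take(3).foreach(println)
}
```

With the values in the snippet above, each of the 100 queries covers roughly one million IDs, so keeping the bounds close to the real MIN/MAX of the column gives evenly sized reads.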
>>
>> Then insert into the Hive table:
>>
>>   val sqltext = """
>>   INSERT INTO TABLE dummy
>>   SELECT
>>           ID
>>         , CLUSTERED
>>         , SCATTERED
>>         , RANDOMISED
>>         , RANDOM_STRING
>>         , SMALL_VC
>>         , PADDING
>>   FROM tmp
>>   """
>>   hiveContext.sql(sqltext)
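The INSERT above targets an unpartitioned `dummy` table, whereas the table in the original question is partitioned. For a partitioned target, Hive's dynamic-partition insert expects the partition column last in the SELECT list and normally needs `hive.exec.dynamic.partition.mode=nonstrict` when no static partition value is given. A minimal sketch that only builds the statements as strings; the target table `dummy_part` and partition column `load_date` are hypothetical.

```scala
// Hypothetical helper: builds Hive statements for a dynamic-partition insert.
// Table and column names used in main are assumptions for illustration.
object DynamicPartitionInsert {
  // Settings typically needed when all partition values come from the data
  val settings: Seq[String] = Seq(
    "SET hive.exec.dynamic.partition=true",
    "SET hive.exec.dynamic.partition.mode=nonstrict"
  )

  // Hive maps dynamic partition columns by position, so the partition
  // column is appended after the regular columns in the SELECT list.
  def insertSql(target: String, source: String,
                columns: Seq[String], partitionCol: String): String =
    s"""INSERT INTO TABLE $target PARTITION ($partitionCol)
       |SELECT ${(columns :+ partitionCol).mkString(", ")}
       |FROM $source""".stripMargin

  def main(args: Array[String]): Unit = {
    val cols = Seq("ID", "CLUSTERED", "SCATTERED", "RANDOMISED",
                   "RANDOM_STRING", "SMALL_VC", "PADDING")
    (settings :+ insertSql("dummy_part", "tmp", cols, "load_date")).foreach(println)
  }
}
```

In a Spark 1.x session each of these strings could be passed to `hiveContext.sql` in order.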
>>
>>
>> HTH
>>
>>
>> On 23 August 2016 at 20:48, Rahul Channe <drahulc@googlemail.com> wrote:
>>
>>> Hi All,
>>>
>>> We are trying to load data from a Sybase IQ table into Hive using Sqoop.
>>> The Hive table is partitioned and is expected to hold 29M records per day.
>>>
>>> The Sqoop job takes 7 hours to load 15 days of data, even with the
>>> direct load option set to 6. Hive is using the MR framework.
>>>
>>> Is there a way to speed up the process?
>>>
>>> Note: the aim is to load 1 year of data.
>>>
>>
>>
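A back-of-envelope check of the figures in the original question puts the "watching paint dry" remark in numbers. This sketch assumes a steady load rate, linear scaling, and a 365-day year; none of these are stated in the thread.

```scala
// Throughput estimate from the numbers quoted in the thread
// (29M rows/day, 15 days loaded in 7 hours), assuming a constant rate.
object LoadEstimate {
  val rowsPerDay = 29000000L
  val daysLoaded = 15
  val hoursTaken = 7.0

  val rowsLoaded: Long = rowsPerDay * daysLoaded               // 435,000,000 rows
  val rowsPerSecond: Double = rowsLoaded / (hoursTaken * 3600)  // about 17,262 rows/s
  // Hours needed for 365 days of data at the same rate
  val hoursForYear: Double = hoursTaken * 365 / daysLoaded      // about 170.3 hours

  def main(args: Array[String]): Unit = {
    println(f"rows loaded: $rowsLoaded%,d")
    println(f"rows/second: $rowsPerSecond%.0f")
    println(f"hours/year:  $hoursForYear%.1f")
  }
}
```

At that rate the one-year target would take roughly a week of continuous loading, which is why increasing read parallelism (Sqoop mappers or Spark JDBC partitions) matters here.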
