hive-user mailing list archives

From Rahul Channe <drah...@googlemail.com>
Subject Re: Loading Sybase to hive using sqoop
Date Thu, 25 Aug 2016 14:08:35 GMT
Thank you all for valuable inputs

On Wednesday, August 24, 2016, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> If this is one off then Spark will do OK.
>
> Sybase IQ provides bcp, which creates a tab- or comma-separated flat file;
> you can use that to extract the IQ table, put the file on HDFS, and create
> an external table over it.
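For the one-off route described above, the bcp extract, HDFS upload, and external-table DDL can be sketched roughly as follows. All table, database, server, and path names here are hypothetical, and the exact bcp flags depend on your Sybase client version:

```shell
# Sketch of the one-off bcp route; names are placeholders.

# 1. Extract the IQ table to a tab-separated flat file with bcp
#    (-c character mode, -t field terminator).
bcp mydb.dbo.sales out /tmp/sales.tsv -c -t '\t' -S IQ_SERVER -U user -P pass

# 2. Stage the flat file on HDFS.
hdfs dfs -mkdir -p /staging/sales
hdfs dfs -put /tmp/sales.tsv /staging/sales/

# 3. Create a Hive external table over the staged file.
hive -e "
CREATE EXTERNAL TABLE sales_ext (
  id     BIGINT,
  amount DECIMAL(18,2),
  ts     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/staging/sales';"
```

Since the external table only points at the staged files, re-running steps 1-2 refreshes the data without touching the DDL.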
>
> This is of course a one-off.
>
> You can also use SRS (SAP Replication Server) to do the initial extract and
> then keep the Hive table in sync with the Sybase IQ table in real time. You
> will need SRS SP 204 or above to make this work.
>
> Ask your DBA whether they can get the SRS SP from Sybase. I have done it
> many times, and I think it is stable enough for this purpose.
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 24 August 2016 at 22:35, Gopal Vijayaraghavan <gopalv@apache.org> wrote:
>
>>
>>
>> > val d = HiveContext.read.format("jdbc").options(
>> ...
>> >> The sqoop job takes 7 hours to load 15 days of data, even while setting
>> >>the direct load option to 6. Hive is using MR framework.
>>
>> In general, JDBC implementations tend to react rather badly to large
>> extracts like this - the throttling usually happens on the operational
>> database end rather than being a problem on the MR side.
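One mitigation on the Spark side is to parallelize the JDBC extract across several connections using Spark's built-in JDBC partitioning options, rather than pulling the whole table through one connection. A rough sketch in the Spark 1.x style used earlier in the thread; the partitioning option names (partitionColumn, lowerBound, upperBound, numPartitions) are standard Spark JDBC options, but the URL, driver path, table, and bounds below are hypothetical:

```shell
# Hypothetical parallel JDBC read from Sybase IQ via jConnect;
# the driver jar must be on the classpath.
spark-shell --jars /path/to/jconn4.jar <<'EOF'
val d = sqlContext.read.format("jdbc").options(Map(
  "url"             -> "jdbc:sybase:Tds:iqhost:2638/mydb",
  "driver"          -> "com.sybase.jdbc4.jdbc.SybDriver",
  "dbtable"         -> "dbo.sales",
  "partitionColumn" -> "id",        // numeric column to split the scan on
  "lowerBound"      -> "1",
  "upperBound"      -> "100000000",
  "numPartitions"   -> "8"          // 8 concurrent JDBC connections
)).load()
d.write.saveAsTable("sales_stage")
EOF
```

Note that more partitions only help until the database-side throttling described above becomes the bottleneck.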
>>
>>
>> Sqoop is good enough for a one-shot import, but doing it frequently is
>> best done by the database's own dump protocols, which are generally not
>> throttled similarly.
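For the one-shot Sqoop import itself, the degree of parallelism is controlled by --num-mappers together with --split-by. A hedged example using standard Sqoop flags; the connection string, driver class, and column names are placeholders for illustration:

```shell
# Hypothetical one-shot Sqoop import from Sybase over jConnect.
# --split-by picks the column used to partition the table across mappers;
# --num-mappers sets how many parallel JDBC connections are opened.
sqoop import \
  --connect "jdbc:sybase:Tds:iqhost:2638/mydb" \
  --driver com.sybase.jdbc4.jdbc.SybDriver \
  --username user --password-file /user/me/.pw \
  --table sales \
  --split-by id \
  --num-mappers 8 \
  --target-dir /staging/sales \
  --fields-terminated-by '\t'
```

A --split-by column with evenly distributed values matters here: a skewed split column leaves most mappers idle while one does the bulk of the extract.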
>>
>> Pinterest recently put out a document on how they do this
>>
>> https://engineering.pinterest.com/blog/tracker-ingesting-mysql-data-scale-part-1
>>
>> +
>> https://engineering.pinterest.com/blog/tracker-ingesting-mysql-data-scale-part-2
>>
>> More interestingly, continuous ingestion can read directly off the
>> replication protocol's write-ahead logs.
>>
>> https://github.com/Flipkart/MySQL-replication-listener/tree/master/examples/mysql2hdfs
>>
>> +
>> https://github.com/flipkart-incubator/storm-mysql
>>
>>
>> But all of these tend to be optimized for a specific database engine,
>> while the JDBC pipe tends to be slow across all engines.
>>
>> Cheers,
>> Gopal
>>
>>
>>
>
