hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Loading Sybase to hive using sqoop
Date Wed, 24 Aug 2016 22:19:06 GMT
If this is a one-off then Spark will do OK.
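A minimal sketch of the Spark route, assuming a HiveContext called
hiveContext in the spark-shell (the jConnect URL, table names and
credentials below are placeholders):

  import org.apache.spark.sql.hive.HiveContext
  val hiveContext = new HiveContext(sc)   // sc: the shell's SparkContext

  // one-off extract of a Sybase IQ table over JDBC, landed as a Hive table
  val df = hiveContext.read
    .format("jdbc")
    .option("url", "jdbc:sybase:Tds:iqhost:2638/mydb")
    .option("driver", "com.sybase.jdbc4.jdbc.SybDriver")
    .option("dbtable", "dbo.sales")
    .option("user", "hiveload")
    .option("password", "***")
    .load()

  df.write.mode("overwrite").saveAsTable("staging.sales")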

Sybase IQ provides bcp, which creates a tab- or comma-separated flat file.
You can use that to extract the IQ table, put the file on HDFS and create
an external table over it.

This is, of course, a one-off exercise.
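A rough sketch of the bcp route, using the same hiveContext (table layout,
delimiter and paths are hypothetical):

  // 1) extract with bcp on the Sybase side, e.g.
  //      bcp mydb..sales out sales.tsv -c -Uloader -Siqserver
  // 2) push the flat file onto HDFS:
  //      hdfs dfs -put sales.tsv /data/staging/sales/
  // 3) overlay an external Hive table on that directory
  hiveContext.sql(
    """CREATE EXTERNAL TABLE IF NOT EXISTS staging.sales_ext (
      |  sale_id   INT,
      |  amount    DECIMAL(18,2),
      |  sale_date STRING
      |)
      |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      |STORED AS TEXTFILE
      |LOCATION '/data/staging/sales/'""".stripMargin)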

You can also use SRS (SAP Replication Server) to get the data out the first
time and then keep the Hive table in sync with the Sybase IQ table in real
time. You will need SRS SP 204 or above to make this work.

Talk to your DBA to see whether they can get that SRS service pack from
Sybase. I have done it many times, and I think it is stable enough for this
purpose.

HTH







Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 24 August 2016 at 22:35, Gopal Vijayaraghavan <gopalv@apache.org> wrote:

>
>
> > val d = HiveContext.read.format("jdbc").options(
> ...
> >> The sqoop job takes 7 hours to load 15 days of data, even while setting
> >> the direct load option to 6. Hive is using MR framework.
>
> In general, the JDBC implementations tend to react rather badly to large
> extracts like this - the throttling usually happens on the operational
> database end rather than being a problem on the MR side.
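> (Purely as an illustration of that JDBC pipe: partitioning the read at
> least spreads the extract over several tasks; the table, column and bound
> values here are hypothetical.)
>
>   val d = HiveContext.read.format("jdbc").options(Map(
>     "url"             -> "jdbc:sybase:Tds:iqhost:2638/mydb",
>     "driver"          -> "com.sybase.jdbc4.jdbc.SybDriver",
>     "dbtable"         -> "dbo.sales",
>     "partitionColumn" -> "sale_id",   // numeric column to split on
>     "lowerBound"      -> "1",
>     "upperBound"      -> "100000000",
>     "numPartitions"   -> "16",
>     "user"            -> "hiveload",
>     "password"        -> "***")).load()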
>
>
> Sqoop is good enough for a one-shot import, but doing it frequently is
> best done by the database's own dump protocols, which are generally not
> throttled similarly.
>
> Pinterest recently put out a document on how they do this
>
> https://engineering.pinterest.com/blog/tracker-ingesting-mysql-data-scale-part-1
>
> +
> https://engineering.pinterest.com/blog/tracker-ingesting-mysql-data-scale-part-2
>
> More interestingly, continuous ingestion reads directly off the
> replication protocol's write-ahead logs.
>
> https://github.com/Flipkart/MySQL-replication-listener/tree/master/examples/mysql2hdfs
>
> +
> https://github.com/flipkart-incubator/storm-mysql
>
>
> But all of these tend to be optimized for a particular database engine,
> while the JDBC pipe tends to work slowly for all engines.
>
> Cheers,
> Gopal
>
>
>
