hive-user mailing list archives

From Gopal Vijayaraghavan <>
Subject Re: Loading Sybase to hive using sqoop
Date Wed, 24 Aug 2016 21:35:04 GMT

> val d ="jdbc").options(
>> The sqoop job takes 7 hours to load 15 days of data, even while setting
>>the direct load option to 6. Hive is using MR framework.

In general, JDBC implementations tend to react rather badly to large
extracts like this: the throttling usually happens at the operational
database end rather than on the MR side.
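To make the throttling point concrete, here is a simplified sketch of Sqoop-style range splitting. It assumes the setting of 6 in the quoted message means six parallel mappers (`--num-mappers 6`) over an integer `--split-by` column. Each mapper issues its own range query, so all six scans land on the same operational database at once. The table and column names (`trades`, `trade_id`) are hypothetical, and the boundary arithmetic is illustrative rather than Sqoop's exact algorithm (real Sqoop also handles non-integer split columns and NULLs).

```scala
// Simplified sketch of Sqoop-style range splitting (assumption: an integer
// split column with known MIN/MAX; not Sqoop's exact boundary logic).
object RangeSplit {
  // Inclusive range [lo, hi] of split-column values handled by one mapper.
  final case class Split(lo: Long, hi: Long)

  // Divide [min, max] into at most numMappers contiguous ranges whose sizes
  // differ by at most one.
  def splits(min: Long, max: Long, numMappers: Int): Seq[Split] = {
    require(numMappers > 0 && max >= min, "need numMappers > 0 and max >= min")
    val span = max - min + 1
    val n = math.min(numMappers.toLong, span).toInt
    val step = span / n
    val rem = span % n
    // The first `rem` ranges get one extra value each.
    val sizes = (0 until n).map(i => step + (if (i < rem) 1L else 0L))
    val starts = sizes.scanLeft(min)(_ + _).init
    starts.zip(sizes).map { case (lo, size) => Split(lo, lo + size - 1) }
  }

  // One query per mapper; in a real job these run concurrently against the
  // same source database, which is where the load concentrates.
  def queries(table: String, col: String, ss: Seq[Split]): Seq[String] =
    ss.map(s => s"SELECT * FROM $table WHERE $col >= ${s.lo} AND $col <= ${s.hi}")
}
```

For example, `RangeSplit.splits(1, 100, 6)` yields six ranges starting at 1, 18, 35, 52, 69 and 85. Adding mappers adds concurrent scans on the source, not capacity on it.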

Sqoop is good enough for a one-shot import, but frequent imports are best
done with the database's own dump protocols, which are generally not
throttled the same way.

Pinterest recently put out a document on how they do this


A more interesting approach is continuous ingestion, which reads directly
off the replication protocol's write-ahead logs.
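A minimal sketch of what reading off the write-ahead log means for the consumer: decoded change events carry a log sequence number (LSN) and are replayed onto a keyed replica, and remembering the last applied LSN makes a restart idempotent. The event shapes (`Upsert`, `Delete`) and the in-memory `Replica` are hypothetical stand-ins; a real pipeline would decode the engine's own replication format.

```scala
// Minimal sketch of log-based change data capture: replay change events
// decoded from a replication/write-ahead log onto a keyed replica.
// Event types and the in-memory Replica are hypothetical stand-ins.
object WalReplay {
  sealed trait Change { def lsn: Long; def key: Int }
  final case class Upsert(lsn: Long, key: Int, row: String) extends Change
  final case class Delete(lsn: Long, key: Int) extends Change

  // Current replica contents plus the highest LSN applied so far.
  final case class Replica(rows: Map[Int, String], appliedLsn: Long)

  // Events are assumed to arrive in LSN order. Anything at or below the
  // replica's appliedLsn was already applied, so re-reading the log after
  // a restart does not change the result.
  def replay(start: Replica, log: Seq[Change]): Replica =
    log.foldLeft(start) { (acc, change) =>
      if (change.lsn <= acc.appliedLsn) acc
      else change match {
        case Upsert(lsn, key, row) => Replica(acc.rows.updated(key, row), lsn)
        case Delete(lsn, key)      => Replica(acc.rows - key, lsn)
      }
    }
}
```

Unlike a periodic bulk extract, the consumer here only touches rows that changed, and the source pays for it through its existing replication stream.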


But all of these tend to be optimized for a particular database engine,
while the JDBC pipe tends to be slow for every engine.

