sqoop-dev mailing list archives

From "Abhijeet Gaikwad (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SQOOP-604) Easy throttling feature for MySQL exports
Date Sat, 03 Nov 2012 05:26:12 GMT

     [ https://issues.apache.org/jira/browse/SQOOP-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhijeet Gaikwad updated SQOOP-604:

    Affects Version/s:     (was: 1.4.3)
> Easy throttling feature for MySQL exports
> -----------------------------------------
>                 Key: SQOOP-604
>                 URL: https://issues.apache.org/jira/browse/SQOOP-604
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: connectors/mysql
>    Affects Versions: 1.4.2
>            Reporter: Zoltan Toth-Czifra
>            Priority: Minor
>             Fix For: 1.4.3
>         Attachments: SQOOP-604_v6.patch
> Sqoop always tries to achieve the best possible throughput with exports, which might not be desirable in all cases. Sometimes we need to export large datasets with Sqoop to a live relational database (MySQL in our case), that is, a database that is under high load serving random queries from the users of our product.
> While data consistency issues during the export can easily be solved with a staging table, one problem remains: the performance impact caused by the heavy export.
> First, the MySQL resources dedicated to the import process can affect the performance of the live product, on both the master and the slaves. Second, even if the servers can handle the import without a significant performance impact (mysqlimport should be relatively "cheap"), importing big tables (GB+) can cause serious replication lag in the cluster, putting data consistency at risk.
> My suggestion is quite simple: take the already existing "checkpoint" feature of MySQL exports (the export process is restarted every X bytes written) and extend it with a new config value that makes the thread sleep for Y milliseconds at each checkpoint. With a low enough byte limit, this can be a simple yet powerful throttling mechanism.
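The checkpoint-and-sleep idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Sqoop's implementation. `ThrottledWriter`, `checkpoint_bytes`, `sleep_ms` and the `on_checkpoint` hook are hypothetical names. The hook stands in for whatever the MySQL connector actually does at a checkpoint (restarting the export process). The sleep function can be injected, so the behaviour can be checked without really waiting.

```python
import io
import time
from typing import BinaryIO, Callable, Optional


class ThrottledWriter:
    """Hypothetical sketch: writes to `sink` and, each time at least
    `checkpoint_bytes` have been written since the last checkpoint,
    runs the checkpoint hook and then pauses for `sleep_ms` milliseconds."""

    def __init__(self, sink: BinaryIO, checkpoint_bytes: int, sleep_ms: int,
                 on_checkpoint: Optional[Callable[[], None]] = None,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if checkpoint_bytes <= 0:
            raise ValueError("checkpoint_bytes must be positive")
        self.sink = sink
        self.checkpoint_bytes = checkpoint_bytes
        self.sleep_ms = sleep_ms
        self.on_checkpoint = on_checkpoint
        self._sleep = sleep
        self._since_checkpoint = 0  # bytes written since the last checkpoint
        self.checkpoints = 0        # number of checkpoints reached so far

    def write(self, data: bytes) -> None:
        self.sink.write(data)
        self._since_checkpoint += len(data)
        if self._since_checkpoint >= self.checkpoint_bytes:
            self._checkpoint()

    def _checkpoint(self) -> None:
        # Existing behaviour (assumed): e.g. restart the export process.
        if self.on_checkpoint is not None:
            self.on_checkpoint()
        # Proposed addition: pause so the live database and replicas can catch up.
        if self.sleep_ms > 0:
            self._sleep(self.sleep_ms / 1000.0)
        self._since_checkpoint = 0
        self.checkpoints += 1


if __name__ == "__main__":
    # Record sleep calls instead of actually sleeping.
    pauses = []
    writer = ThrottledWriter(io.BytesIO(), checkpoint_bytes=100, sleep_ms=50,
                             sleep=pauses.append)
    for _ in range(10):
        writer.write(b"x" * 30)  # 300 bytes in 30-byte chunks
    print(writer.checkpoints, pauses)
```

The export then spends at least `sleep_ms` idle per `checkpoint_bytes` written. Its average rate is therefore capped at roughly `checkpoint_bytes / sleep_ms`, and the time spent actually writing only lowers it further. For example, a 1 MiB checkpoint with a 1000 ms sleep keeps the export at or below about 1 MiB/s. Smaller byte limits give smoother throttling, at the cost of more frequent checkpoints.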

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
