airflow-dev mailing list archives

From Nicolas Paris <>
Subject Re: Help SparkJDBCOperator
Date Sat, 09 Feb 2019 11:16:00 GMT

Be careful with the Spark JDBC source as a replacement for Sqoop on large tables.
Sqoop can handle a source table of any size, while the Spark JDBC source by
design cannot: although it provides a way to distribute the read across
multiple partitions, Spark is limited by the executors' memory, whereas
Sqoop is limited only by HDFS capacity.
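In concrete terms, the Spark JDBC source turns the partitionColumn / lowerBound / upperBound / numPartitions options into one WHERE predicate per partition, and each partition's result set is then read by a single executor task. A minimal sketch of that splitting logic (simplified; the real implementation is Spark's JDBCRelation.columnPartition, and details such as stride rounding differ):

```python
# Simplified sketch (not Spark's actual code) of how a numeric partition
# column is split into per-partition WHERE clauses for a JDBC read.
def jdbc_partition_predicates(column, lower, upper, num_partitions):
    stride = (upper - lower) // num_partitions
    preds = []
    bound = lower
    for i in range(num_partitions):
        if i == 0:
            # First partition also picks up NULLs in the partition column.
            preds.append(f"{column} < {bound + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open-ended so no rows are dropped.
            preds.append(f"{column} >= {bound}")
        else:
            preds.append(f"{column} >= {bound} AND {column} < {bound + stride}")
        bound += stride
    return preds

print(jdbc_partition_predicates("id", 0, 100, 4))
```

A skewed partition column means one predicate can match most of the table, which is exactly the case where a single executor runs out of memory while Sqoop would simply stream the split to HDFS.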
As a result, I have written a Spark library (for Postgres only right
now) which overcomes the core Spark JDBC limitations. It handles any
workload, and in my tests it was 8 times faster than Sqoop. I have not
tested it with Airflow, but it is compatible with Apache Livy.
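For reference, a partitioned read with the stock Spark JDBC source is configured roughly like this. The connection values (host, table, bounds) are hypothetical; the option names are the ones Spark's JDBC data source accepts:

```python
# Hypothetical connection values; option keys are Spark JDBC source options.
jdbc_options = {
    "url": "jdbc:postgresql://host:5432/db",
    "dbtable": "public.big_table",
    "partitionColumn": "id",      # numeric column used to split the read
    "lowerBound": "0",
    "upperBound": "1000000",
    "numPartitions": "16",        # number of parallel JDBC connections
    "fetchsize": "10000",         # rows per round trip, tune for the driver
}
# In a Spark job:
#   df = spark.read.format("jdbc").options(**jdbc_options).load()
print(jdbc_options["partitionColumn"], jdbc_options["numPartitions"])
```

Even with these options set, every partition must fit in executor memory, which is the limitation discussed above.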
On Fri, Feb 01, 2019 at 01:53:57PM +0100, Iván Robla Albarrán wrote:
> Hi,
> I am searching for a way to replace Apache Sqoop.
> I am looking at SparkJDBCOperator, but I don't understand how to use it.
> Is it a version of the SparkSubmitOperator that includes a JDBC
> connection?
> Do I need to include Spark code?
> Any example?
> Thanks, I am very lost.
> Regards,
> Iván Robla

