kudu-user mailing list archives

From Andrew Wong <aw...@cloudera.com>
Subject Re: Use spark SQL to query kudu issues!
Date Thu, 11 Jul 2019 03:18:58 GMT
Can you check your <tserver-host>:8050/scans page to see whether the scanner
exists? Or the next time you see this error, check if it's there.

Additionally, can you verify that you're using the Kudu 1.8 Java client? My
understanding is that Kudu+Spark should automatically schedule keepAlive
calls for the scanners in the Java client starting in 1.8.0, per this
patch:
https://github.com/apache/kudu/commit/cf1b1f42cbcc3ee67477ddc44cd0ff5070f1caac
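
For reference, here's a minimal sketch of what that keep-alive amounts to
when driving the Java client by hand from Scala (using the master address
from your config and a made-up table name; not the Spark integration itself):

  import org.apache.kudu.client.KuduClient

  val client = new KuduClient.KuduClientBuilder("hadoop1:7051").build()
  val table = client.openTable("my_table")  // hypothetical table name
  val scanner = client.newScannerBuilder(table).build()
  while (scanner.hasMoreRows) {
    val rows = scanner.nextRows()
    // Ping the server between batches so a slow consumer doesn't let the
    // scanner expire; this is roughly what the Spark integration schedules
    // automatically as of 1.8.0.
    scanner.keepAlive()
    while (rows.hasNext) {
      val row = rows.next()
      // ... process row ...
    }
  }
  client.close()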

You may be able to confirm this by looking at <tserver-host>:8050/rpcz to
see if there are any KeepAlive requests.

If all else fails, you can use the --scanner_ttl_ms tablet server
configuration to allow scanners to stay alive for longer.
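
For example, --scanner_ttl_ms=120000 would double the default of 60000 (one
minute). The trade-off is that idle scanners, and the resources they hold on
the tablet server, stick around longer, so I'd treat it as a fallback rather
than a first fix.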

On Sun, Jul 7, 2019 at 7:23 PM 冯宝利 <fengbaoli@uce.cn> wrote:

> Hi:
>
>
>  We use Spark SQL to analyze Kudu data, with Apache Oozie as the scheduling
> tool to run a job every day. Under normal circumstances it runs fine, but
> sometimes a problem occurs: the task neither fails nor succeeds and just
> keeps running, so the scheduled job is delayed. The problem is intermittent
> and we have not been able to solve it, which makes it impossible to get a
> stable service from Kudu. We can only use Kudu for some lower-priority
> reports; the important reports still use Apache Hive.
>
>  (1) The error message obtained from Spark SQL is as follows:
>
> (2) The code to load Kudu using Spark is as follows:
>
>   import org.apache.spark.sql.SparkSession
>   // Kudu-Spark integration; provides the .kudu reader extension.
>   import org.apache.kudu.spark.kudu._
>
>   class kudu_Utils {
>     def get_tables(database: String, tablename: String) = {
>       // Build (or reuse) the SparkSession for this job.
>       val spark = SparkSession.builder
>         .appName("get kudu")
>         //        .master("local[*]")
>         .config("spark.sql.warehouse.dir", "spark-warehouse")
>         .getOrCreate()
>
>       val table: String = database.concat(tablename)
>
>       // Kudu masters, target table, and a 10s per-operation timeout.
>       val kuduOptions = Map("kudu.master" -> "hadoop1:7051,hadoop2:7051,hadoop3:7051",
>         "kudu.table" -> table,
>         "kudu.operation.timeout.ms" -> "10000")
>       val tableDF = spark.read.options(kuduOptions).kudu
>       tableDF
>     }
>   }
> (3) Our component versions are as follows:
> (1) Spark: 2.4.0
> (2) Kudu: 1.8.0
>
>
> Please give us some suggestions or help with this issue.
> Thanks!
>


-- 
Andrew Wong
