kudu-user mailing list archives

From "冯宝利" <fengba...@uce.cn>
Subject Re: Use spark SQL to query kudu issues!
Date Thu, 11 Jul 2019 03:29:03 GMT
Thanks for your help. I checked my Java version: java version "1.8.0_141".
We will look at the scanner (/scans) and /rpcz interfaces, and we are ready to adjust the
--scanner_ttl_ms parameter to see how things go.

Thanks!


------------------------------------------------------------------
From: Andrew Wong <awong@cloudera.com>
Sent: Thursday, July 11, 2019, 11:19
To: user <user@kudu.apache.org>; 冯宝利 <fengbaoli@uce.cn>
Subject: Re: Use spark SQL to query kudu issues!

Can you check on your <tserver-host>:8050/scans page whether the scanner still exists? Or the
next time you see this error, check whether it's there.

Additionally, can you verify that you're using the Kudu 1.8 Java client? My understanding
is that Kudu+Spark should automatically schedule keepAlive calls for the scanners in the Java
client starting in 1.8.0, per this patch: https://github.com/apache/kudu/commit/cf1b1f42cbcc3ee67477ddc44cd0ff5070f1caac.

You may be able to confirm this by looking at <tserver-host>:8050/rpcz to see if there
are any KeepAlive requests.

If all else fails, you can use the --scanner_ttl_ms tablet server configuration to allow scanners
to stay alive for longer.
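For reference, a command-line sketch of these checks might look like the following. The host name "tserver-host" is a placeholder, 8050 is the default tablet server web UI port mentioned above, and the exact output of the /scans and /rpcz pages depends on your Kudu version.

```shell
# Check whether the scanner still exists on the tablet server
curl http://tserver-host:8050/scans

# Look for keep-alive requests in the RPC diagnostics page
curl -s http://tserver-host:8050/rpcz | grep -i keepalive

# As a last resort, run the tablet server with a longer scanner TTL
# (in milliseconds; 120000 here is an arbitrary example)
kudu-tserver --scanner_ttl_ms=120000 <other flags>
```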
On Sun, Jul 7, 2019 at 7:23 PM 冯宝利 <fengbaoli@uce.cn> wrote:
Hi:

   We use Spark SQL to analyze Kudu data and use Apache Oozie as a scheduling tool to run a job
every day. Under normal circumstances it runs fine, but sometimes a problem occurs: the task
neither fails nor succeeds, it just keeps running, so the scheduled job is delayed. The problem
is intermittent and has not been solved, which makes it impossible to get stable service from
Kudu. We only use Kudu to run some lower-priority reports; the important reports still use
Apache Hive.
 (1) The error message obtained from Spark SQL is as follows:


(2) The code to load Kudu using Spark is as follows:

  import org.apache.spark.sql.{DataFrame, SparkSession}
  import org.apache.kudu.spark.kudu._  // brings the .kudu reader implicit into scope

  class kudu_Utils {
    def get_tables(database: String, tablename: String): DataFrame = {
      val spark = SparkSession.builder
        .appName("get kudu")
        //        .master("local[*]")
        .config("spark.sql.warehouse.dir", "spark-warehouse")
        .getOrCreate()

      // tablename is appended directly, so it must include any separator
      val table: String = database.concat(tablename)
      val kuduOptions = Map(
        "kudu.master" -> "hadoop1:7051,hadoop2:7051,hadoop3:7051",
        "kudu.table" -> table,
        "kudu.operation.timeout.ms" -> "10000")
      val tableDF = spark.read.options(kuduOptions).kudu
      tableDF
    }
  }
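As a side note, the Kudu connection options in get_tables above are hard-coded. A minimal sketch of factoring them into a reusable helper could look like the following; the KuduOptionsBuilder name and its signature are our own invention for illustration, not part of the Kudu API.

```scala
// Hypothetical helper: build the kudu-spark option map from its parts,
// so master addresses and timeouts aren't repeated in every job.
object KuduOptionsBuilder {
  def build(masters: Seq[String], table: String, timeoutMs: Long): Map[String, String] =
    Map(
      "kudu.master"               -> masters.mkString(","),
      "kudu.table"                -> table,
      "kudu.operation.timeout.ms" -> timeoutMs.toString
    )
}
```

It could then be used as, e.g., spark.read.options(KuduOptionsBuilder.build(Seq("hadoop1:7051", "hadoop2:7051", "hadoop3:7051"), table, 10000)).kudu, keeping the cluster addresses in one place.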
(3) Our component version information is as follows:
  (1) Spark: 2.4.0
  (2) Kudu: 1.8.0


Please give us some suggestions or help with its use.
Thanks!

-- 
Andrew Wong
