kudu-user mailing list archives

From "冯宝利" <fengba...@uce.cn>
Subject Use spark SQL to query kudu issues!
Date Mon, 08 Jul 2019 02:23:02 GMT
Hi:

   We use Spark SQL to analyze Kudu data, with Apache Oozie as the scheduler running a job
once a day. It normally runs fine, but sometimes a problem occurs: the task neither fails
nor succeeds and just keeps running, so the scheduled job is delayed. The problem is
intermittent and we have not been able to solve it, which makes it impossible to get a
stable service from Kudu. We can only use Kudu for some lower-priority reports; the
important reports still run on Apache Hive.
 (1) The error message obtained from Spark SQL is as follows:


(2) The code to load the Kudu table with Spark is as follows:
 import org.apache.kudu.spark.kudu._
 import org.apache.spark.sql.{DataFrame, SparkSession}

 class kudu_Utils {
   // Loads one Kudu table as a Spark DataFrame.
   def get_tables(database: String, tablename: String): DataFrame = {
     val spark = SparkSession.builder
       .appName("get kudu")
       //        .master("local[*]")
       .config("spark.sql.warehouse.dir", "spark-warehouse")
       .getOrCreate()

     val table: String = database.concat(tablename)
     val kuduOptions = Map(
       "kudu.master" -> "hadoop1:7051,hadoop2:7051,hadoop3:7051",
       "kudu.table" -> table,
       "kudu.operation.timeout.ms" -> "10000")
     // The .kudu reader is provided by the kudu-spark implicits imported above.
     val tableDF = spark.read.options(kuduOptions).kudu
     tableDF
   }
 }
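For context, the DataFrame returned by get_tables is registered as a temporary view and then
queried with Spark SQL. The sketch below shows the rough shape of that step; the database
prefix, table name, and query are illustrative placeholders, not our real job:

 // Rough sketch of how the loaded table is then queried with Spark SQL.
 // "ods_", "orders" and the query below are placeholders, not the real job.
 object kudu_QueryExample {
   def main(args: Array[String]): Unit = {
     val ordersDF = new kudu_Utils().get_tables("ods_", "orders")

     // Register the Kudu-backed DataFrame so Spark SQL can reference it.
     ordersDF.createOrReplaceTempView("orders")

     // getOrCreate() returns the session already built inside get_tables.
     val spark = org.apache.spark.sql.SparkSession.builder.getOrCreate()
     val resultDF = spark.sql(
       "SELECT order_date, COUNT(*) AS cnt FROM orders GROUP BY order_date")
     resultDF.show()
   }
 }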
(3) Our component version information is as follows:
 (1) Spark: 2.4.0
 (2) Kudu: 1.8.0


Please give us some suggestions or help with this issue.
Thanks!