kudu-user mailing list archives

From Jason Heo <jason.heo....@gmail.com>
Subject Some bulk requests are missing when a tserver stopped
Date Sat, 22 Apr 2017 13:48:27 GMT
Hi.

I'm using Apache Kudu 1.2. I'm currently testing high availability of Kudu.

During a bulk load, I intentionally stopped one tserver via CDH Manager,
and afterwards 2% of the rows were missing.

I use Spark 1.6 and the org.apache.kudu:kudu-spark_2.10:1.1.0 package for
bulk loading.

I got an error several times during insertion. About 2% of the rows are lost
when the tserver is stopped and not started again. If I start it again right
after stopping it, no rows are lost, even though I get the same error messages.


I watched Comcast's recent presentation at Strata Hadoop. They said:

> Spark is recommended for large inserts to ensure handling failures

I'm curious whether Comcast has had no issues with tserver failures, and how
I can prevent rows from being lost.

----------------------------------

Below is a Spark error message. ("01d....b64" is the killed tserver.)


java.lang.RuntimeException: failed to write 2 rows from DataFrame to Kudu;
sample errors: Timed out: RPC can not complete before timeout:
Batch{operations=2, tablet='1e83668a9fa44883897474eaa20a7cad'
[0x00000001323031362D3036, 0x00000001323031362D3037),
ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write, tablet=
1e83668a9fa44883897474eaa20a7cad, attempt=25,
DeadlineTracker(timeout=30000, elapsed=29298), Traces: [0ms] sending RPC to
server 01d513bc5c1847c29dd89c3d21a1eb64, [589ms] received from server
01d513bc5c1847c29dd89c3d21a1eb64 response Network error: [Peer
01d513bc5c1847c29dd89c3d21a1eb64] Connection reset, [589ms] delaying RPC
due to Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection
reset, [597ms] querying master, [597ms] Sub rpc: GetTableLocations sending
RPC to server 50cb634c24ef426c9147cc4b7181ca11, [599ms] Sub rpc:
GetTableLocations sending RPC to server 50cb634c24ef426c9147cc4b7181ca11,
[643ms
...
...
received from server 01d513bc5c1847c29dd89c3d21a1eb64 response Network
error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64] Connection reset, [29357ms]
delaying RPC due to Network error: [Peer 01d513bc5c1847c29dd89c3d21a1eb64]
Connection reset)}
at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:184)
at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:179)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
------------------
