kudu-commits mailing list archives

From ale...@apache.org
Subject [1/2] kudu git commit: [docs] Add "one client only" best practice for kudu-spark
Date Mon, 10 Sep 2018 20:37:39 GMT
Repository: kudu
Updated Branches:
  refs/heads/master b552d9118 -> 953a09b82


[docs] Add "one client only" best practice for kudu-spark

Change-Id: Ibaf369315b8627674ba64e6418d153568ded6fe8
Reviewed-on: http://gerrit.cloudera.org:8080/11409
Tested-by: Will Berkeley <wdberkeley@gmail.com>
Reviewed-by: Alexey Serbin <aserbin@cloudera.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/e3570519
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/e3570519
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/e3570519

Branch: refs/heads/master
Commit: e3570519b200a0ffbd713798bc8aabd6f36ed3b7
Parents: b552d91
Author: Will Berkeley <wdberkeley@gmail.com>
Authored: Mon Sep 10 10:45:30 2018 -0700
Committer: Will Berkeley <wdberkeley@gmail.com>
Committed: Mon Sep 10 18:43:43 2018 +0000

----------------------------------------------------------------------
 docs/developing.adoc | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/e3570519/docs/developing.adoc
----------------------------------------------------------------------
diff --git a/docs/developing.adoc b/docs/developing.adoc
index 98db2ba..49d8c7e 100644
--- a/docs/developing.adoc
+++ b/docs/developing.adoc
@@ -217,6 +217,23 @@ mode, the submitting user must have an active Kerberos ticket granted through
 name and keytab location must be provided through the `--principal` and
 `--keytab` arguments to `spark2-submit`.
 
+=== Spark Integration Best Practices
+
+==== Avoid multiple Kudu clients per cluster.
+
+One common Kudu-Spark coding error is instantiating extra `KuduClient` objects.
+In kudu-spark, a `KuduClient` is owned by the `KuduContext`. Spark application code
+should not create another `KuduClient` connecting to the same cluster. Instead,
+application code should use the `KuduContext` to access a `KuduClient` using
+`KuduContext#syncClient`.
+
+To diagnose multiple `KuduClient` instances in a Spark job, look for signs in
+the logs of the master being overloaded by many `GetTableLocations` or
+`GetTableSchema` requests coming from different clients, usually around the
+same time. This symptom is especially likely in Spark Streaming code,
+where creating a `KuduClient` per task will result in periodic waves of master
+requests from new clients.
+
 === Spark Integration Known Issues and Limitations
 
 - Spark 2.2+ requires Java 8 at runtime even though Kudu Spark 2.x integration
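
The best practice added in this commit can be illustrated with a minimal Scala sketch. It is not part of the commit itself. The master address and table name are hypothetical placeholders, and the sketch assumes the kudu-spark and Spark SQL artifacts are on the classpath and that a Kudu cluster is reachable. It shows application code reusing the `KuduClient` owned by the `KuduContext` through `syncClient` rather than building a second client.

```scala
import org.apache.kudu.client.KuduClient
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

object SingleKuduClientExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical values for illustration only; replace with real ones.
    val kuduMasters = "kudu-master.example.com:7051"
    val tableName = "example_table"

    val spark = SparkSession.builder()
      .appName("single-kudu-client-example")
      .getOrCreate()

    // The KuduContext owns the KuduClient for this cluster.
    val kuduContext = new KuduContext(kuduMasters, spark.sparkContext)

    // Reuse the context's client instead of constructing another one
    // with KuduClient.KuduClientBuilder against the same cluster.
    val client: KuduClient = kuduContext.syncClient

    if (client.tableExists(tableName)) {
      val table = client.openTable(tableName)
      println(s"$tableName has ${table.getSchema.getColumnCount} columns")
    }

    // The client is owned by the KuduContext, so it is not closed here.
    spark.stop()
  }
}
```

Because the client is obtained from the `KuduContext`, the job adds no extra client connections to the master, which is the situation the diagnostic paragraph in the diff describes.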

