kudu-commits mailing list archives

From ale...@apache.org
Subject [1/2] kudu git commit: [docs] Refresh and augment the known issues
Date Wed, 03 May 2017 17:45:51 GMT
Repository: kudu
Updated Branches:
  refs/heads/branch-1.3.x c8273e169 -> 22bdf82b4


[docs] Refresh and augment the known issues

We've learned a lot about Kudu since people have started using it.
I've gathered in this patch what I think should be the new recommendations
we make to users.

Cherry-picked from master but had a conflict in developing.adoc since
master now allows renaming primary keys.

Change-Id: Iba37e3b6a0d76d46be14cef350a217dd7e71cd7f
Reviewed-on: http://gerrit.cloudera.org:8080/6786
Reviewed-by: Dan Burkert <danburkert@apache.org>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/5487db15
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/5487db15
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/5487db15

Branch: refs/heads/branch-1.3.x
Commit: 5487db15ac96e8a7992fb403ce3aa41c2a0d231a
Parents: c8273e1
Author: Jean-Daniel Cryans <jdcryans@apache.org>
Authored: Wed Apr 19 16:50:36 2017 -0700
Committer: Jean-Daniel Cryans <jdcryans@apache.org>
Committed: Wed May 3 17:29:45 2017 +0000

----------------------------------------------------------------------
 docs/developing.adoc              |   7 +-
 docs/installation.adoc            |   1 +
 docs/known_issues.adoc            | 113 +++++++++++++++++++++++++++------
 docs/kudu_impala_integration.adoc |   5 ++
 4 files changed, 106 insertions(+), 20 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/5487db15/docs/developing.adoc
----------------------------------------------------------------------
diff --git a/docs/developing.adoc b/docs/developing.adoc
index 7cb8055..539fc4d 100644
--- a/docs/developing.adoc
+++ b/docs/developing.adoc
@@ -164,10 +164,13 @@ kuduContext.deleteTable("unwanted_table")
 - Kudu tables with a column name containing upper case or non-ascii characters
   may not be used with SparkSQL. Non-primary key columns may be renamed in Kudu
   to work around this issue.
-- `NULL`, `NOT NULL`, `<>`, `OR`, `LIKE`, and `IN` predicates are not pushed to
-  Kudu, and instead will be evaluated by the Spark task.
+- `<>` and `OR` predicates are not pushed to Kudu, and instead will be evaluated
+  by the Spark task. Only `LIKE` predicates with a suffix wildcard are pushed to
+  Kudu, meaning that `LIKE "FOO%"` is pushed down but `LIKE "FOO%BAR"` isn't.
 - Kudu does not support all types supported by Spark SQL, such as `Date`,
   `Decimal` and complex types.
+- Kudu tables may only be registered as temporary tables in SparkSQL.
+  Kudu tables may not be queried using HiveContext.
 
 
 == Kudu Python Client
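
Editor's illustration (not part of the patch): the `LIKE` rule added to developing.adoc above can be read as "only a literal prefix followed by a single trailing wildcard is pushed down". The sketch below encodes that reading. The helper name `is_prefix_like` is hypothetical and is not part of kudu-spark; it assumes `%` and `_` are the only wildcards and that no escape character is used.

```python
def is_prefix_like(pattern: str) -> bool:
    """Return True when a SQL LIKE pattern is a literal prefix plus one trailing '%'.

    Assumptions (not taken from the Kudu sources): '%' and '_' are the only
    wildcards, and no escape character is in effect.
    """
    if not pattern.endswith("%"):
        return False
    prefix = pattern[:-1]
    # The part before the trailing '%' must be non-empty and wildcard-free.
    return prefix != "" and "%" not in prefix and "_" not in prefix


if __name__ == "__main__":
    for candidate in ("FOO%", "FOO%BAR", "%FOO"):
        print(candidate, is_prefix_like(candidate))
```

Under this reading, `"FOO%"` qualifies while `"FOO%BAR"` and `"%FOO"` do not, which matches the two examples given in the new documentation text.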

http://git-wip-us.apache.org/repos/asf/kudu/blob/5487db15/docs/installation.adoc
----------------------------------------------------------------------
diff --git a/docs/installation.adoc b/docs/installation.adoc
index 8edc79f..2469c12 100644
--- a/docs/installation.adoc
+++ b/docs/installation.adoc
@@ -53,6 +53,7 @@ Linux::
       link:troubleshooting.html#req_hole_punching[troubleshooting hole punching] for more
       information.
     - ntp.
+    - xfs or ext4 formatted drives.
 macOS::
     - OS X 10.10 Yosemite, OS X 10.11 El Capitan, or macOS Sierra.
     - Prebuilt macOS packages are not provided.

http://git-wip-us.apache.org/repos/asf/kudu/blob/5487db15/docs/known_issues.adoc
----------------------------------------------------------------------
diff --git a/docs/known_issues.adoc b/docs/known_issues.adoc
index 607b4e8..f0bf4bf 100644
--- a/docs/known_issues.adoc
+++ b/docs/known_issues.adoc
@@ -27,17 +27,17 @@
 :sectlinks:
 :experimental:
 
-== Schema and Usage Limitations
-* Kudu is primarily designed for analytic use cases. You are likely to encounter issues if
-  a single row contains multiple kilobytes of data.
+== Schema
 
-* The columns which make up the primary key must be listed first in the schema.
+=== Primary keys
 
 * Columns that are part of the primary key cannot be renamed.
   The primary key may not be changed after the table is created.
   You must drop and recreate a table to select a new primary key
   or rename key columns.
 
+* The columns which make up the primary key must be listed first in the schema.
+
 * The primary key of a row may not be modified using the `UPDATE` functionality.
   To modify a row's primary key, the row must be deleted and re-inserted with
   the modified key. Such a modification is non-atomic.
@@ -46,13 +46,50 @@
   primary key definition. Additionally, all columns that are part of a primary
   key definition must be `NOT NULL`.
 
-* Type and nullability of existing columns cannot be changed by altering the table.
+* Auto-generated primary keys are not supported.
+
+* Cells making up a composite primary key are limited to a total of 16KB after the internal
+  composite-key encoding done by Kudu.
+
+=== Columns
+
+* TIMESTAMP, DECIMAL, CHAR, VARCHAR, DATE, and complex types like ARRAY are not supported.
+
+* Type, nullability, compression, and encoding of existing columns cannot be changed by altering the table.
+
+* Tables can have a maximum of 300 columns.
+
+=== Tables
+
+* Tables must have an odd number of replicas, with a maximum of 7.
+
+* Replication factor (set at table creation time) cannot be changed.
+
+=== Cells (individual values)
+
+* Cells cannot be larger than 64KB.
+
+=== Other usage limitations
+
+* Kudu is primarily designed for analytic use cases. You are likely to encounter issues if
+  a single row contains multiple kilobytes of data.
+
+* Secondary indexes are not supported.
+
+* Multi-row transactions are not supported.
+
+* Relational features, like foreign keys, are not supported.
+
+* Identifiers such as column and table names are restricted to be valid UTF-8 strings.
+  Additionally, a maximum length of 256 characters is enforced.
 
 * Dropping a column does not immediately reclaim space. Compaction must run first.
-There is no way to run compaction manually, but dropping the table will reclaim the
-space immediately.
+
+* There is no way to run compaction manually, but dropping the table will reclaim the
+  space immediately.
 
 == Partitioning Limitations
+
 * Tables must be manually pre-split into tablets using simple or compound primary
   keys. Automatic splitting is not yet possible. Range partitions may be added
   or dropped after a table has been created. See
@@ -62,21 +99,61 @@ space immediately.
   create a new table with the new partitioning and insert the contents of the old
   table.
 
-== Replication and Backup Limitations
-* Kudu does not currently include any built-in features for backup and restore.
-  Users are encouraged to use tools such as Spark or Impala to export or import
-  tables as necessary.
+* Tablets that lose a majority of replicas (such as 1 left out of 3) require manual
+  intervention to be repaired.
+
+== Cluster management
+
+* Rack awareness is not supported.
+
+* Multi-datacenter is not supported.
+
+* Rolling restart is not supported.
+
+== Server management
+
+* Production deployments should configure at least 4GB of memory for tablet servers,
+  and ideally more than 10GB.
+
+* Write ahead logs (WAL) can only be stored on one disk.
+
+* Disk failures are not tolerated and tablet servers will crash as soon as one is detected.
+
+* Failed disks with unrecoverable data require the formatting of all the Kudu data for
+  that tablet server before it can be started again.
+
+* Data directories cannot be added/removed; all must be reformatted to change the set
+  of directories.
 
-== Impala Limitations
+* Tablet servers cannot be gracefully decommissioned.
 
-* Updates, inserts, and deletes via Impala are non-transactional. If a query
-  fails part of the way through, its partial effects will not be rolled back.
+* Tablet servers can’t change address/port.
 
-* No timestamp and decimal type support.
+* Kudu has a hard requirement on having up-to-date NTP. Kudu masters and tablet servers
+  will crash when out of sync.
 
-* The maximum parallelism of a single query is limited to the number of tablets
-  in a table. For good analytic performance, aim for 10 or more tablets per host
-  or use large tables.
+* Kudu releases are only tested with NTP. Other time synchronization providers like Chrony
+  may or may not work.
+
+== Scale
+
+* Recommended maximum number of tablet servers is 100.
+
+* Recommended maximum number of masters is 3.
+
+* Recommended maximum amount of stored data, post-replication and post-compression,
+  per tablet server is 4TB.
+
+* Recommended maximum number of tablets per tablet server is 1000, post-replication.
+
+* Maximum number of tablets per table for each tablet server is 60, post-replication,
+  at table-creation time.
+
+== Replication and Backup Limitations
+
+* Kudu does not currently include any built-in features for backup and restore.
+  Users are encouraged to use tools such as Spark or Impala to export or import
+  tables as necessary.
 
 == Security Limitations
 
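
Editor's illustration (not part of the patch): several of the schema limits listed in the known_issues.adoc changes above are mechanical enough to check before creating a table. The following is a minimal sketch under stated assumptions: the `Column` class and `check_table` function are hypothetical and do not belong to any Kudu client library, and only the limits named in the text are checked (300 columns, 256-character identifiers, an odd replica count of at most 7, key columns listed first and `NOT NULL`).

```python
from dataclasses import dataclass
from typing import List

# Limits quoted from the known-issues text in this commit.
MAX_COLUMNS = 300
MAX_IDENTIFIER_LENGTH = 256
MAX_REPLICAS = 7


@dataclass
class Column:
    """Hypothetical column description, for illustration only."""
    name: str
    is_key: bool = False
    nullable: bool = True


def check_table(name: str, columns: List[Column], replicas: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if len(columns) > MAX_COLUMNS:
        problems.append(f"{len(columns)} columns exceeds the maximum of {MAX_COLUMNS}")
    for ident in [name] + [c.name for c in columns]:
        if len(ident) > MAX_IDENTIFIER_LENGTH:
            problems.append(f"identifier starting with {ident[:20]!r} is too long")
    if replicas < 1 or replicas > MAX_REPLICAS or replicas % 2 == 0:
        problems.append(f"replica count {replicas} must be odd and at most {MAX_REPLICAS}")
    seen_non_key = False
    for col in columns:
        if col.is_key:
            if seen_non_key:
                problems.append(f"key column {col.name!r} must be listed before non-key columns")
            if col.nullable:
                problems.append(f"key column {col.name!r} must be NOT NULL")
        else:
            seen_non_key = True
    return problems


if __name__ == "__main__":
    ok = [Column("host", True, False), Column("ts", True, False), Column("value")]
    print(check_table("metrics", ok, replicas=3))
    bad = [Column("value"), Column("id", is_key=True, nullable=True)]
    print(check_table("t", bad, replicas=4))
```

The size limits (64KB per cell, 16KB per encoded composite key) are left out on purpose, since checking them would require knowing Kudu's internal key encoding.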

http://git-wip-us.apache.org/repos/asf/kudu/blob/5487db15/docs/kudu_impala_integration.adoc
----------------------------------------------------------------------
diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index 9383124..a783b26 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -743,3 +743,8 @@ The examples above have only explored a fraction of what you can do with Impala
 - `NULL`, `NOT NULL`, `!=`, and `LIKE` predicates are not pushed to Kudu, and
   instead will be evaluated by the Impala scan node. This may decrease performance
   relative to other types of predicates.
+- Updates, inserts, and deletes via Impala are non-transactional. If a query
+  fails part of the way through, its partial effects will not be rolled back.
+- The maximum parallelism of a single query is limited to the number of tablets
+  in a table. For good analytic performance, aim for 10 or more tablets per host
+  or use large tables.
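
Editor's illustration (not part of the patch): the tablet-count guidance in this commit (a maximum of 60 tablets per table per tablet server, post-replication, at table-creation time) reduces to simple arithmetic. The sketch below assumes replicas are spread evenly across tablet servers, which the text does not state. Both function names are hypothetical.

```python
import math

# Limit quoted from the "Scale" section added to known_issues.adoc.
MAX_REPLICAS_PER_TABLE_PER_SERVER = 60


def replicas_per_server(partitions: int, replication_factor: int, tablet_servers: int) -> int:
    """Tablet replicas of one table per tablet server, rounded up.

    Assumes an even spread of replicas across servers (an assumption).
    """
    return math.ceil(partitions * replication_factor / tablet_servers)


def max_partitions_at_creation(replication_factor: int, tablet_servers: int) -> int:
    """Largest partition count that stays within the per-server limit."""
    return (MAX_REPLICAS_PER_TABLE_PER_SERVER * tablet_servers) // replication_factor


if __name__ == "__main__":
    # Example: 10 tablet servers, replication factor 3.
    print(max_partitions_at_creation(3, 10))
    print(replicas_per_server(40, 3, 10))
```

For example, with 10 tablet servers and a replication factor of 3, a table with 40 partitions places about 12 replicas on each server, well under the limit of 60.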

