kudu-commits mailing list archives

From t...@apache.org
Subject [2/2] kudu git commit: Initial draft of release notes and doc updates for 1.2
Date Thu, 12 Jan 2017 20:39:26 GMT
Initial draft of release notes and doc updates for 1.2

Change-Id: I08326171dd2bf6097a7594b95adca946bb5922eb
Reviewed-on: http://gerrit.cloudera.org:8080/5604
Tested-by: Kudu Jenkins
Reviewed-by: Jean-Daniel Cryans <jdcryans@apache.org>
(cherry picked from commit ccb34a7eaed7d9a01e8a3908ad9a089e4101eaac)
Reviewed-on: http://gerrit.cloudera.org:8080/5698
Reviewed-by: Todd Lipcon <todd@apache.org>
Tested-by: Todd Lipcon <todd@apache.org>

Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/a23e9d64
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/a23e9d64
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/a23e9d64

Branch: refs/heads/branch-1.2.x
Commit: a23e9d6458fa1384c978cfad4db2eea5a90f0c40
Parents: e89cac8
Author: Todd Lipcon <todd@apache.org>
Authored: Wed Jan 4 17:24:32 2017 -0800
Committer: Todd Lipcon <todd@apache.org>
Committed: Thu Jan 12 20:38:58 2017 +0000

 docs/known_issues.adoc  |  14 ++--
 docs/release_notes.adoc | 182 ++++++++++++++++++++++++++++++++++++++++++-
 docs/schema_design.adoc |   8 +-
 3 files changed, 191 insertions(+), 13 deletions(-)

diff --git a/docs/known_issues.adoc b/docs/known_issues.adoc
index edb8afb..abb9929 100644
--- a/docs/known_issues.adoc
+++ b/docs/known_issues.adoc
@@ -33,17 +33,21 @@
 * The columns which make up the primary key must be listed first in the schema.
-* Key columns cannot be altered. You must drop and recreate a table to change its keys.
+* Columns that are part of the primary key cannot be renamed.
+  The primary key may not be changed after the table is created.
+  You must drop and recreate a table to select a new primary key
+  or rename key columns.
-* Key columns must not be null.
+* The primary key of a row may not be modified using the `UPDATE` functionality.
+  To modify a row's primary key, the row must be deleted and re-inserted with
+  the modified key. Such a modification is non-atomic.
 * Columns with `DOUBLE`, `FLOAT`, or `BOOL` types are not allowed as part of a
-  primary key definition.
+  primary key definition. Additionally, all columns that are part of a primary
+  key definition must be `NOT NULL`.
 * Type and nullability of existing columns cannot be changed by altering the table.
-* A table’s primary key cannot be changed.
 * Dropping a column does not immediately reclaim space. Compaction must run first.
 There is no way to run compaction manually, but dropping the table will reclaim the
 space immediately.
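
The primary-key rules in the known_issues.adoc hunk above (key columns listed
first, no `DOUBLE`, `FLOAT`, or `BOOL` key columns, key columns `NOT NULL`) can
be sketched as a small schema check. This is only an illustration: the `Column`
class and `check_primary_key` helper are hypothetical names made up for this
example and are not part of any Kudu client library.

```python
from dataclasses import dataclass
from typing import List

# Column types that the notes say may not be part of a primary key.
DISALLOWED_KEY_TYPES = {"DOUBLE", "FLOAT", "BOOL"}


@dataclass
class Column:
    """Hypothetical column description (not a Kudu client type)."""
    name: str
    type: str
    nullable: bool = False
    is_key: bool = False


def check_primary_key(columns: List[Column]) -> List[str]:
    """Return rule violations; an empty list means the schema passes."""
    problems = []
    seen_non_key = False
    for col in columns:
        if not col.is_key:
            seen_non_key = True
            continue
        # A key column that appears after a non-key column breaks the ordering rule.
        if seen_non_key:
            problems.append(f"{col.name}: key columns must be listed first")
        if col.type in DISALLOWED_KEY_TYPES:
            problems.append(f"{col.name}: {col.type} is not allowed in a primary key")
        if col.nullable:
            problems.append(f"{col.name}: key columns must be NOT NULL")
    return problems


if __name__ == "__main__":
    schema = [
        Column("host", "STRING", is_key=True),
        Column("ts", "INT64", is_key=True),
        Column("value", "DOUBLE", nullable=True),
    ]
    print(check_primary_key(schema))
```

A check like this would run on the application side before a create-table
request is issued; the server remains the authority on what it accepts.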

diff --git a/docs/release_notes.adoc b/docs/release_notes.adoc
index 7ef408f..fbac805 100644
--- a/docs/release_notes.adoc
+++ b/docs/release_notes.adoc
@@ -33,14 +33,148 @@
 == New features
+* Kudu clients and servers now redact user data such as cell values
+  from log messages, Java exception messages, and `Status` strings.
+  User metadata such as table names, column names, and partition
+  bounds are not redacted.
+Redaction is enabled by default, but may be disabled by setting the new
+`log_redact_user_data` flag to `false`.
+// TODO(danburkert): this flag is marked experimental, should we not doc it?
+* Kudu's ability to provide consistency guarantees has been substantially
+  improved:
+** Replicas now correctly track their "safe timestamp". This timestamp
+   is the maximum timestamp at which reads are guaranteed to be
+   repeatable.
+** A scan created using the `READ_AT_SNAPSHOT` mode will now
+   either wait for the requested snapshot to be "safe" at the replica
+   being scanned, or be re-routed to a replica where the requested
+   snapshot is "safe". This ensures that all such scans are repeatable.
+** Kudu Tablet Servers now properly retain historical data when a row
+   with a given primary key is inserted and deleted, followed by the
+   insertion of a new row with the same key. Previous versions of Kudu
+   would not retain history in such situations. This allows the server
+   to return correct results for snapshot scans with a timestamp in the
+   past, even in the presence of such "reinsertion" scenarios.
+** The Kudu clients now automatically retain the timestamp of their latest
+   successful read or write operation. Scans using the `READ_AT_SNAPSHOT` mode
+   without a client-provided timestamp automatically assign a timestamp
+   higher than the timestamp of their most recent write. Writes also propagate
+   the timestamp, ensuring that sequences of operations with causal dependencies
+   between them are assigned increasing timestamps. Together, these changes
+   allow clients to achieve read-your-writes consistency, and also ensure
+   that snapshot scans performed by other clients return causally-consistent
+   results.
+* Kudu servers now automatically limit the number of log files.
+  The number of log files retained can be configured using the
+  `max_log_files` flag. By default, 10 log files will be retained
+  at each severity level.
+// TODO(danburkert): this new flag is marked experimental, should we make it
+// stable or evolving? Or should we not document that it's configurable?
 == Optimizations and improvements
+* The logging in the Java and {cpp} clients has been substantially quieted.
+  Clients no longer log messages in normal operation unless there
+  is some kind of error.
+* The {cpp} client now includes a `KuduSession::SetErrorBufferSpace`
+  API which can limit the amount of memory used to buffer
+  errors from asynchronous operations.
+* The Java client now fetches tablet locations from the Kudu Master
+  in batches of 1000, increased from batches of 10 in prior versions.
+  This can substantially improve the performance of Spark and Impala
+  queries running against Kudu tables with large numbers of tablets.
+* Table metadata lock contention in the Kudu Master was substantially
+  reduced. This improves the performance of tablet location lookups on
+  large clusters with a high degree of concurrency.
+* Lock contention in the Kudu Tablet Server during high-concurrency
+  write workloads was also reduced. This can reduce CPU consumption and
+  improve performance when a large number of concurrent clients are writing
+  to a smaller number of servers.
+* Lock contention when writing log messages has been substantially reduced.
+  This source of contention could cause high tail latencies on requests,
+  and when under high load could contribute to cluster instability
+  such as election storms and request timeouts.
+* The `BITSHUFFLE` column encoding has been optimized to use the `AVX2`
+  instruction set present on processors including Intel(R) Haswell
+  and later. Scans on `BITSHUFFLE`-encoded columns are now up to 30% faster.
+* The `kudu` tool now accepts hyphens as an alternative to underscores
+  when specifying actions. For example, `kudu local-replica copy-from-remote`
+  may be used as an alternative to `kudu local_replica copy_from_remote`.
+== Fixed Issues
+* link:https://issues.apache.org/jira/browse/KUDU-1508[KUDU-1508]
+  Fixed a long-standing issue in which running Kudu on `ext4` file systems
+  could cause file system corruption.
+* link:https://issues.apache.org/jira/browse/KUDU-1399[KUDU-1399]
+  Implemented an LRU cache for open files, which prevents running out of
+  file descriptors on long-lived Kudu clusters. By default, Kudu will
+  limit its file descriptor usage to half of its configured `ulimit`.
+* link:http://gerrit.cloudera.org:8080/5192[Gerrit #5192]
+  Fixed an issue which caused data corruption and crashes in the case that
+  a table had a non-composite (single-column) primary key, and that column
+  was specified to use `DICT_ENCODING` or `BITSHUFFLE` encodings. If a
+  table with an affected schema was written in previous versions of Kudu,
+  the corruption will not be automatically repaired; users are encouraged
+  to re-insert such tables after upgrading to Kudu 1.2 or later.
-=== Command line tools
+* link:http://gerrit.cloudera.org:8080/5541[Gerrit #5541]
+  Fixed a bug in the Spark `KuduRDD` implementation which could cause
+  rows in the result set to be silently skipped in some cases.
+* link:https://issues.apache.org/jira/browse/KUDU-1551[KUDU-1551]
+  Fixed an issue in which the tablet server would crash on restart in the
+  case that it had previously crashed during the process of allocating
+  a new WAL segment.
-== Wire protocol compatibility
+* link:https://issues.apache.org/jira/browse/KUDU-1764[KUDU-1764]
+  Fixed an issue where Kudu servers would leak approximately 16-32MB of disk
+  space for every 10GB of data written to disk. After upgrading to Kudu
+  1.2 or later, any disk space leaked in previous versions will be
+  automatically recovered on startup.
+* link:https://issues.apache.org/jira/browse/KUDU-1750[KUDU-1750]
+  Fixed an issue where the API to drop a range partition would drop any
+  partition with a matching lower _or_ upper bound, rather than any partition
+  with matching lower _and_ upper bound.
+* link:https://issues.apache.org/jira/browse/KUDU-1766[KUDU-1766]
+  Fixed an issue in the Java client where equality predicates which compared
+  an integer column to its maximum possible value (e.g. `Integer.MAX_VALUE`)
+  would return incorrect results.
+* link:https://issues.apache.org/jira/browse/KUDU-1780[KUDU-1780]
+  Fixed the `kudu-client` Java artifact to properly shade classes in the
+  `com.google.thirdparty` namespace. The lack of proper shading in prior
+  releases could cause conflicts with certain versions of Google Guava.
+* link:http://gerrit.cloudera.org:8080/5327[Gerrit #5327]
+  Fixed shading issues in the `kudu-flume-sink` Java artifact. The sink
+  now expects that Hadoop dependencies are provided by Flume, and properly
+  shades the Kudu client's dependencies.
+* Fixed a few issues using the Python client library from Python 3.
+== Wire Protocol compatibility
 Kudu 1.2.0 is wire-compatible with previous versions of Kudu:
@@ -52,9 +186,49 @@ Kudu 1.2.0 is wire-compatible with previous versions of Kudu:
   in the cluster, upgrade the software, and then restart the daemons on the new version.
-== Incompatible changes in Kudu 1.2.0
+== Incompatible Changes in Kudu 1.2.0
+* The replication factor of tables is now limited to a maximum of 7. In addition,
+  tables can no longer be created with an even replication factor.
+* The `GROUP_VARINT` encoding is now deprecated. Kudu servers have never supported
+  this encoding, and now the client-side constant has been deprecated to match the
+  server's capabilities.
+=== New Restrictions on Data, Schemas, and Identifiers
+Kudu 1.2.0 introduces several new restrictions on schemas, cell size, and identifiers:
+Number of Columns:: By default, Kudu will not permit the creation of tables with
+more than 300 columns. We recommend schema designs that use fewer columns for best
+performance.
+Size of Cells:: No individual cell may be larger than 64KB. The cells making up
+a composite key are limited to a total of 16KB after the internal composite-key encoding
+done by Kudu. Inserting rows not conforming to these limitations will result in errors
+being returned to the client.
+Valid Identifiers:: Identifiers such as column and table names are now restricted to
+be valid UTF-8 strings. Additionally, a maximum length of 256 characters is enforced.
+=== Client Library Compatibility
+* The Kudu 1.2 Java client is API- and ABI-compatible with Kudu 1.1. Applications
+  written against Kudu 1.1 will compile and run against the Kudu 1.2 client and
+  vice-versa.
+* The Kudu 1.2 {cpp} client is API- and ABI-forward-compatible with Kudu 1.1.
+  Applications written and compiled against the Kudu 1.1 client will run without
+  modification against the Kudu 1.2 client. Applications written and compiled
+  against the Kudu 1.2 client will run without modification against the Kudu 1.1
+  client unless they use one of the following new APIs:
+** `kudu::DisableSaslInitialization()`
+** `KuduSession::SetErrorBufferSpace(...)`
-=== Client APIs ({cpp}/Java/Python)
+* The Kudu 1.2 Python client is API-compatible with Kudu 1.1. Applications
+  written against Kudu 1.1 will continue to run against the Kudu 1.2 client
+  and vice-versa.
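
As a rough sketch of the limits listed under "Incompatible Changes in Kudu
1.2.0" above, the following checks a table's column count and replication
factor, and the size of a single cell. The function names are hypothetical and
exist only for this example. It assumes that 64KB means 64 * 1024 bytes, and it
does not model the 16KB composite-key limit, which depends on Kudu's internal
key encoding.

```python
from typing import List

MAX_REPLICATION_FACTOR = 7
MAX_COLUMNS = 300            # default limit stated in the notes
MAX_CELL_BYTES = 64 * 1024   # assumption: "64KB" read as 65536 bytes


def check_table_options(num_columns: int, replication_factor: int) -> List[str]:
    """Return a list of violated limits; empty means the options look acceptable."""
    problems = []
    if num_columns > MAX_COLUMNS:
        problems.append(f"{num_columns} columns exceeds the default limit of {MAX_COLUMNS}")
    if replication_factor > MAX_REPLICATION_FACTOR:
        problems.append(f"replication factor {replication_factor} exceeds {MAX_REPLICATION_FACTOR}")
    if replication_factor % 2 == 0:
        problems.append(f"replication factor {replication_factor} is even")
    return problems


def cell_fits(value: bytes) -> bool:
    """True if a single encoded cell is within the per-cell size limit."""
    return len(value) <= MAX_CELL_BYTES


if __name__ == "__main__":
    print(check_table_options(num_columns=12, replication_factor=3))
    print(check_table_options(num_columns=301, replication_factor=4))
    print(cell_fits(b"x" * 1000))
```

Returning a list of problems rather than raising on the first one lets a caller
report every violated limit at once.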

diff --git a/docs/schema_design.adoc b/docs/schema_design.adoc
index 7c991f3..0c737f7 100644
--- a/docs/schema_design.adoc
+++ b/docs/schema_design.adoc
@@ -442,10 +442,7 @@ support renaming primary key columns.
 == Known Limitations
-Kudu currently has some known limitations that may factor into schema design. When
-designing your schema, consider these limitations together, not in isolation. If you
-test these limitations and your findings are different from these, please share your
-test cases and results.
+Kudu currently has some known limitations that may factor into schema design.
 Number of Columns:: By default, Kudu will not permit the creation of tables with
 more than 300 columns. We recommend schema designs that use fewer columns for best
@@ -459,6 +456,9 @@ being returned to the client.
 Size of Rows:: Although individual cells may be up to 64KB, and Kudu supports up to
 300 columns, it is recommended that no single row be larger than a few hundred KB.
+Valid Identifiers:: Identifiers such as table and column names must be valid UTF-8
+sequences and no longer than 256 bytes.
 Immutable Primary Keys:: Kudu does not allow you to update the primary key
 columns of a row.
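
The "Valid Identifiers" rule added in the schema_design.adoc hunk above (valid
UTF-8, no longer than 256 bytes) might be checked along these lines. The helper
name `is_valid_identifier` is hypothetical, and the sketch checks only the two
stated conditions; whether other constraints apply (for example to empty names)
is not covered by the text above.

```python
MAX_IDENTIFIER_BYTES = 256


def is_valid_identifier(raw: bytes) -> bool:
    """True if raw is at most 256 bytes long and decodes as UTF-8."""
    if len(raw) > MAX_IDENTIFIER_BYTES:
        return False
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_identifier("metrics".encode("utf-8")))
    # 129 two-byte characters are 258 bytes, so the byte limit is exceeded
    # even though the name is shorter than 256 characters.
    print(is_valid_identifier(("é" * 129).encode("utf-8")))
    print(is_valid_identifier(b"\xff\xfe"))
```

Measuring the encoded byte length matters for non-ASCII names, since a
character may occupy more than one byte in UTF-8.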
