orc-commits mailing list archives

From omal...@apache.org
Subject orc git commit: Deploying site with Dain's documentation updates.
Date Mon, 19 Jun 2017 22:33:33 GMT
Repository: orc
Updated Branches:
  refs/heads/asf-site 28d825c2c -> cce469c77

Deploying site with Dain's documentation updates.

Signed-off-by: Owen O'Malley <omalley@apache.org>

Project: http://git-wip-us.apache.org/repos/asf/orc/repo
Commit: http://git-wip-us.apache.org/repos/asf/orc/commit/cce469c7
Tree: http://git-wip-us.apache.org/repos/asf/orc/tree/cce469c7
Diff: http://git-wip-us.apache.org/repos/asf/orc/diff/cce469c7

Branch: refs/heads/asf-site
Commit: cce469c77b64510795617ea31c8ce753b402e3a8
Parents: 28d825c
Author: Owen O'Malley <omalley@apache.org>
Authored: Mon Jun 19 15:32:48 2017 -0700
Committer: Owen O'Malley <omalley@apache.org>
Committed: Mon Jun 19 15:32:48 2017 -0700

 docs/compression.html | 9 +++++----
 docs/encodings.html   | 9 ++++++---
 docs/file-tail.html   | 2 +-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/docs/compression.html b/docs/compression.html
index 94aeefe..e96d5b1 100644
--- a/docs/compression.html
+++ b/docs/compression.html
@@ -1080,10 +1080,11 @@ start decompressing without the previous bytes.</p>
 <p><img src="/img/CompressionStream.png" alt="compression streams" /></p>
 <p>The default compression chunk size is 256K, but writers can choose
-their own value less than 2<sup>23</sup>. Larger chunks lead to better
-compression, but require more memory. The chunk size is recorded in
-the Postscript so that readers can allocate appropriately sized
-buffers.</p>
+their own value. Larger chunks lead to better compression, but require
+more memory. The chunk size is recorded in the Postscript so that
+readers can allocate appropriately sized buffers. Readers are
+guaranteed that no chunk will expand to more than the compression chunk
+size.</p>
 <p>ORC files without generic compression write each stream directly
 with no headers.</p>

diff --git a/docs/encodings.html b/docs/encodings.html
index b9fa2b0..bcc663a 100644
--- a/docs/encodings.html
+++ b/docs/encodings.html
@@ -1139,9 +1139,12 @@ bytes.</p>
 <h2 id="string-char-and-varchar-columns">String, Char, and VarChar Columns</h2>
-<p>String columns are adaptively encoded based on whether the first
-10,000 values are sufficiently distinct. In all of the encodings, the
-PRESENT stream encodes whether the value is null.</p>
+<p>String, char, and varchar columns may be encoded either using a
+dictionary encoding or a direct encoding. A direct encoding should be
+preferred when there are many distinct values. In all of the
+encodings, the PRESENT stream encodes whether the value is null. The
+Java ORC writer automatically picks the encoding after the first row
+group (10,000 rows).</p>
 <p>For direct encoding the UTF-8 bytes are saved in the DATA stream and
 the length of each value is written into the LENGTH stream. In direct

diff --git a/docs/file-tail.html b/docs/file-tail.html
index b4cf021..2fc4461 100644
--- a/docs/file-tail.html
+++ b/docs/file-tail.html
@@ -1230,7 +1230,7 @@ that contains the list of their children’s type ids.</p>
  repeated uint32 subtypes = 2 [packed=true];
  // the list of field names for struct
  repeated string fieldNames = 3;
- // the maximum length of the type for varchar or char
+ // the maximum length of the type for varchar or char in UTF-8 characters
  optional uint32 maximumLength = 4;
  // the precision and scale for decimal
  optional uint32 precision = 5;
