impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5636: Change the metadata in parquet
Date Mon, 31 Jul 2017 17:03:01 GMT
Tim Armstrong has submitted this change and it was merged.

Change subject: IMPALA-5636: Change the metadata in parquet
......................................................................


IMPALA-5636: Change the metadata in parquet

When writing Parquet files, Impala does not use repetition levels.
However, the repetition level encoding is set to BIT_PACKED, which is
deprecated and may cause problems when the files are read by other software.
Setting it to RLE solves this issue.
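
A minimal sketch of why the encoding label can change without affecting the
data itself. It assumes the written columns are flat, so their maximum
repetition level is 0 (an assumption not stated in this message), and that
levels are stored with just enough bits to represent the maximum level. The
helper name level_bit_width is hypothetical and is not Impala code.

```python
def level_bit_width(max_level):
    """Bits needed to encode any level in the range 0..max_level."""
    return max_level.bit_length()


# Assumption: a flat (non-nested, non-repeated) column has a maximum
# repetition level of 0, so no bits are needed for its repetition levels.
flat_column_max_repetition_level = 0

if __name__ == "__main__":
    print(level_bit_width(flat_column_max_repetition_level))  # prints 0
```

Under that assumption only the declared encoding in the metadata differs
between BIT_PACKED and RLE, which is the part other readers may reject.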

Testing: This change was only tested manually.
To test with the default test data loaded:
> create table default.test like tpch_parquet.orders stored as parquet;
> insert into default.test values (0,0,"",0,"","","",0,"");
Then fetch "hdfs://localhost:20500/test-warehouse/test/*.parq" and use
$ java -jar parquet-tools-1.6.0.jar dump /home/tianyi/Downloads/*.parq | grep RLE:
to inspect the file. Before the change you would see output like
    page 0:              DLE:RLE RLE:BIT_PACKED VLE:PLA [more]... VC:1
and after the change it should be
    page 0:              DLE:RLE RLE:RLE VLE:PLA [more]... VC:1
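
The manual check above could also be scripted. The following is a rough
sketch, not part of the change: it assumes the dump output contains page
lines shaped like the two samples above, and the names PAGE_LINE,
repetition_encodings and uses_bit_packed are hypothetical helpers.

```python
import re
import sys

# Matches the per-page summary lines shown above, for example
# "page 0:  DLE:RLE RLE:BIT_PACKED VLE:PLA ... VC:1".
# The second group is the value of the "RLE:" field, which the samples
# above use for the repetition level encoding.
PAGE_LINE = re.compile(r"page\s+(\d+):.*?\bDLE:(\w+)\s+RLE:(\w+)")


def repetition_encodings(dump_text):
    """Return (page_number, repetition_level_encoding) for each page line."""
    pages = []
    for line in dump_text.splitlines():
        match = PAGE_LINE.search(line)
        if match:
            pages.append((int(match.group(1)), match.group(3)))
    return pages


def uses_bit_packed(dump_text):
    """True if any page still declares BIT_PACKED repetition levels."""
    return any(enc == "BIT_PACKED" for _, enc in repetition_encodings(dump_text))


if __name__ == "__main__":
    # Reads the dump from standard input and exits non-zero if the
    # deprecated encoding is still present.
    sys.exit(1 if uses_bit_packed(sys.stdin.read()) else 0)
```

Piping the output of the parquet-tools command above into this script would
then give a non-zero exit status for files written before the change.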

Change-Id: I4112ec88e8f4050d28661d27f9227450288a6756
Reviewed-on: http://gerrit.cloudera.org:8080/7514
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
---
M be/src/exec/hdfs-parquet-table-writer.cc
1 file changed, 1 insertion(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4112ec88e8f4050d28661d27f9227450288a6756
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <twang@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <twang@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
