impala-reviews mailing list archives

From "Tianyi Wang (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5636: Change the metadata in parquet
Date Fri, 28 Jul 2017 17:07:49 GMT
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7514

to look at the new patch set (#5).

Change subject: IMPALA-5636: Change the metadata in parquet
......................................................................

IMPALA-5636: Change the metadata in parquet

When writing Parquet files, Impala does not use repetition levels, but it
still sets the repetition level encoding to BIT_PACKED. That encoding is
deprecated and may cause problems for other Parquet readers.
Changing the encoding to RLE solves this issue.
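To make the change concrete, here is a minimal C++ sketch of the page header fields involved. The `Encoding` values follow the Parquet format's parquet.thrift definition. The `DataPageHeader` struct, however, is a simplified, hypothetical mirror, and `MakeFlatColumnPageHeader` is a hypothetical helper, not code from hdfs-parquet-table-writer.cc. The sketch only covers flat columns. For those, the maximum repetition level is 0 and no repetition levels are written, so the field is pure metadata.

```cpp
#include <cassert>
#include <cstdint>

// Subset of the Encoding enum from parquet-format's parquet.thrift.
enum class Encoding : int32_t {
  PLAIN = 0,
  PLAIN_DICTIONARY = 2,
  RLE = 3,
  BIT_PACKED = 4,  // deprecated for encoding levels
};

// Hypothetical, simplified mirror of the v1 DataPageHeader fields.
struct DataPageHeader {
  int32_t num_values = 0;
  Encoding encoding = Encoding::PLAIN;
  Encoding definition_level_encoding = Encoding::RLE;
  Encoding repetition_level_encoding = Encoding::RLE;
};

// Hypothetical helper: fills the header for a flat (non-nested) column.
// With a maximum repetition level of 0 no repetition levels are stored,
// so this field only declares an encoding; RLE is the non-deprecated choice.
DataPageHeader MakeFlatColumnPageHeader(int32_t num_values,
                                        Encoding value_encoding,
                                        int max_rep_level) {
  assert(max_rep_level == 0 && "sketch only covers flat columns");
  DataPageHeader header;
  header.num_values = num_values;
  header.encoding = value_encoding;
  header.definition_level_encoding = Encoding::RLE;
  // Per the commit message, this was previously set to BIT_PACKED.
  header.repetition_level_encoding = Encoding::RLE;
  return header;
}
```

Because no repetition levels follow in the page, switching this declared value does not change any encoded data. It only changes what readers see in the header.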

Testing: This change was tested manually only.
To test with the default test data loaded:
> create table default.test like tpch_parquet.orders stored as parquet;
> insert into default.test values (0,0,"",0,"","","",0,"");
Then copy the files matching "hdfs://localhost:20500/test-warehouse/test/*.parq"
to the local machine and inspect them with
$ java -jar parquet-tools-1.6.0.jar dump /home/tianyi/Downloads/*.parq | grep RLE:
Before the change, the output looks like
    page 0:              DLE:RLE RLE:BIT_PACKED VLE:PLA [more]... VC:1
and after the change it should look like
    page 0:              DLE:RLE RLE:RLE VLE:PLA [more]... VC:1
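Checking the dump by eye works for a single file. For several files, a small helper could pull the repetition level encoding out of each "page N:" line. This is a hypothetical C++ sketch. It assumes the whitespace-separated "RLE:<encoding>" token format shown in the sample output above, and `RepetitionLevelEncoding` is an illustrative name, not part of parquet-tools.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns the repetition level encoding from one
// "page N:" line of `parquet-tools dump` output (e.g. "BIT_PACKED" or "RLE"),
// or an empty string if the line has no "RLE:" token.
std::string RepetitionLevelEncoding(const std::string& line) {
  const std::string prefix = "RLE:";
  std::istringstream in(line);
  std::string token;
  while (in >> token) {
    // "DLE:RLE" (definition levels) does not start with the prefix,
    // so only the repetition level token matches.
    if (token.compare(0, prefix.size(), prefix) == 0) {
      return token.substr(prefix.size());
    }
  }
  return "";
}
```

A page for which this returns "BIT_PACKED" was written with the old setting.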

Change-Id: I4112ec88e8f4050d28661d27f9227450288a6756
---
M be/src/exec/hdfs-parquet-table-writer.cc
1 file changed, 1 insertion(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/7514/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7514
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4112ec88e8f4050d28661d27f9227450288a6756
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <twang@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <twang@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
