impala-dev mailing list archives

From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3376: Extra definition level when writing Parquet files
Date Mon, 18 Jul 2016 17:50:37 GMT
Thomas Tauber-Marshall has uploaded a new patch set (#4).

Change subject: IMPALA-3376: Extra definition level when writing Parquet files
......................................................................

IMPALA-3376: Extra definition level when writing Parquet files

Currently, when writing a new value to a Parquet file, we write
the definition level before checking whether the current page has
enough space for the value. If it doesn't, we create a new page and
write the definition level again there, but the copy already written
to the old page is left in place, so the old page ends up with an
extra definition level for a value it does not contain.

To fix this, the writer now checks that the current page has enough
space for both the definition level and the value before writing
either of them.
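The ordering problem can be illustrated with a deliberately simplified model. This is a hypothetical C++ sketch, not code from hdfs-parquet-table-writer.cc: `Page`, `AppendBuggy`, `AppendFixed` and `LevelsMatchValues` are illustrative names, each definition level is assumed to cost one byte of a shared page budget, and every row is assumed to be non-null.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a data page (not Impala's classes): each
// definition level costs 1 byte, each value costs value_size bytes, and both
// share one byte budget. Every row in this model is non-null (level 1).
struct Page {
  explicit Page(std::size_t cap) : capacity(cap) {}
  std::size_t capacity;
  std::size_t used = 0;
  std::vector<int> def_levels;  // one entry per row written to this page
  std::vector<long> values;     // one entry per value stored on this page
};

// Old ordering: the level is written before checking that the value fits.
// When the value does not fit, the level is written again on a new page, but
// the copy on the old page is never removed. (For simplicity the model lets
// that one stray byte exceed the old page's budget.)
void AppendBuggy(std::vector<Page>& pages, long value, std::size_t value_size) {
  pages.back().def_levels.push_back(1);
  pages.back().used += 1;
  if (pages.back().used + value_size > pages.back().capacity) {
    const std::size_t cap = pages.back().capacity;  // copy before reallocation
    pages.emplace_back(cap);                         // start a new page
    pages.back().def_levels.push_back(1);            // rewrite the level there
    pages.back().used += 1;
  }
  pages.back().values.push_back(value);
  pages.back().used += value_size;
}

// Fixed ordering: check room for the level and the value together, and only
// then write both to whichever page will hold them.
void AppendFixed(std::vector<Page>& pages, long value, std::size_t value_size) {
  const std::size_t needed = 1 + value_size;
  assert(needed <= pages.back().capacity && "value can never fit on a page");
  if (pages.back().used + needed > pages.back().capacity) {
    const std::size_t cap = pages.back().capacity;
    pages.emplace_back(cap);
  }
  pages.back().def_levels.push_back(1);
  pages.back().values.push_back(value);
  pages.back().used += needed;
}

// True when every page holds exactly one level per stored value, which is
// the invariant this model expects for an all-non-null column.
bool LevelsMatchValues(const std::vector<Page>& pages) {
  for (const Page& page : pages) {
    if (page.def_levels.size() != page.values.size()) return false;
  }
  return true;
}
```

With a 10-byte page and 4-byte values, appending three values through `AppendBuggy` leaves the first page with three definition levels but only two values, while `AppendFixed` starts the second page before writing anything for the third value, so both pages stay consistent.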

This patch also modifies the parquet-reader tool, which reads
Parquet files and performs minimal sanity checking on their
metadata, so that it also checks for extra definition levels, and
adds a test that runs the tool automatically.
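A check of the kind described for parquet-reader might compare decoded definition levels against the page header and the decoded values. In a Parquet data page header, `num_values` counts nulls as well, so it should match the number of definition levels. The sketch below is hypothetical and is not the tool's actual code: it assumes the levels have already been decoded from their RLE/bit-packed encoding into a vector, and `DecodedPage` and `CheckDefinitionLevels` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical page summary, as a checker might assemble it after decoding a
// data page: the header's num_values, the decoded definition levels, and how
// many non-null values were actually decoded from the value section.
struct DecodedPage {
  std::size_t header_num_values;
  std::vector<int> def_levels;
  std::size_t num_decoded_values;
};

// Returns an empty string if the page is consistent, otherwise a description
// of the first problem found. A level equal to max_def_level marks a stored
// (non-null) value; any lower level marks a null.
std::string CheckDefinitionLevels(const DecodedPage& page, int max_def_level) {
  assert(max_def_level >= 0);
  if (page.def_levels.size() != page.header_num_values) {
    return "page has " + std::to_string(page.def_levels.size()) +
           " definition levels but the header says " +
           std::to_string(page.header_num_values) + " values";
  }
  std::size_t non_null = 0;
  for (int level : page.def_levels) {
    if (level < 0 || level > max_def_level) return "definition level out of range";
    if (level == max_def_level) ++non_null;
  }
  if (non_null != page.num_decoded_values) {
    return "definition levels imply " + std::to_string(non_null) +
           " non-null values but " + std::to_string(page.num_decoded_values) +
           " were decoded";
  }
  return "";
}
```

Returning a message rather than a bool keeps the sketch close to what a diagnostic tool needs: the first inconsistency found can be reported alongside the page it came from.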

Change-Id: I2cafd7ef6b607ce6f815072b8af7395a892704d9
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/util/parquet-reader.cc
M be/src/util/rle-encoding.h
M tests/common/skip.py
A tests/query_test/test_writers.py
5 files changed, 167 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/56/3556/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3556

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2cafd7ef6b607ce6f815072b8af7395a892704d9
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
