impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
Date Mon, 30 Oct 2017 21:35:39 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness

Patch Set 1:

File docs/topics/impala_known_issues.xml:
PS1, Line 936:             Examine the <codeph>HDFS_SCAN_NODE</codeph> portion of a query profile that scans the
Unfortunately, this won't give accurate information for all queries. In previous versions, the
reported file compression was inaccurate if the query didn't materialise any columns (e.g. count(*))
or if the file was filtered out by runtime filters; see IMPALA-5311 and IMPALA-4863, respectively.

One way to tell for sure is to run a query that materialises columns, such as "select * from table"
or "select min(string_col) from table", and then look at the profile.
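
If it helps, the check on the profile could be scripted. The following is a minimal sketch, not a
definitive tool: it assumes the saved profile text contains a line labelled "File Formats:" whose
value lists format/codec entries such as PARQUET/NONE. The exact line layout, the sample text, and
the helper names file_format_values and has_uncompressed_parquet are assumptions for illustration.

```python
import re

# Assumed shape of the line inside an HDFS_SCAN_NODE section, e.g.
#   - File Formats: PARQUET/NONE:2 PARQUET/SNAPPY:5
# The real layout may differ between Impala versions.
FILE_FORMATS_RE = re.compile(r"File Formats:\s*(.*)")


def file_format_values(profile_text):
    """Return the text following each 'File Formats:' label in the profile."""
    return [m.group(1).strip() for m in FILE_FORMATS_RE.finditer(profile_text)]


def has_uncompressed_parquet(profile_text):
    """True if any 'File Formats' value mentions PARQUET/NONE."""
    return any("PARQUET/NONE" in value for value in file_format_values(profile_text))


if __name__ == "__main__":
    # Hypothetical profile excerpt, for illustration only.
    sample = (
        "HDFS_SCAN_NODE (id=0)\n"
        "   - File Formats: PARQUET/NONE:2 PARQUET/SNAPPY:1\n"
    )
    print(file_format_values(sample))
    print(has_uncompressed_parquet(sample))
```

The result is only as reliable as the profile itself, so the profile should come from a query that
materialises columns, for the reasons given above.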
PS1, Line 937:             suspected table. Look for <q>File Formats</q>. A value containing <codeph>PARQUET/NONE</codeph>
It might be helpful to note the common cases where uncompressed Parquet is or isn't created. Impala
generates Snappy-compressed Parquet by default unless compression_codec is changed. Most uncompressed
Parquet we see in the wild is generated by Hive or other non-Impala tools.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <>
Gerrit-Reviewer: Greg Rahn <>
Gerrit-Reviewer: John Russell <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Comment-Date: Mon, 30 Oct 2017 21:35:39 +0000
Gerrit-HasComments: Yes
