impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
Date Mon, 30 Oct 2017 21:35:39 GMT
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 )

Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml@936
PS1, Line 936:             Examine the <codeph>HDFS_SCAN_NODE</codeph> portion
of a query profile that scans the
Unfortunately this won't give accurate info for all queries: if the query doesn't materialise
any columns (e.g. count(*)), or the file is filtered out by runtime filters, the file compression
reported in previous versions was inaccurate. See IMPALA-5311 and IMPALA-4863, respectively.

One way to tell for sure is to run a query that materialises columns, such as
"select * from table" or "select min(string_col) from table", and then look at the profile.
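
For checking a saved text profile without reading it by eye, a rough sketch follows. It
assumes the scan node reports a line of the form "File Formats: PARQUET/NONE:3 PARQUET/SNAPPY:1"
(space-separated FORMAT/CODEC:count entries). The exact layout may differ between versions, and
the function name is made up for illustration.

```python
import re

# Assumed shape of the profile line, e.g.
#   "- File Formats: PARQUET/NONE:3 PARQUET/SNAPPY:1"
FILE_FORMATS_RE = re.compile(r"File Formats:\s*(.+)")


def uncompressed_parquet_count(profile_text):
    """Sum the counts of PARQUET/NONE entries over all File Formats lines."""
    total = 0
    for line in profile_text.splitlines():
        match = FILE_FORMATS_RE.search(line)
        if not match:
            continue
        # Each entry is assumed to look like FORMAT/CODEC:count.
        for entry in match.group(1).split():
            fmt, _, count = entry.rpartition(":")
            if fmt == "PARQUET/NONE" and count.isdigit():
                total += int(count)
    return total
```

A non-zero result suggests the scan read uncompressed Parquet. A zero result is only meaningful
if the query materialised at least one column and no files were dropped by runtime filters.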


http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml@937
PS1, Line 937:             suspected table. Look for <q>File Formats</q>. A value
containing <codeph>PARQUET/NONE</codeph>
It might be helpful to note the common cases where uncompressed Parquet is or isn't created. Impala
generates Snappy-compressed Parquet by default unless compression_codec is changed. Most uncompressed
Parquet we see in the wild is generated by Hive or other non-Impala tools.



-- 
To view, visit http://gerrit.cloudera.org:8080/8418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb
Gerrit-Change-Number: 8418
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Greg Rahn <grahn@cloudera.com>
Gerrit-Reviewer: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Oct 2017 21:35:39 +0000
Gerrit-HasComments: Yes
