impala-reviews mailing list archives

From "Sailesh Mukil (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5378: Disk IO manager needs to understand ADLS
Date Wed, 31 May 2017 23:13:59 GMT
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5378: Disk IO manager needs to understand ADLS

Patch Set 1:

(1 comment)

Ah, regarding runtime/hdfs-fs-cache, we did some plumbing to pass the keys through libHDFS
in case users didn't want to set them in core-site.xml.

We're not planning to do the same for ADLS unless there's strong demand for it. There is also
an easier alternative: the Hadoop encrypted credential store, support for which should land
soon in Hadoop's AdlFileSystem. The S3 work in hdfs-fs-cache was done before there was a plan
to support this credential store for S3.
File be/src/runtime/

Line 402:   // ADLS uses buffer sizes of 4k. Given that, and the above JNI array allocation
> but we'd still truncate to the actual length of the column's data pages in 
Yes, it would cut a buffer at 4MB or at a flush, whichever comes first. We'd want to optimize
for the more likely case. Is it safe to say that in most cases we'd have data pages > 4MB?

Regarding requiring more CPU, this was found while settling on the read chunk size for S3.
The comment above (L392-L397) explains the overhead.

Sounds good, I'll convert it to a flag.
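
For illustration only, here is a minimal sketch of what capping each read at a configurable
chunk size looks like. This is not the code in the patch: the names SplitIntoReadChunks and
kDefaultReadChunkBytes are hypothetical, the 4MB default is just the figure discussed above,
and a plain function parameter stands in for the startup flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical default (assumption): 4MB, the figure discussed in this thread.
// In a real build this value would come from a startup flag, not a constant.
constexpr int64_t kDefaultReadChunkBytes = 4LL * 1024 * 1024;

// Returns the sizes of the successive reads needed to cover 'total_bytes'
// when no single read may exceed 'max_chunk_bytes'. Every chunk except
// possibly the last is exactly 'max_chunk_bytes' long.
std::vector<int64_t> SplitIntoReadChunks(
    int64_t total_bytes, int64_t max_chunk_bytes = kDefaultReadChunkBytes) {
  assert(total_bytes >= 0);
  assert(max_chunk_bytes > 0);
  std::vector<int64_t> chunks;
  for (int64_t remaining = total_bytes; remaining > 0;) {
    const int64_t n = std::min(remaining, max_chunk_bytes);
    chunks.push_back(n);
    remaining -= n;
  }
  return chunks;
}
```

With the default, a 10MB request is split into reads of 4MB, 4MB and 2MB. Passing a different
second argument models overriding the flag.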

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I067f053fec941e3631610c5cc89a384f257ba906
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Sailesh Mukil <>
Gerrit-HasComments: Yes
