impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Subject Re: [Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Date Sat, 20 May 2017 05:00:02 GMT
On Fri, May 19, 2017 at 6:01 PM, Sailesh Mukil (Code Review) <
gerrit@cloudera.org> wrote:

> Sailesh Mukil has posted comments on this change.
>
> Change subject: IMPALA-5333: Add support for Impala to work with ADLS
> ......................................................................
>
>
> Patch Set 3:
>
> > I have a few high level questions about this patch. This patch
>  > treats S3 and ADL the same way but after looking at the HDFS
>  > classes and the ADL API I see some differences between ADL and S3
>  > API. For instance, I don't see a way in ADL to recursively
>  > enumerate all entries under a particular directory (something that
>  > exists in S3). Our code today for S3 sort of relies on that
>  > property so I am wondering if our code works as expected or if I am
>  > missing something (quite possible). Also, the AdlFileSystem class
>  > has functions for returning the block locations, which means that
>  > we don't have to call the synthesize metadata calls as we do for
>  > S3. In the long run, HDFS may expose replica locations from the ADL
>  > local tiers (e.g. Cosmos), but I guess this is not relevant today.
>  > Thoughts?
>
> Very good questions, Dimitris.
>
> - Could you point me to the exact API that does the recursive enumeration?
>

I was referring to the listFiles() method in S3AFileSystem. I couldn't find
an equivalent call that recursively enumerates all files under a path in
the ADLS API. See the code in HdfsTable::loadBlockMetadata() for uses of
listFiles().


>
> - It's true that we don't need to synthesize the block metadata. However,
> I was in conversation with the ADLS connector folks and ADLS doesn't
> actually expose the block locations internally. So if you look at the
> getFileBlockLocations() code in AdlFileSystem, it actually itself
> synthesizes the block metadata:
> http://github.mtv.cloudera.com/CDH/hadoop/blob/
> 4da5de1a7c965f63f0b798e9ec4ec72ed7249f18/hadoop-tools/
> hadoop-azure-datalake/src/main/java/org/apache/hadoop/
> fs/adl/AdlFileSystem.java#L893
>
>
Exactly. I was thinking that if there is a call to do that, we don't have
to do it ourselves.


> So functionally it's the same. But I agree, we should call the relevant
> API if there it is "supported" in some form or the other. I will update the
> patch.
>
> --
> To view, visit http://gerrit.cloudera.org:8080/6910
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
> Gerrit-PatchSet: 3
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Sailesh Mukil <sailesh@cloudera.com>
> Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
> Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
> Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
> Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
> Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
> Gerrit-HasComments: No
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message