impala-reviews mailing list archives

From "Sailesh Mukil (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5333: [DOCS] Document Impala ADLS support
Date Fri, 07 Jul 2017 03:49:30 GMT
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: [DOCS] Document Impala ADLS support
......................................................................


Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/7175/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

PS2, Line 1072: <p rev="2.9.0 IMPALA-5333" id="adls_dml_performance">
              :         Because of differences between ADLS and traditional filesystems, DML operations
              :         for ADLS tables can take longer than for tables on HDFS.
              :         <draft-comment>
              :           Is there anything to say on this subject, if ADLS doesn't have
              :           the same file-moving-to-the-trashcan performance overhead as S3?
              :         </draft-comment>
              :       </p>
This isn't necessarily true for ADLS the way it is for S3. In S3 we don't have renames, which
is what this point tries to highlight; that isn't a problem for ADLS. So for ADLS, we needn't
mention this.


PS2, Line 1098: Because data files written to ADLS do not have a default block size
This should be more like:

"Because ADLS doesn't expose the block sizes of its files..."


PS2, Line 1153:       <p rev="2.9.0 IMPALA-5333" id="adls_dml">
              :         In <keyword keyref="impala29_full"/> and higher, the Impala DML statements (<codeph>INSERT</codeph>, <codeph>LOAD DATA</codeph>,
              :         and <codeph>CREATE TABLE AS SELECT</codeph>) can write data into a table or partition that resides in the
              :         Azure Data Lake Store (ADLS).
              :         The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
              :         partitions is specified by an <codeph>adl://</codeph> prefix in the
              :         <codeph>LOCATION</codeph> attribute of
              :         <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statements.
              :         If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
              :         issue a <codeph>REFRESH</codeph> statement for the table before using Impala to query the ADLS data.
              :       </p>
This was added for S3 because INSERT and LOAD DATA were added in a later release than SELECT
etc. Do we need to mention this for ADLS? We have support for both reads and writes in the
same release.


http://gerrit.cloudera.org:8080/#/c/7175/1/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS1, Line 209: or on earlier Impala releases without DML support for ADLS
We don't have earlier releases with any support for ADLS, so I don't think that this point
is worth mentioning.


PS1, Line 271: ph>adl
We call them stores in ADLS, so probably just do a find-and-replace of "bucket" with "store".


PS1, Line 291: ibute.
The stores have the format "adl://<store>.azuredatalakestore.net/path/to/file"
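
As a minimal sketch of that format, the Python snippet below checks whether a location string
has the adl://<store>.azuredatalakestore.net/path shape and splits out the store name. The
helper name parse_adls_uri and the sample store name impala-demo are hypothetical, used only
for illustration; nothing here is part of Impala or of any ADLS tooling.

```python
from urllib.parse import urlparse

# Host suffix taken from the URI format quoted above.
ADLS_HOST_SUFFIX = ".azuredatalakestore.net"


def parse_adls_uri(uri):
    """Split an ADLS URI into (store, path); raise ValueError if malformed.

    Hypothetical helper for illustration only.
    """
    parts = urlparse(uri)
    if parts.scheme != "adl":
        raise ValueError(f"expected adl:// scheme, got {parts.scheme!r}")
    host = parts.netloc
    # The host must be "<store>.azuredatalakestore.net" with a non-empty store.
    if not host.endswith(ADLS_HOST_SUFFIX) or host == ADLS_HOST_SUFFIX:
        raise ValueError(f"expected <store>{ADLS_HOST_SUFFIX}, got {host!r}")
    store = host[: -len(ADLS_HOST_SUFFIX)]
    return store, parts.path


if __name__ == "__main__":
    print(parse_adls_uri("adl://impala-demo.azuredatalakestore.net/dir1/dir2/dir3"))
```

A bare store name with no host suffix, such as adl://impala-demo/dir1, is rejected by this
check.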


PS1, Line 313: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
> What's the ADLS command line syntax to do this? (I'm basing this on an S3 e
I've not used the ADLS command line tool; it seemed a little hard to set up. So I've been
getting by with hadoop fs -ls "adl://blah".

Do we have to mention the ADLS command line tool?
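
For comparison with the recursive listing in the draft, here is a sketch of the hadoop fs
route. It only builds and prints the command line and does not run anything. The -R flag is
the recursive option of hadoop fs -ls; the store name and directory path are placeholders,
and the helper name hadoop_ls_command is hypothetical.

```python
import shlex


def hadoop_ls_command(adls_uri, recursive=True):
    """Build the argv for listing an ADLS path with the Hadoop shell.

    Hypothetical helper for illustration; it does not execute the command.
    """
    argv = ["hadoop", "fs", "-ls"]
    if recursive:
        argv.append("-R")  # recursive listing
    argv.append(adls_uri)
    return argv


if __name__ == "__main__":
    # Placeholder store and path, not a real location.
    cmd = hadoop_ls_command("adl://impala-demo.azuredatalakestore.net/dir1/dir2/dir3")
    print(shlex.join(cmd))
```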


PS1, Line 329: !??? ls adl://impala-demo/dir1/dir2/dir3 --recursive;
> Same question as on line 313.
ditto


http://gerrit.cloudera.org:8080/#/c/7175/2/docs/topics/impala_adls.xml
File docs/topics/impala_adls.xml:

PS2, Line 405: impala-demo
Did this work?


PS2, Line 589: ADLS does not have the
             :           same block notion as HDFS
ADLS does not expose block sizes like HDFS does,


-- 
To view, visit http://gerrit.cloudera.org:8080/7175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id5a98217741e5d540d9874e9b30e36f01644ef14
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Laurel Hale <laurel@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: Yes
