impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Russell (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3398: Rework Impala documentation to be non-Cloduera-specific
Date Tue, 29 Nov 2016 22:28:29 GMT
John Russell has posted comments on this change.

Change subject: IMPALA-3398: Rework Impala documentation to be non-Cloduera-specific

Patch Set 2:


Because I'm advocating re-doing the automated portions of this change at a later time, I don't
want to promote this patch set as-is. Is it practical to say "abandon" for the code review
but still come back and pick out bits and pieces (e.g. the changes in impala_jdbc.xml) to
become part of a later code review?
Commit Message:

PS2, Line 16:     sed -i 's/ rev=""//g' $FILENAME
Let's leave this step out, for a couple of reasons:
- There's no harm to have rev="" with an empty attribute value.
- There are some cases where rev="CDH-xyz" could also be annotated with the IMPALA- number,
so I think it would be useful to leave behind the empty attribute as a reminder that this
was a change due to bug fix or feature addition.
- Sometimes for new feature docs, I add rev="" when I don't have the JIRA number handy and
come back and fill it in later.
File docs/impala_html.ditaval:

PS2, Line 6: <prop att="audience" val="Hidden" action="exclude"/>
That's fine. We will need to make the same addition in the Cloudera doc build so that the
attribute name works transparently in both contexts.
File docs/topics/impala_alter_table.xml:

PS2, Line 25:       <indexterm audience="Hidden">ALTER TABLE statement</indexterm>
For the <indexterm> tags, maybe we could change those to audience="HTML" so they were
visible in the PDF. The Cloudera doc team has always been resistant to using index entries
but I do have them (just hidden) all through the doc source. We should do a trial and see
if enabling them produces a decent-looking index in the PDF.

PS2, Line 220:     <p rev="2.3.0 IMPALA-1568">
The reason I'm so liberal with release numbers and JIRA numbers with this rev= attribute is
in hopes of automating an "errata" mechanism in future. I.e. "what changed in such-and-such
a release, what changed due to such-and-such an issue". I have an 80% working prototype that
I should demo for you guys, to judge should that be something that becomes part of the upstream
docs or continue development on it as a Cloudera-only extension.

PS2, Line 406:     <p>
Here's a case where perhaps there is a corresponding IMPALA- JIRA number we should dig up
and include in the rev= attribute. That's why I would say to leave an empty rev="" behind,
so as not to lose that aspect of the history.
File docs/topics/impala_conversion_functions.xml:

PS2, Line 30:       Although in <keyword keyref="impala23_full"/>, the <codeph>SHOW
FUNCTIONS</codeph> output for
Just checking that this was a change introduced by Jim. This is a case where we could re-eveluate
whether to take out that note, if the situation it refers to is no longer applicable. (I haven't
confirmed in this particular case yet.)

PS2, Line 443: <dlentry rev="2.3.0" id="casttosmallint" audience="Hidden">
These castto*() functions might be intended to be permanently for internal Impala use only,
in which case we could remove the corresponding info entirely.

PS2, Line 619: <dlentry rev="2.3.0" id="typeof">
Hmm, this might have a corresponding IMPALA- JIRA number that should go in the rev=.
File docs/topics/impala_jdbc.xml:

PS2, Line 82:         In Impala 2.0 and later, you must use the Hive 0.13 JDBC driver.  If
you are
            :         already using JDBC applications with an earlier Impala release, you
must update
            :         your JDBC driver, because the Hive 0.12 driver that was formerly the
only choice
            :         is not compatible with Impala
Urgh. "Must" seems like too strong a statement. I wonder if the right thing to do here is
to say you definitely can use the Hive driver, and maybe there's a vendor-specific driver.
Because so many times in the past we've had to warn some people away from the Hive driver
for performance reasons, and direct other people to the Hive driver to work around some bug
in the Cloudera driver.

PS2, Line 120:           To get the JAR files, install the Hive JDBC driver on each host in
the cluster that will run
It's worth checking if there's a generic Hadoop or Hive URL we could refer to here.

PS2, Line 139:   hive-common-X.XX.X.jar
             :   hive-jdbc-X.XX.X.jar
             :   hive-metastore-X.XX.X.jar
             :   hive-servi
We should confirm that these filenames are accurate if someone goes the generic route when
installing the Hive driver.

PS2, Line 161:               <xref href=""
scope="external" format="html">this
Ironic that there's a reference to "Cloudera Impala" that perhaps we could never actually
get rid of.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f63fb309e0617d7fe014231bb0ab0ad67c8474
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <>
Gerrit-Reviewer: Anonymous Coward #250
Gerrit-Reviewer: Greg Rahn <>
Gerrit-Reviewer: Jim Apple <>
Gerrit-Reviewer: John Russell <>
Gerrit-HasComments: Yes

View raw message