impala-dev mailing list archives

From John Russell <jruss...@cloudera.com>
Subject Re: January 2016 podling report
Date Thu, 31 Dec 2015 05:17:07 GMT
I would say there's a fair bit of decision-making and followup work having to do with documentation.

For example, the current Impala docs that are embedded within the Cloudera doc library cover
a wide range of subjects:

- "How to use Impala with <component XYZ>".  For example, Impala with Sentry, Impala
with HBase, Impala with S3, Impala with Isilon...  Some components are Hadoop-based, others
are more specific to what's shipped or integrated with CDH.  I feel like we should have a
spreadsheet because these seem like decisions to make on a case-by-case basis.

- "How to do <task XYZ> with Impala".  Performance tuning, troubleshooting, deployment
planning.  Same kinds of considerations as the previous bullet.  Many of these aren't strictly
part of core Impala features, rather they're things that could have been delivered via blog
posts, O'Reilly books, etc.  Again, there could be some amount of identifying / deciding /
untangling to produce the right subset to go in Apache-oriented docs.

- "How to do <task XYZ> with Impala in Cloudera Manager".  That seems like an easy call:
that kind of stuff doesn't get donated to Apache because it's CDH-specific.  That kind of
content, though, is intermixed with "how to do <task XYZ> _without_ Cloudera Manager",
so it would be some work to untangle instructions like that.

- "CREATE TABLE" and similar language reference stuff.  Doesn't every SQL engine in the open
source arena come with a language reference of one sort or another?  So I assume there has
to be something either donated or created from scratch along those lines.  (Although my open
source experience is with MySQL, where the docs are under a more restrictive license than
the software, so I don't have exact precedents to go by.)

Assuming that some amount of existing CDH doc is donated, then for purposes of building, accepting
contributions, etc., do we need to convert the content to some particular format or use some
specific build system?  The doc content that I'm talking about is currently in XML, using the
DITA DTD, and can be built using an all-open-source toolchain.  The format and toolchain
might be a little more heavyweight than on a lot of other Apache projects.

The main advantage of the current format for the Impala doc library is ease of reuse.  So
there's the question of whether Apache-donated doc content, like the language reference, then
_only_ exists in the context of the project site, or gets reused within the doc library on
cloudera.com.  There are pros and cons either way.  Even if we centralize future docs on the impala.io site,
so there isn't a new instance corresponding to each new CDH x.y release, there are still all
the older instances of those pages from CDH 4.x, CDH 5.x, Impala 1.x, and Impala 2.x docs
on cloudera.com.

I've been cogitating over these considerations the last few weeks, but no approach has really
jumped out at me as a slam dunk:

a) Rip as much existing doc out of the Cloudera library as possible, convert to the most contributor-friendly
format, decouple entirely from the CDH library?
b) Donate core Impala feature docs only, keep the XML format the same, encourage verbatim
reuse of doc content across CDH and other distributions that include Impala?
c) Some middle ground?  For example, it would be possible to mix and match the current XML
doc format with user-contributed content in Markdown format.

Thanks,
John

> On Dec 30, 2015, at 3:07 PM, Henry Robinson <henry@apache.org> wrote:
> 
> Hi all -
> 
> Here's a draft of our inaugural podling report. Per the usual guidelines,
> Impala has to submit three monthly reports to the Incubator PMC, after
> which we report every quarter. The purpose of the report is to expose the
> current state of the graduation effort to the Incubator, and to flag any
> problems that require Incubator attention.
> 
> I hope this report also sheds a little light on what needs to be done
> to move Impala's development in its entirety to the ASF and its
> infrastructure. We are looking forward to making quick progress on some of
> these items in 2016.
> 
> If anyone has any further comments or edits they'd like to make, please
> respond to this thread. I am on a short timeline as I fly internationally
> tomorrow and will be out of contact for about ten days, so I plan to post
> this to the Incubator wiki tomorrow morning. Any edits can then be made
> there.
> 
> Thanks,
> Henry
> 
> --------------------
> Impala
> 
> Impala is a high-performance C++ and Java SQL query engine for data stored
> in Apache Hadoop-based clusters.
> 
> Impala has been incubating since 2015-12-03.
> 
> Three most important issues to address in the move towards graduation:
> 
>  1. Resolve any issues around use of Gerrit as code-review tool.
>  2. Movement of existing JIRA / Git / wiki / e-mail resources to Apache
>     equivalents.
>  3. Initial release as incubating project.
> 
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> aware of?
> 
> None.
> 
> How has the community developed since the last report?
> 
> Slowly - Impala is still in the very early stages of incubation, and
> performing the mechanical tasks of code movement and infrastructure setup
> is our first priority. The holiday period in the United States has slowed
> this effort slightly, but we look forward to picking up pace in early 2016.
> There have been no additions to the committer or PMC lists since incubation
> began.
> 
> How has the project developed since the last report?
> 
> We have performed some of the basic initial tasks for incubation -
> establishing wiki pages, Git repositories and accounts for the initial
> committer set. Our next steps are:
> 
> 1. Finalize the SGA from Cloudera
> 2. Move existing @cloudera.org e-mail aliases to their
> @impala.incubator.apache.org equivalents.
> 3. Move source code from Cloudera git repository to Apache git repo.
> 4. Improve out-of-box build and test experience so that community can
> easily evaluate release artifacts.
> 5. Migrate cloudera.org JIRA tickets to issues.apache.org.
> 
> 
> Date of last release:
> 
>  NA
> 
> When were the last committers or PMC members elected?
> 
> At the time of the Incubation vote, 2015-12-03.

