hdt-dev mailing list archives

From Mirko Kämpf <mirko.kae...@gmail.com>
Subject Re: [VOTE] Retire HDT
Date Mon, 10 Nov 2014 09:35:36 GMT

my VOTE:


Regarding the vote to retire the HDT project, I would like to suggest that
we clearly define the scope of the project if it survives the VOTE.

I found it hard to explain to other people what the role of HDT is. Since
the Kite SDK offers dataset-centric libraries and Morphlines for reusable
"single record" ETL operations, I was more focused on that side. But even
if you know the Hadoop ecosystem, it is not easy to see which components
are used most often. Meanwhile, I think Morphlines are great, and some tool
support for developers and analysts would be valuable. I created
"MorphMiner", a tool that allows editing and testing of Morphlines in a
GUI. This could be a contribution to HDT, but right now it is not really
clear to me whether it is a good fit, as I cannot see the overall picture
of the HDT vision.

What do you think about the role of HDT? It could be the single entry point
for developers, with an abstract "cluster handling" component.

This means (A) we would have to enable a connection to existing clusters
via their manager APIs, e.g. Cloudera Manager's REST API or comparable APIs
from other vendors would be used to retrieve status and to enable simple
operations. On the other hand, this seems to be overhead, as such tools
already provide all relevant information, just in a different system. Here
it would already be enough to have a browser tab in Eclipse to access the
cluster. Even Hue could be embedded.

(B) For web developers, it would be good to have a "HUE Module" available
as a template to start coding, testing, and deploying.

We can see that application development around Hadoop is not "the one
task, done in one IDE", but a set of multiple activities which even include
administration and data or metadata management. An IDE is often seen as the
"environment to do the coding in a productive way", not deployment, and
this can confuse Hadoop newbies.

Maybe these are reasons for the low activity: the focus is not clear and
the tasks are so diverse.

I think that, instead of retiring HDT, we should actively create "The case
for HDT". One way to do this could be a collection of best practices and
tutorials which show how HDT helps or could help. From there we can go on
with the tool development efforts and, hopefully, with some work that
integrates the Kite SDK into HDT. The dataset tools are already a good
starting point. Based on this, a dataset inspector which even produces
dataset profiles seems to be a doable project for a student. I volunteer
to mentor and to provide an existing skeleton of the code for this.

To include more ideas from Kite SDK developers and other people I know who
may be interested in this discussion, I am sending it to some off-list
addresses to invite those people.

Good luck HDT !!!


2014-11-10 9:45 GMT+01:00 Rahul Sharma <rsharma@apache.org>:

> Hi all,
> Based on the discussion that happened on the mailing list [1], I'd like to
> call a VOTE to retire [2] Apache HDT from the Apache Incubator. It appears
> that the project has lost community interest, with almost no activity on
> the mailing lists.
> This VOTE will be open for at least 72 hours and passes on achieving a
> consensus.
>  +1 [ ] Yes, I am in favor of retiring HDT from the Apache Incubator.
>  +0 [ ]
>  -1 [ ] No, I am not in favor of retiring HDT because...
> regards
> Rahul
> [1] http://apache.markmail.org/message/ljcrnj5uluiemvaz
> [2] http://incubator.apache.org/guides/retirement.html
