hdt-dev mailing list archives

From Bob Kerns <...@acm.org>
Subject Hola and such!
Date Wed, 19 Dec 2012 02:48:56 GMT
Hi!

I've been looking into the Hadoop tooling situation for FICO, and had pretty
much reached the conclusion that we were going to need to contribute to
bring it up to snuff, and that it really needed to be split off from the
main Hadoop project.

Imagine my surprise, when I started looking into who was involved, and how
to start a discussion, and found that Adam had already taken the initiative
and gotten the ball rolling!

Anyway, +1 to the general idea, and here's one more contributor.

So let me introduce myself. I've been writing software since about 1970,
back in the days of batch processing and punch cards. It feels a bit like
coming full circle, though of course, many never left the batch-processing
world.

At MIT in the 1970s, I was a MacLisp maintainer and a developer on the
Macsyma symbolic algebra system. At Symbolics, I was a developer, and for a
time, I managed the software maintenance / release team. I've worked for
DEC and tiny startups, and collaborated on small open-source projects
around the world. I've done networking stacks from the drivers up, more AI
rule engines than I can count, UI, web apps (server side and AJAX), Eclipse
plugins and RCP apps, and everything from little Android apps to giant
enterprise tools.

But one thing I haven't done is work on large, highly structured
open-source projects, and Apache in particular. I do have a rough idea of
the process and culture -- but I'm sure there are rough edges that need
knocking off. :)

I'm also a newcomer to Hadoop, but I've been working hard on rectifying
that over the past month. I have a 12-node cluster set up at home, and my
wife serves as a built-in user community (she does movie special effects).

But my immediate concern with the Eclipse plugin is meeting the needs of
end users, many of whom will be mainly interested in working with the data
in conjunction with our Eclipse-based products.

Coming from that background, some pain points I'd like to address when we
get underway:

* Configuration and setup is too hard. It would help to be able to import
existing configuration files directly, for example (see the first sketch
below).

* Exploring large datasets is likely to be important to our users -- but
opening a large HDFS file kills Eclipse dead. We need to be able to explore
without loading the entire file into an Eclipse buffer! (See the second
sketch below.) I think it would also help if the tools better showed how
the tasks will see the data, and handled the various file formats
(MapFiles, Avro- and Writable-formatted data, etc.).

* Interfacing with more aspects of the overall Hadoop ecosystem.

* Compatibility with different versions of Hadoop and related tools.
Interfacing with Hadoop 3.0 or CDH5 should require no more than adding a
new driver plugin (see the third sketch below).
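
To make the first point concrete, here's a rough sketch of what I mean,
assuming Hadoop's own client libraries are on the plugin's classpath; the
/etc/hadoop/conf path and the fs.defaultFS property name are just
illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ConfigImport {
        // Build a Configuration from the user's existing *-site.xml files
        // instead of asking them to retype every property in a dialog.
        public static Configuration fromSiteFiles(String confDir) {
            Configuration conf = new Configuration(false); // skip built-in defaults
            conf.addResource(new Path(confDir, "core-site.xml"));
            conf.addResource(new Path(confDir, "hdfs-site.xml"));
            conf.addResource(new Path(confDir, "mapred-site.xml"));
            return conf;
        }

        public static void main(String[] args) {
            Configuration conf = fromSiteFiles("/etc/hadoop/conf"); // illustrative path
            System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        }
    }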
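
For the large-file problem, the key is never to pull the whole file into a
buffer. Something along these lines -- with the path and the 64 KB window
size purely illustrative -- would let a viewer page through a file of any
size:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWindow {
        // Read at most 'length' bytes starting at 'offset' -- never the whole file.
        public static byte[] readWindow(FileSystem fs, Path file,
                                        long offset, int length) throws IOException {
            byte[] buffer = new byte[length];
            try (FSDataInputStream in = fs.open(file)) {
                in.seek(offset);                    // jump straight to the region we want
                int n = in.read(buffer, 0, length); // one bounded read
                byte[] window = new byte[Math.max(n, 0)]; // trim if we hit EOF
                System.arraycopy(buffer, 0, window, 0, window.length);
                return window;
            }
        }

        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            byte[] window = readWindow(fs, new Path("/data/huge.log"), 0, 64 * 1024);
            System.out.print(new String(window, StandardCharsets.UTF_8));
        }
    }

A real viewer would also want record-aware paging for MapFiles, Avro, and
the rest, but the seek-and-bounded-read idea underneath is the same.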
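
And for the driver idea: the plugin core would code against a small
interface, with each supported Hadoop version shipping as a separate bundle
that contributes an implementation, e.g. through an Eclipse extension point
or java.util.ServiceLoader. Every name below is invented, just to show the
shape:

    import java.net.URI;
    import java.util.List;

    // Hypothetical version-abstraction interface; the core plugin talks only
    // to this, never to version-specific Hadoop classes.
    public interface HadoopVersionDriver {
        // Version string this driver handles, e.g. "1.0", "2.x", "CDH5".
        String supportedVersion();

        // Connect to a cluster; the handle hides version-specific RPC details.
        ClusterHandle connect(URI nameNodeUri);

        // List a directory without exposing version-specific FileStatus types.
        List<String> listDirectory(ClusterHandle cluster, String path);
    }

    // Opaque marker for an open connection; each driver supplies its own.
    interface ClusterHandle extends AutoCloseable {
        @Override
        void close();
    }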

I look forward to working with you!
