hdt-dev mailing list archives

From Roman Shaposhnik <...@apache.org>
Subject Re: Hola and such!
Date Fri, 21 Dec 2012 18:10:34 GMT
Hi Bob!

Welcome aboard!

On Tue, Dec 18, 2012 at 6:48 PM, Bob Kerns <rwk@acm.org> wrote:
> So let me introduce myself. I've been writing software since about 1970,
> back in the days of batch processing and punch cards. It feels a bit like
> coming full circle, though of course, many never left the batch-processing
> world.


> But one thing I haven't done is work on large, highly structured
> open-source projects, and Apache in particular. I do have a rough idea of
> the process and culture -- but I'm sure there are rough edges that need
> knocking off. :)

We would absolutely welcome any kind of contribution -- code, ideas,
documentation, testing effort -- you name it. Every bit counts.

> I'm also a newcomer to Hadoop, but I've been working hard on rectifying
> that over the past month. I have a 12-node cluster set up at home,
> and my wife as a built-in user community (she does movie special effects).

If you're looking for a fairly polished way to set up Hadoop
clusters and you want a 100% ASF-driven Hadoop distro --
take a look at Apache Bigtop. We've just had a 0.5.0 release,
and you can install our convenience binary artifacts by
dropping a .list/.repo file into your package manager's
set of sources:

> * Configuration and setup is too hard. It would help to be able to import
> configuration files directly, for example.

Are you talking about Hadoop? If so -- that's exactly what Bigtop
aims to address.

> * Exploring large datasets is likely to be important to our users -- but
> opening a large HDFS file kills Eclipse dead. We need to be able to explore
> without loading the entire file into an Eclipse buffer! I think it would
> also help if the tools better showed how the tasks will see the data, as
> well as handle the various file formats (mapfiles, Avro- and
> Writable-formatted data, etc.).
> * Interfacing to more aspects of the overall Hadoop ecosystem.
> * Compatibility with different versions of Hadoop and related tools.
> Interfacing with Hadoop 3.0 or CDH5 should require no more than adding a
> new driver plugin.

All good points! Looking forward to working with you.
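
To make the point about exploring large files a bit more concrete, here is a
rough Python sketch of the general idea: seek to an offset and read only a
small window, or only the first few lines, instead of pulling the whole file
into a buffer. Everything here is an assumption for illustration. The function
names read_window and preview_lines are hypothetical, and a local file stands
in for an HDFS file; this is not how HDT or Eclipse does it today.

```python
from itertools import islice


def read_window(path, offset, length):
    """Return up to `length` bytes starting at byte `offset`.

    Only the requested window is read; the rest of the file is never loaded.
    """
    with open(path, "rb") as f:
        f.seek(offset)          # jump straight to the region of interest
        return f.read(length)   # read at most `length` bytes


def preview_lines(path, max_lines=5):
    """Return the first `max_lines` lines of a text file.

    The file object is iterated lazily, so reading stops after `max_lines`.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in islice(f, max_lines)]
```

An editor built along these lines would fetch another window as the user
scrolls, so memory use depends on the window size and not on the file size.
Hadoop's FSDataInputStream is seekable as well, so the same pattern should
carry over to HDFS in principle.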

