hadoop-common-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Project announcement: Sailfish (also, looking for collaborators)
Date Tue, 08 May 2012 19:11:54 GMT
Sriram et al.,

Do you intend this to be a joint project with the Hadoop community or
a technology competitor?

Regrettably, KFS is not a "drop-in replacement" for HDFS.
Hypothetically, suppose I have several petabytes of data in an existing HDFS
deployment (which is the norm) and a continuously running MapReduce workflow.
How do you propose I practically migrate to something like Sailfish
without major capital expenditure, downtime, or data loss?

However, could the Sailfish I-files implementation be plugged in as an
alternate shuffle implementation in MRv2 (see MAPREDUCE-3060 and
MAPREDUCE-4049), with the necessary additional plumbing for dynamically
adjusting the reduce task population? And could the workbuilder become
part of an alternate MapReduce ApplicationMaster? The I-file concept
could probably be implemented there in a fairly self-contained way. One
could even colocate or embed a KFS filesystem with such an alternate
shuffle, just as MR task temporary space is usually colocated with
HDFS storage.

Does this seem reasonable in any way?

Best regards,

   - Andy

>>  From: Sriram Rao <sriramsrao@gmail.com>
>> To: common-dev@hadoop.apache.org
>> Sent: Tuesday, May 8, 2012 10:32 AM
>> Subject: Project announcement: Sailfish (also, looking for collaborators)
>> Hi,
>> I'd like to announce the release of a new open source project, Sailfish.
>> http://code.google.com/p/sailfish/
>> Sailfish tries to improve Hadoop performance, particularly for large jobs
>> that process TBs of data and run for hours. In building Sailfish, we
>> modify how map output is handled and transported from map to reduce.
>> The project pages provide more information about the project.
>> We are looking for collaborators who can help get some of these ideas into
>> Apache Hadoop. A possible first step could be to make the shuffle phase of
>> Hadoop pluggable.
>> If you are interested in working with us, please get in touch with me.
>> Sriram


Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)
