hadoop-common-dev mailing list archives

From "M. C. Srivas" <mcsri...@gmail.com>
Subject Re: Sailfish
Date Fri, 11 May 2012 05:50:41 GMT
Sriram, Sailfish depends on append. I just noticed that HDFS disabled
append. How does one use this with Hadoop?


On Wed, May 9, 2012 at 9:00 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> Hi Sriram,
>
> >> The I-file concept could possibly be implemented here in a fairly
> >> self-contained way. One could even colocate/embed a KFS filesystem
> >> with such an alternate shuffle, like how MR task temporary space is
> >> usually colocated with HDFS storage.
>
> >  Exactly.
>
> >> Does this seem reasonable in any way?
>
> > Great. Where do we go from here?  How do we get a collaborative effort
> > going?
>
>
> Sounds like a JIRA issue should be opened, the approach briefly described,
> and the first implementation attempt made.  Then iterate.
>
> I look forward to seeing this! :)
>
> Otis
> --
>
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
>
>
>
> >________________________________
> > From: Sriram Rao <sriramsrao@gmail.com>
> >To: common-dev@hadoop.apache.org
> >Sent: Tuesday, May 8, 2012 6:48 PM
> >Subject: Re: Sailfish
> >
> >Dear Andy,
> >
> >> From: Andrew Purtell <apurt...@apache.org>
> >> ...
> >
> >> Do you intend this to be a joint project with the Hadoop community or
> >> a technology competitor?
> >
> >As I had said in my email, we are looking for folks to collaborate
> >with us to help get us integrated with Hadoop.  So, to be explicitly
> >clear, we are intending for this to be a joint project with the
> >community.
> >
> >> Regrettably, KFS is not a "drop in replacement" for HDFS.
> >> Hypothetically: I have several petabytes of data in an existing HDFS
> >> deployment, which is the norm, and a continuous MapReduce workflow.
> >> How do you propose I, practically, migrate to something like Sailfish
> >> without a major capital expenditure and/or downtime and/or data loss?
> >
> >Well, we are not asking for KFS to replace HDFS.  One path you could
> >take is to experiment with Sailfish---use KFS just for the
> >intermediate data and HDFS for everything else.  There is no major
> >capex :).  While you get comfy with pushing intermediate data into a
> >DFS, we get the ideas added to HDFS.  This simplifies deployment
> >considerations.
> >
> >> However, can the Sailfish I-files implementation be plugged in as an
> >> alternate Shuffle implementation in MRv2 (see MAPREDUCE-3060 and
> >> MAPREDUCE-4049),
> >
> >This'd be great!
> >
> >> with necessary additional plumbing for dynamic
> >> adjustment of reduce task population? And the workbuilder could be
> >> part of an alternate MapReduce Application Manager?
> >
> >It should be part of the AM.  (Currently, with our implementation in
> >Hadoop-0.20.2, the workbuilder serves the role of an AM).
> >
> >> The I-file concept could possibly be implemented here in a fairly
> >> self-contained way. One could even colocate/embed a KFS filesystem
> >> with such an alternate shuffle, like how MR task temporary space is
> >> usually colocated with HDFS storage.
> >
> >Exactly.
> >
> >> Does this seem reasonable in any way?
> >
> >Great. Where do we go from here?  How do we get a collaborative effort going?
> >
> >Best,
> >
> >Sriram
> >
> >>>  From: Sriram Rao <sriramsrao@gmail.com>
> >>> To: common-dev@hadoop.apache.org
> >>> Sent: Tuesday, May 8, 2012 10:32 AM
> >>> Subject: Project announcement: Sailfish (also, looking for colloborators)
> >>>
> >>> Hi,
> >>>
> >>> I'd like to announce the release of a new open source project,
> >>> Sailfish.
> >>>
> >>> http://code.google.com/p/sailfish/
> >>>
> >>> Sailfish tries to improve Hadoop performance, particularly for large
> >>> jobs which process TBs of data and run for hours.  In building Sailfish,
> >>> we modify how map output is handled and transported from map to reduce.
> >>>
> >>> The project pages provide more information about the project.
> >>>
> >>> We are looking for collaborators who can help get some of the ideas
> >>> into Apache Hadoop. A possible step forward could be to make the
> >>> "shuffle" phase of Hadoop pluggable.
> >>>
> >>> If you are interested in working with us, please get in touch with me.
> >>>
> >>> Sriram
> >>
> >
> >
> >
> >--
> >Best regards,
> >
> >   - Andy
> >
> >Problems worthy of attack prove their worth by hitting back. - Piet
> >Hein (via Tom White)
> >
> >
> >
>
