hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Sailfish
Date Fri, 11 May 2012 14:29:13 GMT
That makes perfect sense to me.  Especially because it really is a new implementation of shuffle
that is optimized for very large jobs.  I am happy to see anything go in that is going to
improve the performance of hadoop, and I look forward to running some benchmarks on the changes.
 I am not super familiar with sailfish, but from what I remember from a while ago it is the
modified version of KFS that is in reality doing the sorting.  The maps will output data to
"chunks" aka blocks that when each chunk is full it is sorted.  When the sorting is finished
for a chunk the reducers are now free to pull the sorted data from the chunks and run.  I
have a few concerns with it though.


 1.  How do we securely handle different comparators?  Currently comparators run as the user
that launched the job, not as a privileged user.  Sailfish seems to require that comparators
run as a privileged user, or we only support pure bitwise sorting of keys.
 2.  How does this work in a mixed environment?  Sailfish, as I understand it, is optimized
for large map/reduce jobs, and can be slower on small jobs than the current implementation.
 How do we make it so that large jobs are able to run faster, but not negatively impact the
more common small jobs?  We could run both in parallel and switch between them depending on
the size of the job's input, or a config key of some sort, but then the RAM needed to make
these big jobs run fast would not be available for smaller jobs to use when no really big
job is running.

--Bobby Evans

On 5/11/12 1:32 AM, "Todd Lipcon" <todd@cloudera.com> wrote:

Hey Sriram,

We discussed this before, but for the benefit of the wider audience: :)

It seems like the requirements imposed on KFS by Sailfish are in most
ways much simplier than the requirements of a full distributed
filesystem. The one thing we need is atomic record append -- but we
don't need anything else, like filesystem metadata/naming,
replication, corrupt data scanning, etc. All of the data is
transient/short-lived and at replication count 1.

So I think building something specific to this use case would be
pretty practical - and my guess is it might even have some benefits
over trying to use a full DFS.

In the MR2 architecture, I'd probably try to build this as a service
plugin in the NodeManager (similar to the way that the ShuffleHandler
in the current implementation works)

-Todd

On Thu, May 10, 2012 at 11:01 PM, Sriram Rao <sriramsrao@gmail.com> wrote:
> Srivas,
>
> Sailfish is builds upon record append (a feature not present in HDFS).
>
> The software that is currently released is based on Hadoop-0.20.2.  You use
> the Sailfish version of Hadoop-0.20.2, KFS for the intermediate data, and
> then HDFS (or KFS) for storing the job/input.  Since the changes are all in
> the handling of map output/reduce input, it is transparent to existing jobs.
>
> What is being proposed below is to bolt all the starting/stopping of the
> related deamons into YARN as a first step.  There are other approaches that
> are possible, which have a similar effect.
>
> Hope this helps.
>
> Sriram
>
>
> On Thu, May 10, 2012 at 10:50 PM, M. C. Srivas <mcsrivas@gmail.com> wrote:
>
>> Sriram,   Sailfish depends on append. I just noticed the HDFS disabled
>> append. How does one use this with Hadoop?
>>
>>
>> On Wed, May 9, 2012 at 9:00 AM, Otis Gospodnetic <
>> otis_gospodnetic@yahoo.com
>> > wrote:
>>
>> > Hi Sriram,
>> >
>> > >> The I-file concept could possibly be implemented here in a fairly self
>> > contained way. One
>> > >> could even colocate/embed a KFS filesystem with such an alternate
>> > >> shuffle, like how MR task temporary space is usually colocated with
>> > >> HDFS storage.
>> >
>> > >  Exactly.
>> >
>> > >> Does this seem reasonable in any way?
>> >
>> > > Great. Where do go from here?  How do we get a colloborative effort
>> > going?
>> >
>> >
>> > Sounds like a JIRA issue should be opened, the approach briefly
>> described,
>> > and the first implementation attempt made.  Then iterate.
>> >
>> > I look forward to seeing this! :)
>> >
>> > Otis
>> > --
>> >
>> > Performance Monitoring for Solr / ElasticSearch / HBase -
>> > http://sematext.com/spm
>> >
>> >
>> >
>> > >________________________________
>> > > From: Sriram Rao <sriramsrao@gmail.com>
>> > >To: common-dev@hadoop.apache.org
>> > >Sent: Tuesday, May 8, 2012 6:48 PM
>> > >Subject: Re: Sailfish
>> > >
>> > >Dear Andy,
>> > >
>> > >> From: Andrew Purtell <apurt...@apache.org>
>> > >> ...
>> > >
>> > >> Do you intend this to be a joint project with the Hadoop community
or
>> > >> a technology competitor?
>> > >
>> > >As I had said in my email, we are looking for folks to colloborate
>> > >with us to help get us integrated with Hadoop.  So, to be explicitly
>> > >clear, we are intending for this to be a joint project with the
>> > >community.
>> > >
>> > >> Regrettably, KFS is not a "drop in replacement" for HDFS.
>> > >> Hypothetically: I have several petabytes of data in an existing HDFS
>> > >> deployment, which is the norm, and a continuous MapReduce workflow.
>> > >> How do you propose I, practically, migrate to something like Sailfish
>> > >> without a major capital expenditure and/or downtime and/or data loss?
>> > >
>> > >Well, we are not asking for KFS to replace HDFS.  One path you could
>> > >take is to experiment with Sailfish---use KFS just for the
>> > >intermediate data and HDFS for everything else.  There is no major
>> > >capex :).  While you get comfy with pushing intermediate data into a
>> > >DFS, we get the ideas added to HDFS.  This simplifies deployment
>> > >considerations.
>> > >
>> > >> However, can the Sailfish I-files implementation be plugged in as an
>> > >> alternate Shuffle implementation in MRv2 (see MAPREDUCE-3060 and
>> > >> MAPREDUCE-4049),
>> > >
>> > >This'd be great!
>> > >
>> > >> with necessary additional plumbing for dynamic
>> > >> adjustment of reduce task population? And the workbuilder could be
>> > >> part of an alternate MapReduce Application Manager?
>> > >
>> > >It should be part of the AM.  (Currently, with our implementation in
>> > >Hadoop-0.20.2, the workbuilder serves the role of an AM).
>> > >
>> > >> The I-file concept could possibly be implemented here in a fairly self
>> > contained way. One
>> > >> could even colocate/embed a KFS filesystem with such an alternate
>> > >> shuffle, like how MR task temporary space is usually colocated with
>> > >> HDFS storage.
>> > >
>> > >Exactly.
>> > >
>> > >> Does this seem reasonable in any way?
>> > >
>> > >Great. Where do go from here?  How do we get a colloborative effort
>> going?
>> > >
>> > >Best,
>> > >
>> > >Sriram
>> > >
>> > >>>  From: Sriram Rao <sriramsrao@gmail.com>
>> > >>> To: common-dev@hadoop.apache.org
>> > >>> Sent: Tuesday, May 8, 2012 10:32 AM
>> > >>> Subject: Project announcement: Sailfish (also, looking for
>> > colloborators)
>> > >>>
>> > >>> Hi,
>> > >>>
>> > >>> I'd like to announce the release of a new open source project,
>> > Sailfish.
>> > >>>
>> > >>> http://code.google.com/p/sailfish/
>> > >>>
>> > >>> Sailfish tries to improve Hadoop-performance, particularly for
>> > large-jobs
>> > >>> which process TB's of data and run for hours.  In building Sailfish,
>> we
>> > >>> modify how map-output is handled and transported from map->reduce.
>> > >>>
>> > >>> The project pages provide more information about the project.
>> > >>>
>> > >>> We are looking for colloborators who can help get some of the ideas
>> > into
>> > >>> Apache Hadoop. A possible step forward could be to make "shuffle"
>> > phase of
>> > >>> Hadoop pluggable.
>> > >>>
>> > >>> If you are interested in working with us, please get in touch with
>> me.
>> > >>>
>> > >>> Sriram
>> > >>
>> > >
>> > >
>> > >
>> > >--
>> > >Best regards,
>> > >
>> > >   - Andy
>> > >
>> > >Problems worthy of attack prove their worth by hitting back. - Piet
>> > >Hein (via Tom White)
>> > >
>> > >
>> > >
>> >
>>



--
Todd Lipcon
Software Engineer, Cloudera


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message