hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: Please review new APIs.
Date Thu, 03 Nov 2011 09:09:11 GMT
Yes, I'm sorry, the problem was actually that I thought we were going to be
incompatible.
But that is not correct ;)

2011/11/2 Edward J. Yoon <edwardyoon@apache.org>

> Just FYI, one reason is that there're a lot of KeyValue stores.
>
> On Wed, Nov 2, 2011 at 11:46 PM, Thomas Jungblut
> <thomas.jungblut@googlemail.com> wrote:
> > Ah okay I see why.
> > But I don't see that this is very good. BTW the classes you've added from
> > Hadoop are missing the Apache header.
> >
> > Sorry for spamming.
> >
> > 2011/11/2 Thomas Jungblut <thomas.jungblut@googlemail.com>
> >
> >> And what is the reason to implement our own Input/output format if you
> >> stick with key/value pairs?
> >> Let's be compatible with Hadoop and use theirs.
> >>
> >> And we should really stop copying Hadoop stuff around. It is already
> >> there.
> >>
> >>
> >> 2011/11/2 Thomas Jungblut <thomas.jungblut@googlemail.com>
> >>
> >>> Great :)
> >>>
> >>> Do you have plans to integrate partitioning? Currently this is just a
> >>> block assignment partitioning, hardcoded in the client.
> >>> This won't be useful for PageRank and SSSP.
> >>> This would help us in Graph package as well for the next release.
> >>>
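A rough sketch of the distinction raised in the quoted message, offered as an illustration only. It contrasts block assignment (the owner depends on a record's position in the input) with hash partitioning (the owner is derived from the key alone). One plausible reading of the PageRank/SSSP concern is that a graph job needs to find the task that owns a vertex from the vertex ID, which position-based assignment does not give you; that reading is an assumption. `PartitioningSketch`, `blockPartition` and `hashPartition` are hypothetical names, not Hama or Hadoop APIs.

```java
// Hypothetical illustration; none of these names come from Hama or Hadoop.
class PartitioningSketch {

    /**
     * Block assignment: record number recordIndex (0-based) out of totalRecords
     * goes to a task purely by its position in the input.
     * Assumes totalRecords > 0 and numTasks > 0.
     */
    static int blockPartition(long recordIndex, long totalRecords, int numTasks) {
        long blockSize = (totalRecords + numTasks - 1) / numTasks; // ceiling division
        return (int) (recordIndex / blockSize);
    }

    /**
     * Hash partitioning: the owning task is computed from the key alone,
     * so any task can work out where a given key lives.
     */
    static int hashPartition(Object key, int numTasks) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    public static void main(String[] args) {
        int numTasks = 4;
        System.out.println("block owner of record 7 of 10: "
                + blockPartition(7, 10, numTasks));
        System.out.println("hash owner of vertex-42: "
                + hashPartition("vertex-42", numTasks));
    }
}
```

With hash partitioning, a sender can compute the destination task for a message from the vertex key without consulting the input layout.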
> >>> 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
> >>>
> >>>> > For sure I agree we should allow the former programming model with no
> >>>> > input without explicitly instantiating dummy inputs/splits. What about
> >>>> > providing two basic (different) implementations?
> >>>>
> >>>> +1
> >>>>
> >>>> I was about to.
> >>>> On Wed, Nov 2, 2011 at 9:23 PM, Tommaso Teofili
> >>>> <tommaso.teofili@gmail.com> wrote:
> >>>> > 2011/11/2 Thomas Jungblut <thomas.jungblut@googlemail.com>
> >>>> >
> >>>> >> Another point while fixing the local runner:
> >>>> >>
> >>>> >> Are we now input driven?
> >>>> >> I see in the code that the user-defined task number is overridden by the
> >>>> >> number of splits.
> >>>> >> Was this your intention? This will actually make realtime processing with
> >>>> >> no static input a real pain.
> >>>> >> For example, if you want a similar behaviour in Hadoop M/R you'll need to
> >>>> >> create dummy splits, and this is not what we should aim at.
> >>>> >>
> >>>> >> We could simply check if the user defines the NullInputFormat or nothing and
> >>>> >> then use the number of tasks the user has configured.
> >>>> >>
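The check proposed in the quoted message could look roughly like the sketch below: when the job has no real input, fall back to the task count the user configured; otherwise let the number of splits decide. `TaskCountSketch`, `resolveNumTasks` and the `hasInput` flag are hypothetical stand-ins. In particular, `hasInput` stands for "the user set neither NullInputFormat nor any input format", and how that is detected in the actual code base is not shown here.

```java
// Hypothetical illustration; this is not the Hama job-submission code.
class TaskCountSketch {

    /**
     * Decide how many tasks to launch.
     *
     * @param hasInput        false when the job declares no input (assumed flag)
     * @param numSplits       number of input splits computed for the job
     * @param configuredTasks task count requested by the user
     */
    static int resolveNumTasks(boolean hasInput, int numSplits, int configuredTasks) {
        if (!hasInput) {
            // No static input: honour the user's setting, no dummy splits needed.
            return configuredTasks;
        }
        // Input-driven job: one task per split.
        return numSplits;
    }

    public static void main(String[] args) {
        System.out.println(resolveNumTasks(false, 0, 10)); // job without input
        System.out.println(resolveNumTasks(true, 3, 10));  // input-driven job
    }
}
```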
> >>>> >
> >>>> > For sure I agree we should allow the former programming model with no input
> >>>> > without explicitly instantiating dummy inputs/splits. What about providing
> >>>> > two basic (different) implementations?
> >>>> > Tommaso
> >>>> >
> >>>> >
> >>>> >>
> >>>> >> 2011/11/2 Tommaso Teofili <tommaso.teofili@gmail.com>
> >>>> >>
> >>>> >> > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
> >>>> >> >
> >>>> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
> >>>> >> > >
> >>>> >> > > You're right. Almost all BSP applications should override the bsp() method,
> >>>> >> > > but setup() and cleaner() methods are not, as you said. Let's fix
> >>>> >> > > them.
> >>>> >> > >
> >>>> >> >
> >>>> >> > Agreed +1
> >>>> >> >
> >>>> >> >
> >>>> >> > >
> >>>> >> > > > Generally I would suggest to integrate the OutputCollector and the
> >>>> >> > > > RecordReader into the BSPPeerImpl.
> >>>> >> > > > So our peer is like the context in Hadoop.
> >>>> >> > >
> >>>> >> > > Good idea.
> >>>> >> > >
> >>>> >> >
> >>>> >> > +1 here too
> >>>> >> >
> >>>> >> > Tommaso
> >>>> >> >
> >>>> >> >
> >>>> >> > >
> >>>> >> > > On Wed, Nov 2, 2011 at 9:03 PM, Thomas Jungblut
> >>>> >> > > <thomas.jungblut@googlemail.com> wrote:
> >>>> >> > > > Yes. When I reworked that API, I made a default implementation in our
> >>>> >> > > > abstract BSP class.
> >>>> >> > > > So the user has to override the methods for himself, if he needs to.
> >>>> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
> >>>> >> > > >
> >>>> >> > > > Generally I would suggest to integrate the OutputCollector and the
> >>>> >> > > > RecordReader into the BSPPeerImpl.
> >>>> >> > > > So our peer is like the context in Hadoop.
> >>>> >> > > > But that is just a minor thing. It is a great improvement ;)
> >>>> >> > > >
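A minimal sketch of the design argued for in the quoted message: an abstract base class where only the compute method is abstract and the setup/cleanup hooks have empty default bodies, so a job overrides them only when it needs to. All names here (`SketchBsp`, `ComputeOnlyJob`, `JobWithSetup`, and the `List<String>` log parameter) are hypothetical and chosen for illustration; the real Hama BSP class has different signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class; not the actual org.apache.hama BSP class.
abstract class SketchBsp {

    /** Optional hook with a no-op default. */
    void setup(List<String> log) { }

    /** The only method every job must provide. */
    abstract void bsp(List<String> log);

    /** Optional hook with a no-op default. */
    void cleanup(List<String> log) { }

    /** Runs the three phases in order and returns what the job recorded. */
    final List<String> run() {
        List<String> log = new ArrayList<>();
        setup(log);
        bsp(log);
        cleanup(log);
        return log;
    }
}

/** A job that needs neither setup nor cleanup overrides nothing extra. */
class ComputeOnlyJob extends SketchBsp {
    @Override
    void bsp(List<String> log) {
        log.add("bsp");
    }
}

/** A job that does need setup overrides just that hook. */
class JobWithSetup extends SketchBsp {
    @Override
    void setup(List<String> log) {
        log.add("setup");
    }

    @Override
    void bsp(List<String> log) {
        log.add("bsp");
    }
}

class DefaultHooksDemo {
    public static void main(String[] args) {
        System.out.println(new ComputeOnlyJob().run());
        System.out.println(new JobWithSetup().run());
    }
}
```

With abstract hooks instead, `ComputeOnlyJob` would have to carry two empty method bodies, which is the clutter the message objects to.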
> >>>> >> > > > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
> >>>> >> > > >
> >>>> >> > > >> There're bsp(), setup() and cleaner() methods.
> >>>> >> > > >>
> >>>> >> > > >> What is your suggestion?
> >>>> >> > > >>
> >>>> >> > > >> On Wed, Nov 2, 2011 at 8:47 PM, Thomas Jungblut
> >>>> >> > > >> <thomas.jungblut@googlemail.com> wrote:
> >>>> >> > > >> > Have a look at the combiner class. I know that this is just a "test", but
> >>>> >> > > >> > it is really messy if the user does not use the methods, but is forced to
> >>>> >> > > >> > override them.
> >>>> >> > > >> >
> >>>> >> > > >> > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
> >>>> >> > > >> >
> >>>> >> > > >> >> Why?
> >>>> >> > > >> >>
> >>>> >> > > >> >> On Wed, Nov 2, 2011 at 8:21 PM, Thomas Jungblut
> >>>> >> > > >> >> <thomas.jungblut@googlemail.com> wrote:
> >>>> >> > > >> >> > I totally dislike that BSP class now has abstract methods instead of
> >>>> >> > > >> >> > default implementations.
> >>>> >> > > >> >> >
> >>>> >> > > >> >> > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
> >>>> >> > > >> >> >
> >>>> >> > > >> >> >> Hi all,
> >>>> >> > > >> >> >>
> >>>> >> > > >> >> >> As you know, combiners and IO were recently added.
> >>>> >> > > >> >> >>
> >>>> >> > > >> >> >> Please review them from a user's viewpoint.
> >>>> >> > > >> >> >>
> >>>> >> > > >> >> >>
> >>>> >> > > >> >> >>
> >>>> >> > > >> >> >> http://svn.apache.org/repos/asf/incubator/hama/trunk/examples/src/main/java/org/apache/hama/examples/PiEstimator.java
> >>>> >> > > >> >> >>
> >>>> >> > > >> >> >> I'm testing multiple tasks and IO features on a 100-node cluster using
> >>>> >> > > >> >> >> 10 tasks per node. If there's no issue, I'll close HAMA-258.
> >>>> >> > > >> >> >>
> >>>> >> > > >> >> >> Thanks.
> >>>> >> > > >> >> >>
> >>>> >> > > >> >> >> --
> >>>> >> > > >> >> >> Best Regards, Edward J. Yoon
> >>>> >> > > >> >> >> @eddieyoon
> >>>> >> > > >> >> >>
> >>>> >> > > >> >> >
> >>>> >> > > >> >> >
> >>>> >> > > >> >> >
> >>>> >> > > >> >> > --
> >>>> >> > > >> >> > Thomas Jungblut
> >>>> >> > > >> >> > Berlin <thomas.jungblut@gmail.com>
> >>>> >> > > >> >> >
> >>>> >> > > >> >>
> >>>> >> > > >> >>
> >>>> >> > > >> >>
> >>>> >> > > >> >> --
> >>>> >> > > >> >> Best Regards, Edward J. Yoon
> >>>> >> > > >> >> @eddieyoon
> >>>> >> > > >> >>
> >>>> >> > > >> >
> >>>> >> > > >> >
> >>>> >> > > >> >
> >>>> >> > > >> > --
> >>>> >> > > >> > Thomas Jungblut
> >>>> >> > > >> > Berlin <thomas.jungblut@gmail.com>
> >>>> >> > > >> >
> >>>> >> > > >>
> >>>> >> > > >>
> >>>> >> > > >>
> >>>> >> > > >> --
> >>>> >> > > >> Best Regards, Edward J. Yoon
> >>>> >> > > >> @eddieyoon
> >>>> >> > > >>
> >>>> >> > > >
> >>>> >> > > >
> >>>> >> > > >
> >>>> >> > > > --
> >>>> >> > > > Thomas Jungblut
> >>>> >> > > > Berlin <thomas.jungblut@gmail.com>
> >>>> >> > > >
> >>>> >> > >
> >>>> >> > >
> >>>> >> > >
> >>>> >> > > --
> >>>> >> > > Best Regards, Edward J. Yoon
> >>>> >> > > @eddieyoon
> >>>> >> > >
> >>>> >> >
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> --
> >>>> >> Thomas Jungblut
> >>>> >> Berlin <thomas.jungblut@gmail.com>
> >>>> >>
> >>>> >
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best Regards, Edward J. Yoon
> >>>> @eddieyoon
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Thomas Jungblut
> >>> Berlin <thomas.jungblut@gmail.com>
> >>>
> >>
> >>
> >>
> >> --
> >> Thomas Jungblut
> >> Berlin <thomas.jungblut@gmail.com>
> >>
> >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin <thomas.jungblut@gmail.com>
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>



-- 
Thomas Jungblut
Berlin <thomas.jungblut@gmail.com>
