hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: Please review new APIs.
Date Wed, 02 Nov 2011 14:46:41 GMT
Ah, okay, I see why.
But I don't think this is a very good approach. BTW, the classes you've added
from Hadoop are missing the Apache license header.

Sorry for spamming.

2011/11/2 Thomas Jungblut <thomas.jungblut@googlemail.com>

> And what is the reason to implement our own input/output formats if you
> stick with key/value pairs?
> Let's be compatible with Hadoop and use theirs.
>
> And we should really stop copying Hadoop stuff around. It is already
> there.
>
>
> 2011/11/2 Thomas Jungblut <thomas.jungblut@googlemail.com>
>
>> Great :)
>>
>> Do you have plans to integrate partitioning? Currently this is just a
>> block assignment partitioning, hardcoded in the client.
>> This won't be useful for PageRank and SSSP.
>> Partitioning would help us in the Graph package as well for the next release.
>>
>> 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
>>
>>> > For sure I agree we should allow the former programming model with no
>>> > input without explicitly instantiating dummy inputs/splits. What about
>>> > providing two basic (different) implementations?
>>>
>>> +1
>>>
>>> I was about to.
>>>
>>> On Wed, Nov 2, 2011 at 9:23 PM, Tommaso Teofili
>>> <tommaso.teofili@gmail.com> wrote:
>>> > 2011/11/2 Thomas Jungblut <thomas.jungblut@googlemail.com>
>>> >
>>> >> Another point while fixing the local runner:
>>> >>
>>> >> Are we now input driven?
>>> >> I see in the code that the user-defined task number is overridden by the
>>> >> number of splits.
>>> >> Was this your intention? This will actually make realtime processing with
>>> >> no static input a real pain.
>>> >> For example, if you want similar behaviour in Hadoop M/R you'll need to
>>> >> create dummy splits, and this is not what we should aim at.
>>> >>
>>> >> We could simply check whether the user defined the NullInputFormat or
>>> >> nothing, and then use the number of tasks the user has configured.
>>> >>
>>> >
>>> > For sure I agree we should allow the former programming model with no
>>> > input without explicitly instantiating dummy inputs/splits. What about
>>> > providing two basic (different) implementations?
>>> > Tommaso
>>> >
>>> >
>>> >>
>>> >> 2011/11/2 Tommaso Teofili <tommaso.teofili@gmail.com>
>>> >>
>>> >> > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
>>> >> >
>>> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
>>> >> > >
>>> >> > > You're right. Almost all BSP applications should override the bsp()
>>> >> > > method, but that is not true of the setup() and cleaner() methods, as
>>> >> > > you said. Let's fix them.
>>> >> > >
>>> >> >
>>> >> > Agreed +1
>>> >> >
>>> >> >
>>> >> > >
>>> >> > > > Generally I would suggest to integrate the OutputCollector and the
>>> >> > > > RecordReader into the BSPPeerImpl.
>>> >> > > > So our peer is like the context in Hadoop.
>>> >> > >
>>> >> > > Good idea.
>>> >> > >
>>> >> >
>>> >> > +1 here too
>>> >> >
>>> >> > Tommaso
>>> >> >
>>> >> >
>>> >> > >
>>> >> > > On Wed, Nov 2, 2011 at 9:03 PM, Thomas Jungblut
>>> >> > > <thomas.jungblut@googlemail.com> wrote:
>>> >> > > > Yes. When I reworked that API, I made a default implementation in our
>>> >> > > > abstract BSP class.
>>> >> > > > So the user has to override the methods himself only if he needs to.
>>> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
>>> >> > > >
>>> >> > > > Generally I would suggest to integrate the OutputCollector and the
>>> >> > > > RecordReader into the BSPPeerImpl.
>>> >> > > > So our peer is like the context in Hadoop.
>>> >> > > > But that is just a minor thing. It is a great improvement ;)
>>> >> > > >
>>> >> > > > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
>>> >> > > >
>>> >> > > >> There are bsp(), setup() and cleaner() methods.
>>> >> > > >>
>>> >> > > >> What is your suggestion?
>>> >> > > >>
>>> >> > > >> On Wed, Nov 2, 2011 at 8:47 PM, Thomas Jungblut
>>> >> > > >> <thomas.jungblut@googlemail.com> wrote:
>>> >> > > >> > Have a look at the combiner class. I know that this is just a "test", but
>>> >> > > >> > it is really messy if the user does not use the methods, but is forced to
>>> >> > > >> > override them.
>>> >> > > >> >
>>> >> > > >> > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
>>> >> > > >> >
>>> >> > > >> >> Why?
>>> >> > > >> >>
>>> >> > > >> >> On Wed, Nov 2, 2011 at 8:21 PM, Thomas Jungblut
>>> >> > > >> >> <thomas.jungblut@googlemail.com> wrote:
>>> >> > > >> >> > I totally dislike that the BSP class now has abstract methods instead of
>>> >> > > >> >> > default implementations.
>>> >> > > >> >> >
>>> >> > > >> >> > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
>>> >> > > >> >> >
>>> >> > > >> >> >> Hi all,
>>> >> > > >> >> >>
>>> >> > > >> >> >> As you know, combiners and IO were recently added.
>>> >> > > >> >> >>
>>> >> > > >> >> >> Please review them from a user's viewpoint.
>>> >> > > >> >> >>
>>> >> > > >> >> >> http://svn.apache.org/repos/asf/incubator/hama/trunk/examples/src/main/java/org/apache/hama/examples/PiEstimator.java
>>> >> > > >> >> >>
>>> >> > > >> >> >> I'm testing multiple tasks and IO features on a 100-node cluster using
>>> >> > > >> >> >> 10 tasks per node. If there's no issue, I'll close HAMA-258.
>>> >> > > >> >> >>
>>> >> > > >> >> >> Thanks.
>>> >> > > >> >> >>
>>> >> > > >> >> >> --
>>> >> > > >> >> >> Best Regards, Edward J. Yoon
>>> >> > > >> >> >> @eddieyoon
>>> >> > > >> >> >>
>>> >> > > >> >> >
>>> >> > > >> >> >
>>> >> > > >> >> >
>>> >> > > >> >> > --
>>> >> > > >> >> > Thomas Jungblut
>>> >> > > >> >> > Berlin <thomas.jungblut@gmail.com>
>>> >> > > >> >> >
>>> >> > > >> >>
>>> >> > > >> >>
>>> >> > > >> >>
>>> >> > > >> >> --
>>> >> > > >> >> Best Regards, Edward J. Yoon
>>> >> > > >> >> @eddieyoon
>>> >> > > >> >>
>>> >> > > >> >
>>> >> > > >> >
>>> >> > > >> >
>>> >> > > >> > --
>>> >> > > >> > Thomas Jungblut
>>> >> > > >> > Berlin <thomas.jungblut@gmail.com>
>>> >> > > >> >
>>> >> > > >>
>>> >> > > >>
>>> >> > > >>
>>> >> > > >> --
>>> >> > > >> Best Regards, Edward J. Yoon
>>> >> > > >> @eddieyoon
>>> >> > > >>
>>> >> > > >
>>> >> > > >
>>> >> > > >
>>> >> > > > --
>>> >> > > > Thomas Jungblut
>>> >> > > > Berlin <thomas.jungblut@gmail.com>
>>> >> > > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > --
>>> >> > > Best Regards, Edward J. Yoon
>>> >> > > @eddieyoon
>>> >> > >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Thomas Jungblut
>>> >> Berlin <thomas.jungblut@gmail.com>
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>>
>>
>>
>>
>> --
>> Thomas Jungblut
>> Berlin <thomas.jungblut@gmail.com>
>>
>
>
>
> --
> Thomas Jungblut
> Berlin <thomas.jungblut@gmail.com>
>
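
To make the task-count suggestion from the quoted thread concrete, here is a
minimal sketch of the check, not Hama's actual code. The class name, the method
name resolveNumTasks, and the idea of identifying the input format by a plain
string are all assumptions made for illustration. When no input format (or the
null format) is configured, the user's task count wins; otherwise the number of
splits does.

```java
/** Hypothetical helper; every name here is an assumption, not Hama's API. */
final class TaskCountSketch {
    /** Stand-in for the NullInputFormat mentioned in the thread. */
    static final String NULL_INPUT_FORMAT = "NullInputFormat";

    /**
     * Picks the number of tasks to launch.
     *
     * @param inputFormat     configured input format name, or null if unset
     * @param configuredTasks task count requested by the user
     * @param numSplits       number of input splits computed by the client
     */
    static int resolveNumTasks(String inputFormat, int configuredTasks, int numSplits) {
        boolean noInput = inputFormat == null || inputFormat.equals(NULL_INPUT_FORMAT);
        if (noInput) {
            // No static input: honor what the user asked for.
            return configuredTasks;
        }
        // Input driven: one task per split.
        return numSplits;
    }

    public static void main(String[] args) {
        System.out.println(resolveNumTasks(null, 8, 3));
        System.out.println(resolveNumTasks("SomeFileInputFormat", 8, 3));
    }
}
```

With this shape, a job without static input never has to create dummy splits
just to get the number of tasks it wants.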



-- 
Thomas Jungblut
Berlin <thomas.jungblut@gmail.com>
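
For readers following the thread, the "default implementations instead of
abstract methods" point can be sketched as below. This is an illustration
under assumptions, not the Hama BSP class: SketchBSP, SketchPeer, HelloJob and
the cleanup() name are hypothetical, and the peer doubles as the output
collector in the spirit of the "peer is like the context in Hadoop" remark.

```java
// Hypothetical names throughout; this is not Hama's actual API.

/** Stand-in for the peer/context object handed to a job. */
interface SketchPeer {
    void write(String key, String value);
}

/** Base class: only bsp() is abstract; the hooks default to no-ops. */
abstract class SketchBSP {
    /** Optional hook, run once before bsp(). Does nothing by default. */
    public void setup(SketchPeer peer) { }

    /** The computation itself; every job must supply this. */
    public abstract void bsp(SketchPeer peer);

    /** Optional hook, run once after bsp(). Does nothing by default. */
    public void cleanup(SketchPeer peer) { }

    /** Runs the three phases in order, as a framework might. */
    public final void run(SketchPeer peer) {
        setup(peer);
        bsp(peer);
        cleanup(peer);
    }
}

/** A job that needs neither setup nor cleanup overrides one method only. */
class HelloJob extends SketchBSP {
    @Override
    public void bsp(SketchPeer peer) {
        peer.write("greeting", "hello");
    }
}

class DefaultHooksSketch {
    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        // This peer simply records what the job writes.
        SketchPeer peer = (k, v) -> out.append(k).append('=').append(v);
        new HelloJob().run(peer);
        System.out.println(out); // prints "greeting=hello"
    }
}
```

The trade-off is the one raised in the thread: abstract hooks force every job
to write empty overrides, while no-op defaults keep simple jobs short and still
let jobs that need the hooks override them.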
