hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Please review new APIs.
Date Wed, 02 Nov 2011 22:42:30 GMT
Just FYI, one reason is that there are a lot of key/value stores.

On Wed, Nov 2, 2011 at 11:46 PM, Thomas Jungblut
<thomas.jungblut@googlemail.com> wrote:
> Ah okay I see why.
> But I don't see that this is very good. BTW the classes you've added from
> Hadoop are missing the Apache header.
>
> Sorry for spamming.
>
> 2011/11/2 Thomas Jungblut <thomas.jungblut@googlemail.com>
>
>> And what is the reason to implement our own input/output format if you
>> stick with key/value pairs?
>> Let's be compatible with Hadoop and use theirs.
>>
>> And we should really stop copying Hadoop stuff around. It is already
>> there.
>>
>>
>> 2011/11/2 Thomas Jungblut <thomas.jungblut@googlemail.com>
>>
>>> Great :)
>>>
>>> Do you have plans to integrate partitioning? Currently this is just a
>>> block assignment partitioning, hardcoded in the client.
>>> This won't be useful for PageRank and SSSP.
>>> It would help us in the Graph package as well for the next release.
>>>
>>> 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
>>>
>>>> > For sure I agree we should allow the former programming model with no
>>>> > input without explicitly instantiating dummy inputs/splits. What about
>>>> > providing two basic (different) implementations?
>>>>
>>>> +1
>>>>
>>>> I was about to.
>>>> On Wed, Nov 2, 2011 at 9:23 PM, Tommaso Teofili
>>>> <tommaso.teofili@gmail.com> wrote:
>>>> > 2011/11/2 Thomas Jungblut <thomas.jungblut@googlemail.com>
>>>> >
>>>> >> Another point while fixing the local runner:
>>>> >>
>>>> >> Are we now input driven?
>>>> >> I see in the code that the user-defined task number is overridden by
>>>> >> the number of splits.
>>>> >> Was this your intention? This will actually make realtime processing
>>>> >> with no static input a real pain.
>>>> >> For example, if you want similar behaviour in Hadoop M/R you'll need
>>>> >> to create dummy splits, and this is not what we should aim at.
>>>> >>
>>>> >> We could simply check if the user defines the NullInputFormat or
>>>> >> nothing and then use the number of tasks the user has configured.
>>>> >>
>>>> >
>>>> > For sure I agree we should allow the former programming model with no
>>>> > input without explicitly instantiating dummy inputs/splits. What about
>>>> > providing two basic (different) implementations?
>>>> > Tommaso
>>>> >
>>>> >
>>>> >>
>>>> >> 2011/11/2 Tommaso Teofili <tommaso.teofili@gmail.com>
>>>> >>
>>>> >> > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
>>>> >> >
>>>> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
>>>> >> > >
>>>> >> > > You're right. Almost all BSP applications should override the bsp()
>>>> >> > > method but, as you said, the setup() and cleaner() methods are not
>>>> >> > > always needed. Let's fix them.
>>>> >> > >
>>>> >> >
>>>> >> > Agreed +1
>>>> >> >
>>>> >> >
>>>> >> > >
>>>> >> > > > Generally I would suggest to integrate the OutputCollector and
>>>> >> > > > the RecordReader into the BSPPeerImpl.
>>>> >> > > > So our peer is like the context in Hadoop.
>>>> >> > >
>>>> >> > > Good idea.
>>>> >> > >
>>>> >> >
>>>> >> > +1 here too
>>>> >> >
>>>> >> > Tommaso
>>>> >> >
>>>> >> >
>>>> >> > >
>>>> >> > > On Wed, Nov 2, 2011 at 9:03 PM, Thomas Jungblut
>>>> >> > > <thomas.jungblut@googlemail.com> wrote:
>>>> >> > > > Yes. When I reworked that API, I made a default implementation
>>>> >> > > > in our abstract BSP class.
>>>> >> > > > So users only have to override the methods if they need to.
>>>> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
>>>> >> > > >
>>>> >> > > > Generally I would suggest to integrate the OutputCollector and
>>>> >> > > > the RecordReader into the BSPPeerImpl.
>>>> >> > > > So our peer is like the context in Hadoop.
>>>> >> > > > But that is just a minor thing. It is a great improvement ;)
>>>> >> > > >
>>>> >> > > > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
>>>> >> > > >
>>>> >> > > >> There are bsp(), setup() and cleaner() methods.
>>>> >> > > >>
>>>> >> > > >> What is your suggestion?
>>>> >> > > >>
>>>> >> > > >> On Wed, Nov 2, 2011 at 8:47 PM, Thomas Jungblut
>>>> >> > > >> <thomas.jungblut@googlemail.com> wrote:
>>>> >> > > >> > Have a look at the combiner class. I know that this is just a
>>>> >> > > >> > "test", but it is really messy if the user does not use the
>>>> >> > > >> > methods, but is forced to override them.
>>>> >> > > >> >
>>>> >> > > >> > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
>>>> >> > > >> >
>>>> >> > > >> >> Why?
>>>> >> > > >> >>
>>>> >> > > >> >> On Wed, Nov 2, 2011 at 8:21 PM, Thomas Jungblut
>>>> >> > > >> >> <thomas.jungblut@googlemail.com> wrote:
>>>> >> > > >> >> > I totally dislike that the BSP class now has abstract methods
>>>> >> > > >> >> > instead of default implementations.
>>>> >> > > >> >> >
>>>> >> > > >> >> > 2011/11/2 Edward J. Yoon <edwardyoon@apache.org>
>>>> >> > > >> >> >
>>>> >> > > >> >> >> Hi all,
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> As you know, combiners and IO were recently added.
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> Please review them from the user's viewpoint.
>>>> >> > > >> >> >>
>>>> >> > > >> >> >>
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> http://svn.apache.org/repos/asf/incubator/hama/trunk/examples/src/main/java/org/apache/hama/examples/PiEstimator.java
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> I'm testing the multiple tasks and IO features on a 100-node
>>>> >> > > >> >> >> cluster using 10 tasks per node. If there's no issue, I'll
>>>> >> > > >> >> >> close HAMA-258.
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> Thanks.
>>>> >> > > >> >> >>
>>>> >> > > >> >> >> --
>>>> >> > > >> >> >> Best Regards, Edward J. Yoon
>>>> >> > > >> >> >> @eddieyoon
>>>> >> > > >> >> >>
>>>> >> > > >> >> >
>>>> >> > > >> >> >
>>>> >> > > >> >> >
>>>> >> > > >> >> > --
>>>> >> > > >> >> > Thomas Jungblut
>>>> >> > > >> >> > Berlin <thomas.jungblut@gmail.com>
>>>> >> > > >> >> >
>>>> >> > > >> >>
>>>> >> > > >> >>
>>>> >> > > >> >>
>>>> >> > > >> >> --
>>>> >> > > >> >> Best Regards, Edward J. Yoon
>>>> >> > > >> >> @eddieyoon
>>>> >> > > >> >>
>>>> >> > > >> >
>>>> >> > > >> >
>>>> >> > > >> >
>>>> >> > > >> > --
>>>> >> > > >> > Thomas Jungblut
>>>> >> > > >> > Berlin <thomas.jungblut@gmail.com>
>>>> >> > > >> >
>>>> >> > > >>
>>>> >> > > >>
>>>> >> > > >>
>>>> >> > > >> --
>>>> >> > > >> Best Regards, Edward J. Yoon
>>>> >> > > >> @eddieyoon
>>>> >> > > >>
>>>> >> > > >
>>>> >> > > >
>>>> >> > > >
>>>> >> > > > --
>>>> >> > > > Thomas Jungblut
>>>> >> > > > Berlin <thomas.jungblut@gmail.com>
>>>> >> > > >
>>>> >> > >
>>>> >> > >
>>>> >> > >
>>>> >> > > --
>>>> >> > > Best Regards, Edward J. Yoon
>>>> >> > > @eddieyoon
>>>> >> > >
>>>> >> >
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Thomas Jungblut
>>>> >> Berlin <thomas.jungblut@gmail.com>
>>>> >>
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> @eddieyoon
>>>>
>>>
>>>
>>>
>>> --
>>> Thomas Jungblut
>>> Berlin <thomas.jungblut@gmail.com>
>>>
>>
>>
>>
>> --
>> Thomas Jungblut
>> Berlin <thomas.jungblut@gmail.com>
>>
>
>
>
> --
> Thomas Jungblut
> Berlin <thomas.jungblut@gmail.com>
>
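
For reference, the task-count rule proposed in the quoted thread (use the
user's configured task number when the input format is NullInputFormat or
unset, otherwise follow the number of splits) could look roughly like the
sketch below. This is only an illustration under stated assumptions: the
class TaskCountSketch, the method resolveTaskCount and its string-based
input format parameter are hypothetical and are not taken from the Hama
code base.

```java
// Hypothetical sketch of the task-count rule discussed in the thread.
// None of these names come from the actual Hama API.
public class TaskCountSketch {

    /**
     * Picks the number of tasks to launch.
     *
     * @param inputFormat name of the configured input format, or null if
     *                    the user configured none (assumed representation)
     * @param userTasks   number of tasks the user asked for
     * @param numSplits   number of input splits computed for the job
     */
    public static int resolveTaskCount(String inputFormat, int userTasks, int numSplits) {
        // No real input: nothing was configured, or the user chose the
        // "null" format explicitly.
        boolean noInput = inputFormat == null || inputFormat.equals("NullInputFormat");
        // Without input the user's setting wins; with input the splits
        // decide, which is the behaviour described in the thread.
        return noInput ? userTasks : numSplits;
    }

    public static void main(String[] args) {
        System.out.println(resolveTaskCount(null, 4, 0));
        System.out.println(resolveTaskCount("NullInputFormat", 4, 0));
        System.out.println(resolveTaskCount("TextInputFormat", 4, 10));
    }
}
```

With a rule like this, a job without static input would not need dummy
splits just to get the desired number of tasks.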



-- 
Best Regards, Edward J. Yoon
@eddieyoon
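
The point raised in the quoted thread about default implementations versus
abstract methods can be illustrated with a minimal sketch: only bsp() is
abstract, while setup() and cleanup() are empty defaults that a job
overrides only when it needs them. The names below (BspDefaultsSketch,
Peer, Bsp, BspOnlyJob, run) are hypothetical stand-ins, not the real Hama
classes, and the method is called cleanup() here although the thread also
refers to it as cleaner().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these are not the real Hama classes.
public class BspDefaultsSketch {

    /** Stand-in for the peer/context object passed to each phase. */
    public static class Peer {
        public final List<String> log = new ArrayList<>();
    }

    /** Base class: only bsp() must be implemented by the user. */
    public abstract static class Bsp {
        /** Default: nothing to set up. */
        public void setup(Peer peer) { }

        /** The actual computation; every job has to provide this. */
        public abstract void bsp(Peer peer);

        /** Default: nothing to clean up. */
        public void cleanup(Peer peer) { }
    }

    /** A job that needs neither setup nor cleanup overrides one method. */
    public static class BspOnlyJob extends Bsp {
        @Override
        public void bsp(Peer peer) {
            peer.log.add("bsp");
        }
    }

    /** Runs the three phases in order and returns what the job logged. */
    public static List<String> run(Bsp job) {
        Peer peer = new Peer();
        job.setup(peer);
        job.bsp(peer);
        job.cleanup(peer);
        return peer.log;
    }

    public static void main(String[] args) {
        System.out.println(run(new BspOnlyJob()));
    }
}
```

The design choice is the one argued for in the thread: a job that does not
use setup or cleanup is not forced to write empty overrides.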
