Date: Wed, 2 Nov 2011 15:46:41 +0100
Subject: Re: Please review new APIs.
From: Thomas Jungblut
To: hama-dev@incubator.apache.org

Ah okay, I see why. But I don't see that this is very good.

BTW, the classes you've added from Hadoop are missing the Apache header.

Sorry for spamming.

2011/11/2 Thomas Jungblut

> And what is the reason to implement our own input/output format if you
> stick with key/value pairs?
> Let's be compatible with Hadoop and use theirs.
>
> And we should really stop copying Hadoop stuff around. It is already
> there.
>
>
> 2011/11/2 Thomas Jungblut
>
>> Great :)
>>
>> Do you have plans to integrate partitioning? Currently this is just a
>> block assignment partitioning, hardcoded in the client.
>> This won't be useful for PageRank and SSSP.
>> This would help us in the Graph package as well for the next release.
>>
>> 2011/11/2 Edward J. Yoon
>>
>>> > For sure I agree we should allow the former programming model with no
>>> > input without explicitly instantiating dummy inputs/splits. What about
>>> > providing two basic (different) implementations?
>>>
>>> +1
>>>
>>> I was about to.
>>>
>>> On Wed, Nov 2, 2011 at 9:23 PM, Tommaso Teofili
>>> wrote:
>>> > 2011/11/2 Thomas Jungblut
>>> >
>>> >> Another point while fixing the local runner:
>>> >>
>>> >> Are we now input driven?
>>> >> I see in the code that the user-defined task number is overridden by
>>> >> the number of splits.
>>> >> Was this your intention? This will actually make realtime processing
>>> >> with no static input a real pain.
>>> >> For example, if you want a similar behaviour in Hadoop M/R you'll need
>>> >> to create dummy splits, and this is not what we should aim at.
>>> >>
>>> >> We could simply check if the user defines the NullInputFormat or
>>> >> nothing, and then use the number of tasks the user has configured.
>>> >>
>>> >
>>> > For sure I agree we should allow the former programming model with no
>>> > input without explicitly instantiating dummy inputs/splits. What about
>>> > providing two basic (different) implementations?
>>> > Tommaso
>>> >
>>> >
>>> >>
>>> >> 2011/11/2 Tommaso Teofili
>>> >>
>>> >> > 2011/11/2 Edward J. Yoon
>>> >> >
>>> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
>>> >> > >
>>> >> > > You're right. Almost all BSP applications should override the
>>> >> > > bsp() method, but that is not true of the setup() and cleaner()
>>> >> > > methods, as you said. Let's fix them.
>>> >> > >
>>> >> >
>>> >> > Agreed +1
>>> >> >
>>> >> >
>>> >> > >
>>> >> > > > Generally I would suggest to integrate the OutputCollector and
>>> >> > > > the RecordReader into the BSPPeerImpl.
>>> >> > > > So our peer is like the context in Hadoop.
>>> >> > >
>>> >> > > Good idea.
>>> >> > >
>>> >> >
>>> >> > +1 here too
>>> >> >
>>> >> > Tommaso
>>> >> >
>>> >> >
>>> >> > >
>>> >> > > On Wed, Nov 2, 2011 at 9:03 PM, Thomas Jungblut
>>> >> > > wrote:
>>> >> > > > Yes. When I reworked that API, I made a default implementation
>>> >> > > > in our abstract BSP class.
>>> >> > > > So the user has to override the methods himself only if he
>>> >> > > > needs to.
>>> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
>>> >> > > >
>>> >> > > > Generally I would suggest to integrate the OutputCollector and
>>> >> > > > the RecordReader into the BSPPeerImpl.
>>> >> > > > So our peer is like the context in Hadoop.
>>> >> > > > But that is just a minor thing. It is a great improvement ;)
>>> >> > > >
>>> >> > > > 2011/11/2 Edward J. Yoon
>>> >> > > >
>>> >> > > >> There're bsp(), setup() and cleaner() methods.
>>> >> > > >>
>>> >> > > >> What is your suggestion?
>>> >> > > >>
>>> >> > > >> On Wed, Nov 2, 2011 at 8:47 PM, Thomas Jungblut
>>> >> > > >> wrote:
>>> >> > > >> > Have a look at the combiner class. I know that this is just a
>>> >> > > >> > "test", but it is really messy if the user does not use the
>>> >> > > >> > methods, but is forced to override them.
>>> >> > > >> >
>>> >> > > >> > 2011/11/2 Edward J. Yoon
>>> >> > > >> >
>>> >> > > >> >> Why?
>>> >> > > >> >>
>>> >> > > >> >> On Wed, Nov 2, 2011 at 8:21 PM, Thomas Jungblut
>>> >> > > >> >> wrote:
>>> >> > > >> >> > I totally dislike that the BSP class now has abstract
>>> >> > > >> >> > methods instead of default implementations.
>>> >> > > >> >> >
>>> >> > > >> >> > 2011/11/2 Edward J. Yoon
>>> >> > > >> >> >
>>> >> > > >> >> >> Hi all,
>>> >> > > >> >> >>
>>> >> > > >> >> >> As you know, combiners and IO were recently added.
>>> >> > > >> >> >>
>>> >> > > >> >> >> Please review them from a user's viewpoint.
>>> >> > > >> >> >>
>>> >> > > >> >> >> http://svn.apache.org/repos/asf/incubator/hama/trunk/examples/src/main/java/org/apache/hama/examples/PiEstimator.java
>>> >> > > >> >> >>
>>> >> > > >> >> >> I'm testing multiple tasks and IO features on a 100-node
>>> >> > > >> >> >> cluster using 10 tasks per node. If there's no issue,
>>> >> > > >> >> >> I'll close HAMA-258.
>>> >> > > >> >> >>
>>> >> > > >> >> >> Thanks.
>>> >> > > >> >> >>
>>> >> > > >> >> >> --
>>> >> > > >> >> >> Best Regards, Edward J. Yoon
>>> >> > > >> >> >> @eddieyoon
>>> >> > > >> >> >>
>>> >> > > >> >> >
>>> >> > > >> >> > --
>>> >> > > >> >> > Thomas Jungblut
>>> >> > > >> >> > Berlin
>>> >> > > >> >> >
>>> >> > > >> >>
>>> >> > > >> >> --
>>> >> > > >> >> Best Regards, Edward J. Yoon
>>> >> > > >> >> @eddieyoon
>>> >> > > >> >>
>>> >> > > >> >
>>> >> > > >> > --
>>> >> > > >> > Thomas Jungblut
>>> >> > > >> > Berlin
>>> >> > > >> >
>>> >> > > >>
>>> >> > > >> --
>>> >> > > >> Best Regards, Edward J. Yoon
>>> >> > > >> @eddieyoon
>>> >> > > >>
>>> >> > > >
>>> >> > > > --
>>> >> > > > Thomas Jungblut
>>> >> > > > Berlin
>>> >> > > >
>>> >> > >
>>> >> > > --
>>> >> > > Best Regards, Edward J. Yoon
>>> >> > > @eddieyoon
>>> >> > >
>>> >> >
>>> >>
>>> >> --
>>> >> Thomas Jungblut
>>> >> Berlin
>>> >>
>>> >
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>>
>>
>> --
>> Thomas Jungblut
>> Berlin
>>
>
> --
> Thomas Jungblut
> Berlin
>

--
Thomas Jungblut
Berlin
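
Two suggestions from this thread can be sketched in Java: keeping only bsp() abstract while setup() and cleanup() default to no-ops, and falling back to the user-configured task count when a job has no input. This is a rough illustration under stated assumptions, not Hama's actual code. The names BspSketch, PeerSketch, CollectingPeer, HelloBsp, TaskCountSketch and resolveNumTasks are all hypothetical, and the "peer as context" interface is reduced to a single write() method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a peer that also acts as the job context
// (the thread suggests folding OutputCollector/RecordReader into the peer).
interface PeerSketch {
    void write(String key, String value);
}

// Hypothetical base class: only bsp() must be implemented by the user.
abstract class BspSketch {
    // Default no-op; jobs that need initialization override it.
    public void setup(PeerSketch peer) { }

    // The actual computation; every job has to provide this.
    public abstract void bsp(PeerSketch peer);

    // Default no-op; jobs that need teardown override it.
    public void cleanup(PeerSketch peer) { }
}

final class TaskCountSketch {
    private TaskCountSketch() { }

    // Assumption: with real input the split count decides the task count;
    // with no input (e.g. a null input format) the user's setting is kept.
    static int resolveNumTasks(boolean hasInput, int numSplits, int configuredTasks) {
        return hasInput ? numSplits : configuredTasks;
    }
}

// Test double that records everything written through the peer.
class CollectingPeer implements PeerSketch {
    final List<String> records = new ArrayList<>();

    @Override
    public void write(String key, String value) {
        records.add(key + "=" + value);
    }
}

// A job that needs neither setup nor cleanup overrides bsp() only.
class HelloBsp extends BspSketch {
    @Override
    public void bsp(PeerSketch peer) {
        peer.write("hello", "world");
    }
}

class BspSketchDemo {
    public static void main(String[] args) {
        CollectingPeer peer = new CollectingPeer();
        BspSketch job = new HelloBsp();
        job.setup(peer);   // inherited no-op
        job.bsp(peer);
        job.cleanup(peer); // inherited no-op
        System.out.println(peer.records);
        System.out.println(TaskCountSketch.resolveNumTasks(false, 0, 10));
    }
}
```

With this shape, a job such as HelloBsp stays free of empty overrides, and an input-less job keeps the task count the user asked for instead of needing dummy splits.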