Message-ID: <53600050.5090502@apache.org>
Date: Tue, 29 Apr 2014 21:41:04 +0200
From: Sebastian Schelter
Reply-To: ssc@apache.org
To: dev@mahout.apache.org
Subject: Re: Straw poll re: H2O ?
References: <535E9970.7010907@apache.org> <535FFC34.6050702@apache.org>
In-Reply-To: <535FFC34.6050702@apache.org>

For reasons of transparency in this discussion, I should add that I am a
committer on the upcoming Stratosphere ASF podling, a co-worker of its
main developers, and have contributed to it as part of my PhD.

On 04/29/2014 09:23 PM, Sebastian Schelter wrote:
> Anand,
>
> I'm trying to answer some of your questions, and my answers highlight
> the points that I would like to see clarified about H2O.
>
> On 04/28/2014 11:13 PM, Anand Avati wrote:
>
>> 1. Why is the DSL claiming to have (in its vision) logical vs physical
>> separation if not for providing multiple compute backends?
>
> This is not a claim or a vision; the DSL already has this separation.
> Take, for example, o.a.m.sparkbindings.drm.plan.OpAtA: that's the logical
> operator for executing a Transpose-Times-Self matrix multiplication. In
> o.a.m.sparkbindings.blas.AtA you will find two physical operator
> implementations for that.
> The choice of which one to use depends on whether
> there is enough memory to hold certain intermediary results in memory.
>
> The primary intention of a separation into logical and physical
> operators is to allow for a declarative programming style on the user's
> side and for an optimizer on the system side which automatically chooses
> the optimal physical operator for the execution of a specific program.
>
> This choice of the physical operator might depend on the shape and
> amount of the data processed as well as on the underlying available
> resources. *The separation into logical and physical operators clearly
> doesn't imply having multiple backends*. It only makes it very easy to
> support them.
>
>>
>> 2. Does the proposal of having a new DSL backend in the future (e.g.
>> Stratosphere, as suggested elsewhere) make you:
>
>> -- worry that stratosphere would be a dependency to Mahout?
>
> Stratosphere has been accepted as an incubator project in the ASF
> recently, so the worry about such a dependency is naturally less than
> about an externally managed project like H2O.
>
>> -- worry that as a user/committer/contributor you have to worry about a
>> new framework?
>
> In my eyes, there is a big difference between Spark/Stratosphere and
> H2O. Spark and Stratosphere have a clearly defined programming and
> execution model. They execute programs that are composed of a DAG of
> operators. The set of operators has clearly defined semantics and
> parallelization strategies. If you compare their operators, you will
> find that they offer pretty much the same in slightly different flavors.
> For both, there are scientific papers that explain all of this in
> detail.
>
> I have asked for a detailed description of H2O's programming model and
> execution model and I searched the documentation, but I haven't been
> able to find anything that clearly describes how things are done. I
> would love to read up on this, but until I'm presented with this, I have
> to assume that such a principled foundation is missing.
>
>
> --sebastian
>
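
To make the quoted point about logical versus physical operators concrete, here is a minimal Python sketch. It is not Mahout's code: only the name OpAtA is borrowed from the message above, while ata_accumulate, ata_by_entry, choose_physical, execute and the memory-budget rule are hypothetical names and assumptions made up for illustration. It shows one logical operator ("compute A^T A") with two physical implementations, and a tiny "optimizer" that picks between them on a single backend.

```python
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]


@dataclass(frozen=True)
class OpAtA:
    """Logical operator: states *what* to compute (A^T A), not *how*."""
    a: Matrix


def ata_accumulate(a: Matrix) -> Matrix:
    """Physical operator 1 (hypothetical): add each row's outer product
    into a single n x n accumulator that must fit in memory."""
    n = len(a[0])
    acc = [[0.0] * n for _ in range(n)]
    for row in a:
        for i in range(n):
            for j in range(n):
                acc[i][j] += row[i] * row[j]
    return acc


def ata_by_entry(a: Matrix) -> Matrix:
    """Physical operator 2 (hypothetical): compute every output entry as an
    independent dot product of two columns, rescanning the input each time.
    In this toy the result is still collected into a list at the end."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)]
            for i in range(n)]


def choose_physical(op: OpAtA, memory_budget_bytes: int) -> Callable[[Matrix], Matrix]:
    """Toy optimizer: use the accumulator variant only if an n x n matrix
    of 8-byte floats fits into the (assumed) memory budget."""
    n = len(op.a[0])
    if n * n * 8 <= memory_budget_bytes:
        return ata_accumulate
    return ata_by_entry


def execute(op: OpAtA, memory_budget_bytes: int) -> Matrix:
    """Run the logical operator with whichever physical operator was chosen."""
    return choose_physical(op, memory_budget_bytes)(op.a)
```

The caller only ever constructs OpAtA and calls execute; which implementation runs is decided from the data shape and the available resources. Nothing in the sketch requires a second backend, which matches the argument in the message that the separation does not imply multiple backends but would make them easy to add.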