From: Jacques Nadeau
To: drill-user@incubator.apache.org, devansh kumar
Cc: Andrew Brust, "ted.dunning@gmail.com"
Subject: Re: Basic queries regarding Apache Drill working
Date: Fri, 5 Apr 2013 08:34:28 -0700
Content-Type: multipart/alternative; boundary=bcaec54b45689cc9b204d99ed224

--bcaec54b45689cc9b204d99ed224
Content-Type: text/plain; charset=ISO-8859-1

The current thinking is that there will be an approximate query flag. This
will be useful in situations where parallel approximations can be made. The
simplest example is a top 10 grouped by attr1: each node can do a local top N
group by attr1, and those local results are then merged. While not exactly
right, the result can be statistically accurate given the right choice of N.
There are also parallel approximations for other things, such as median, that
use streaming algorithms. The goal is for Drill to be able to use these
approximation algorithms in a processing tree for more queries.

In the case that a user needs exact results, full shuffles/aggregations will
still need to be done. Those queries will still benefit from avoiding the
various MapReduce barriers and the requirements for persistence between
stages.
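
The local top N and merge idea can be sketched roughly as follows. This is
only an illustration in plain Python, not Drill code: the function names
(local_top_n, merge_top_k, approximate_top_k) and the use of a row count as
the ranking measure are assumptions made for the example.

```python
from collections import Counter


def local_top_n(rows, n):
    """Count rows per attr1 value on one node and keep only the n largest groups."""
    return Counter(rows).most_common(n)


def merge_top_k(partials, k):
    """Sum the partial counts sent by every node and return the overall top k."""
    merged = Counter()
    for partial in partials:
        for key, count in partial:
            merged[key] += count
    return merged.most_common(k)


def approximate_top_k(partitions, k, n):
    """Hypothetical driver: one local top-n per partition, then a single merge."""
    return merge_top_k([local_top_n(rows, n) for rows in partitions], k)


if __name__ == "__main__":
    # Each inner list stands for the attr1 values stored on one node.
    partitions = [
        ["a"] * 5 + ["b"] * 3 + ["c"] * 1,
        ["b"] * 4 + ["c"] * 2 + ["a"] * 1,
        ["c"] * 3 + ["a"] * 2 + ["d"] * 1,
    ]
    print(approximate_top_k(partitions, k=2, n=4))  # n covers every group
    print(approximate_top_k(partitions, k=2, n=1))  # n is too small to be exact
```

When n is at least the number of distinct groups on every node, nothing is
discarded and the merged result is exact. With a smaller n, groups that fall
below the local cut-off are dropped before the merge, so merged counts can be
too low and, in unlucky cases, the ranking itself can be wrong. That is the
trade-off behind choosing N.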
J

On Thu, Apr 4, 2013 at 10:31 PM, devansh kumar wrote:

> Hi,
>
> I understood what you wanted to say of using SUM and COUNT for calculating
> AVERAGE.
> But as i understand this will work very well with Distributed
> operations..... what about operations like Median.
>
> Also i wanted to ask how the query will be broken up in
> the execution engine.
> I have gone through the Apache drill documentation and also Google Dremel
> paper, and i am still confused that how multiple level of aggregation
> will be created inside one tree.
>
> Thanks!
>
> ________________________________
> From: devansh kumar
> To: Andrew Brust; "drill-user@incubator.apache.org";
> "ted.dunning@gmail.com"
> Sent: Friday, April 5, 2013 10:18 AM
> Subject: Re: Basic queries regarding Apache Drill working
>
> Hi,
>
> As Andrew asked, how will average work without an operation of Reduce
> present.
> Can you explain more on how will the data be aggregated?
>
> ________________________________
> From: Andrew Brust
> To: "drill-user@incubator.apache.org"; devansh kumar
> Sent: Thursday, April 4, 2013 8:00 PM
> Subject: RE: Basic queries regarding Apache Drill working
>
> Still not sure I follow (and pardon what must be a very rudimentary
> misunderstanding on my part) how you get an average across a data set if
> the data is split across nodes. With MapReduce, the reducer can get it
> because all data for a given key is kept to one node. How would this work
> with Drill?
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Thursday, April 4, 2013 9:27 AM
> To: drill-user@incubator.apache.org; devansh kumar
> Subject: Re: Basic queries regarding Apache Drill working
>
> On Thu, Apr 4, 2013 at 12:27 PM, devansh kumar wrote:
>
> > Hi,
> >
> > I am new and am trying to understand how Apache Drill works but i
> > have a few queries.
> > Can anyone help me understand these things?
> >
> > 1.
> > I am trying to understand if the execution engine is going to break up
> > the data.
> >
>
> Normally the data will already have been broken up across a cluster.
>
> > What will happen if i am trying to an aggregation operation like
> > (AVERAGE).
> > How will that work??
> >
>
> Yes.
>
> > I have seen operations as SUM and COUNT.
> > How will the Query execution tree look like in case of an AVERAGE
> >
>
> It will look exactly like a SUM or COUNT except that two numbers will be
> accumulated instead of one.
>
> > 2.
> > Does the Resource model is optimized when compared to MapReduce.
> >
>
> Yes. This will happen because multiple levels of aggregation can be done
> in one tree without the barrier between map and reduce
> imposed by the MapReduce structure.
>

--bcaec54b45689cc9b204d99ed224--
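
The thread's answer on AVERAGE (it looks like a SUM or COUNT, except that two
numbers are accumulated instead of one) can be sketched as below. This is a
minimal illustration in plain Python under stated assumptions, not Drill's
execution engine: partial_aggregate, merge, finalize, tree_average and the
fan_in parameter are hypothetical names chosen for the example.

```python
from functools import reduce


def partial_aggregate(values):
    """Leaf step on one node: accumulate the two numbers, (sum, count)."""
    return (sum(values), len(values))


def merge(left, right):
    """Intermediate step: combine two (sum, count) states. Order does not matter."""
    return (left[0] + right[0], left[1] + right[1])


def finalize(state):
    """Root step: turn the accumulated (sum, count) into the average."""
    total, count = state
    return total / count if count else None


def tree_average(partitions, fan_in=2):
    """Merge partial states level by level, fan_in children per parent."""
    level = [partial_aggregate(values) for values in partitions]
    while len(level) > 1:
        level = [
            reduce(merge, level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return finalize(level[0]) if level else None


if __name__ == "__main__":
    # Each inner list stands for the values held by one node.
    partitions = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
    print(tree_average(partitions))
```

Because merge only adds sums and counts, the states can be combined in any
grouping and across any number of levels, so no single node has to see all
rows for a key. Averaging the per-node averages instead would weight small
and large partitions equally and generally give a different answer.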