incubator-drill-user mailing list archives

From devansh kumar <>
Subject Re: Basic queries regarding Apache Drill working
Date Fri, 05 Apr 2013 04:48:59 GMT

As Andrew asked, how will AVERAGE work without a Reduce operation?
Can you explain more about how the data will be aggregated?

 From: Andrew Brust <>
To: "" <>; devansh kumar
Sent: Thursday, April 4, 2013 8:00 PM
Subject: RE: Basic queries regarding Apache Drill working
Still not sure I follow (and pardon what must be a very rudimentary misunderstanding on my
part) how you get an average across a data set if the data is split across nodes.  With MapReduce,
the reducer can get it because all data for a given key is kept to one node.  How would this
work with Drill?

-----Original Message-----
From: Ted Dunning [] 
Sent: Thursday, April 4, 2013 9:27 AM
To:; devansh kumar
Subject: Re: Basic queries regarding Apache Drill working

On Thu, Apr 4, 2013 at 12:27 PM, devansh kumar <> wrote:

> Hi,
> I am new and am trying to understand how Apache Drill works, but I
> have a few questions.
> Can anyone help me understand these things?
> 1.
> I am trying to understand if the execution engine is going to break up 
> the data.

Normally the data will already have been broken up across a cluster.

> What will happen if I try an aggregation operation like AVERAGE?
> How will that work?


> I have seen operations such as SUM and COUNT.
> What will the query execution tree look like in the case of an AVERAGE?

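It will look exactly like a SUM or COUNT except that two numbers (a running sum and a
running count) will be accumulated instead of one.  Each node accumulates its own pair, the
pairs are added together, and the final sum is divided by the final count.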
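
A minimal sketch of the idea in Python, to make the two accumulated numbers concrete. This is not Drill code: the function names, the (sum, count) tuple, and the sample data are all assumptions made for illustration. Each node reduces its local data to a partial pair, and the pairs are added before a single division at the end.

```python
from typing import Iterable, Tuple

# Hypothetical partial state: (running sum, running count).
Partial = Tuple[float, int]


def partial_aggregate(values: Iterable[float]) -> Partial:
    """Reduce one node's local values to a (sum, count) pair."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial states by adding sums and counts."""
    return a[0] + b[0], a[1] + b[1]


def finalize(p: Partial) -> float:
    """Turn the merged state into the average."""
    total, count = p
    if count == 0:
        raise ValueError("average of empty input")
    return total / count


# Illustrative data, already split across three nodes.
node_data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

partials = [partial_aggregate(chunk) for chunk in node_data]
combined: Partial = (0.0, 0)
for p in partials:
    combined = merge(combined, p)

average = finalize(combined)
print(combined, average)
```

No node ever needs to see all the rows for the average to come out right; only the small (sum, count) pairs move between nodes.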

> 2.
> Is the resource model optimized compared to MapReduce?

Yes.  This will happen because multiple levels of aggregation can be done in one tree without
the barrier between map and reduce imposed by the MapReduce structure.
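
As a rough illustration of multiple levels of aggregation in one tree, the sketch below merges partial (sum, count) pairs level by level. It is an assumption-laden toy, not Drill's execution engine: `tree_aggregate`, `fan_in`, and the leaf values are hypothetical names and data. Because adding pairs is associative, merging through intermediate levels gives the same result as merging everything in one step.

```python
from typing import List, Tuple

# Hypothetical partial state: (running sum, running count).
Partial = Tuple[float, int]


def merge_partials(parts: List[Partial]) -> Partial:
    """Add up the sums and the counts of several partial states."""
    return (sum(p[0] for p in parts), sum(p[1] for p in parts))


def tree_aggregate(leaves: List[Partial], fan_in: int = 2) -> Partial:
    """Merge leaf partials level by level, fan_in children per parent."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(leaves)
    if not level:
        return (0.0, 0)
    while len(level) > 1:
        # Each group of fan_in partials becomes one partial on the next level.
        level = [
            merge_partials(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


# Illustrative leaf partials, one per node.
leaf_partials = [(6.0, 3), (9.0, 2), (6.0, 1), (10.0, 4)]

flat = merge_partials(leaf_partials)            # single-step merge
tree = tree_aggregate(leaf_partials, fan_in=2)  # two-level merge

print(flat, tree, tree[0] / tree[1])
```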