drill-user mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: Basic queries regarding Apache Drill working
Date Fri, 05 Apr 2013 15:35:09 GMT
Oops, meant to include a reference as an example of streaming algorithms:
https://github.com/clearspring/stream-lib
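
This is a small illustration of the kind of mergeable streaming summary meant here, for example for the median mentioned in the quoted message below. It is a minimal Python sketch and not code from stream-lib. It assumes a known value range [lo, hi) and a hypothetical `BinnedHistogram` class. Each node builds a fixed-width histogram in one pass with bounded memory. Histograms are merged by adding their counts. The median is then read off the merged counts, so its precision is limited by the bin width.

```python
# A minimal sketch (not stream-lib code): a fixed-width histogram used as a
# mergeable, bounded-memory summary for approximating a median in parallel.


class BinnedHistogram:
    """Counts values in equal-width bins over an assumed range [lo, hi)."""

    def __init__(self, lo, hi, bins):
        if hi <= lo or bins <= 0:
            raise ValueError("need hi > lo and bins > 0")
        self.lo, self.hi, self.bins = lo, hi, bins
        self.width = (hi - lo) / bins
        self.counts = [0] * bins

    def add(self, x):
        # Values outside [lo, hi) are clamped into the edge bins.
        i = int((x - self.lo) // self.width)
        self.counts[min(max(i, 0), self.bins - 1)] += 1

    def merge(self, other):
        # Element-wise addition: partial histograms from different nodes
        # can be combined in any order.
        if (self.lo, self.hi, self.bins) != (other.lo, other.hi, other.bins):
            raise ValueError("histograms must share range and bin count")
        merged = BinnedHistogram(self.lo, self.hi, self.bins)
        merged.counts = [a + b for a, b in zip(self.counts, other.counts)]
        return merged

    def approx_median(self):
        # Midpoint of the first bin where the running count reaches half
        # of the total count.
        total = sum(self.counts)
        if total == 0:
            raise ValueError("empty histogram")
        running = 0
        for i, c in enumerate(self.counts):
            running += c
            if running >= total / 2:
                return self.lo + (i + 0.5) * self.width


# Each hypothetical "node" summarizes only its own slice of the data.
partitions = [range(0, 40), range(40, 70), range(70, 100)]
partials = []
for part in partitions:
    h = BinnedHistogram(0, 100, 100)
    for v in part:
        h.add(v)
    partials.append(h)

merged = partials[0]
for h in partials[1:]:
    merged = merged.merge(h)
print(merged.approx_median())  # 49.5
```

The exact median of 0..99 is also 49.5. With coarser bins the answer gets cheaper to ship between nodes but less precise. The exact answer would instead require a full shuffle of the values.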



On Fri, Apr 5, 2013 at 8:34 AM, Jacques Nadeau <jacques@apache.org> wrote:

> The current thinking is that there will be an approximate query flag.
> This will be useful in situations where parallel approximations can be
> made.  The simplest example is a top 10 group by attr1: you can do a
> local top N group by attr1 and then merge those results.  While not
> exact, this can be statistically accurate given the right choice of N.
> There are also parallel approximations for other things, such as the
> median, using streaming algorithms.  The goal is for Drill to be able to
> use these approximation algorithms in a processing tree for more queries.
> When a user needs exact results, a full shuffle/aggregation will still
> need to be done.  Those queries will still benefit from avoiding the
> various MapReduce barriers and the requirement to persist data between
> stages.
>
> J
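
The local-top-N-then-merge idea above can be illustrated with a minimal Python sketch. It is not Drill's implementation. It assumes the aggregate is COUNT(*) per attr1 value, and the function names and three-node data are hypothetical. Each node keeps only its N largest groups. The merge step sums those partial counts and picks the top k. A group that falls outside some node's local top N loses that node's contribution, which is why the result is approximate and gets better as N grows.

```python
from collections import Counter


def local_top_n(rows, n):
    """On one node: GROUP BY key with COUNT(*), keeping only the n largest groups."""
    return dict(Counter(rows).most_common(n))


def merge_top_k(partials, k):
    """Sum the per-node counts and return the k largest groups overall."""
    total = Counter()
    for p in partials:
        total.update(p)  # adds counts rather than replacing them
    return total.most_common(k)


# Three hypothetical nodes, each holding the attr1 values of its local rows.
p1 = ["a"] * 10 + ["b"] * 6 + ["c"] * 1
p2 = ["a"] * 3 + ["b"] * 7 + ["c"] * 4
p3 = ["a"] * 4 + ["b"] * 2 + ["c"] * 8
nodes = [p1, p2, p3]

for n in (1, 2, 3):
    approx = merge_top_k([local_top_n(rows, n) for rows in nodes], k=2)
    print(n, approx)
# 1 [('a', 10), ('c', 8)]   wrong set: c is really third
# 2 [('a', 14), ('b', 13)]  right groups, undercounted
# 3 [('a', 17), ('b', 15)]  every group kept, so exact
```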
>
>
> On Thu, Apr 4, 2013 at 10:31 PM, devansh kumar <devansh_kumar@yahoo.com> wrote:
>
>> Hi,
>>
>> I understood what you said about using SUM and COUNT to
>> calculate AVERAGE.
>> But as I understand it, that works well for operations that can be
>> distributed. What about operations like median?
>>
>> I also wanted to ask how the query will be broken up in
>> the execution engine.
>> I have gone through the Apache Drill documentation and the Google Dremel
>> paper, and I am still confused about how multiple levels of aggregation
>> will be created inside one tree.
>>
>> Thanks!
>>
>>
>>
>> ________________________________
>>  From: devansh kumar <devansh_kumar@yahoo.com>
>> To: Andrew Brust <andrew.brust@bluebadgeinsights.com>;
>> "drill-user@incubator.apache.org" <drill-user@incubator.apache.org>;
>> "ted.dunning@gmail.com" <ted.dunning@gmail.com>
>> Sent: Friday, April 5, 2013 10:18 AM
>> Subject: Re: Basic queries regarding Apache Drill working
>>
>>
>> Hi,
>>
>> As Andrew asked, how will AVERAGE work without a Reduce operation?
>> Can you explain in more detail how the data will be aggregated?
>>
>>
>>
>>
>> ________________________________
>>  From: Andrew Brust <andrew.brust@bluebadgeinsights.com>
>> To: "drill-user@incubator.apache.org" <drill-user@incubator.apache.org>;
>> devansh kumar <devansh_kumar@yahoo.com>
>> Sent: Thursday, April 4, 2013 8:00 PM
>> Subject: RE: Basic queries regarding Apache Drill working
>>
>> Still not sure I follow (and pardon what must be a very rudimentary
>> misunderstanding on my part) how you get an average across a data set if
>> the data is split across nodes.  With MapReduce, the reducer can get it
>> because all data for a given key is sent to one node.  How would this work
>> with Drill?
>>
>> -----Original Message-----
>> From: Ted Dunning [mailto:ted.dunning@gmail.com]
>> Sent: Thursday, April 4, 2013 9:27 AM
>> To: drill-user@incubator.apache.org; devansh kumar
>> Subject: Re: Basic queries regarding Apache Drill working
>>
>> On Thu, Apr 4, 2013 at 12:27 PM, devansh kumar <devansh_kumar@yahoo.com> wrote:
>>
>> > Hi,
>> >
>> > I am new and am trying to understand how Apache Drill works, but I
>> > have a few questions.
>> > Can anyone help me understand these things?
>> >
>> > 1.
>> > I am trying to understand whether the execution engine is going to break up
>> > the data.
>> >
>>
>> Normally the data will already have been broken up across a cluster.
>>
>>
>> > What will happen if I try to do an aggregation operation like AVERAGE?
>> > How will that work?
>> >
>>
>> Yes.
>>
>>
>> > I have seen operations such as SUM and COUNT.
>> > What will the query execution tree look like in the case of AVERAGE?
>> >
>>
>> It will look exactly like a SUM or COUNT except that two numbers will be
>> accumulated instead of one.
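
The "two numbers instead of one" answer can be made concrete with a minimal Python sketch. It assumes hypothetical helper names and is not Drill's operator code. Each node keeps a (sum, count) pair. Pairs are combined by adding both fields. Only the final step divides. Because partials are combined this way, no single node needs to see all of the rows.

```python
def partial_avg(values):
    """Run on each node: accumulate (sum, count) instead of a single number."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count


def combine(a, b):
    """Run wherever partials meet: add the sums and add the counts."""
    return a[0] + b[0], a[1] + b[1]


def finalize(partial):
    """Run once at the root: divide. An empty input yields None, like SQL NULL."""
    total, count = partial
    return total / count if count else None


node1 = partial_avg([1, 2, 3])   # (6.0, 3)
node2 = partial_avg([10])        # (10.0, 1)
print(finalize(combine(node1, node2)))  # 4.0, i.e. 16 / 4
```

Carrying the count matters. Averaging the per-node averages, (2 + 10) / 2 = 6.0, would be wrong here because the two nodes hold different numbers of rows.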
>>
>>
>> > 2.
>> > Is the resource model optimized compared to MapReduce?
>> >
>>
>> Yes.  This will happen because multiple levels of aggregation can be done
>> in one tree without the barrier between map and reduce imposed by the
>> MapReduce structure.
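
The question earlier in the thread asked how multiple levels of aggregation fit in one tree. A minimal Python sketch suggests why they can. It assumes a grouped SUM, and the names `leaf_aggregate`, `merge_partials`, `aggregate_tree` and `fan_in` are hypothetical. The merge of partial results is associative, so the same merge can be applied at every level of a tree with any fan-in. This sketch runs the levels one after another in a loop. The point in the message is that an engine like Drill can instead run the levels concurrently and stream partials upward, with no map/reduce barrier in between.

```python
from collections import Counter


def leaf_aggregate(rows):
    """Partial SUM(amount) GROUP BY key over one node's local rows."""
    acc = Counter()
    for key, amount in rows:
        acc[key] += amount
    return acc


def merge_partials(partials):
    """Combine partial sums; the same function works at every level."""
    acc = Counter()
    for p in partials:
        acc.update(p)
    return acc


def aggregate_tree(leaf_partials, fan_in):
    """Merge partials fan_in at a time, level by level, until one result remains.

    Returns (result, number_of_merge_levels).
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(leaf_partials)
    if not level:
        return Counter(), 0
    depth = 0
    while len(level) > 1:
        level = [merge_partials(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
        depth += 1
    return level[0], depth


# Eight hypothetical leaf nodes; leaf i holds rows ("x", i) and ("y", 1).
leaves = [leaf_aggregate([("x", i), ("y", 1)]) for i in range(8)]
for fan_in in (2, 4, 8):
    result, depth = aggregate_tree(leaves, fan_in)
    print(fan_in, depth, dict(result))
# 2 3 {'x': 28, 'y': 8}
# 4 2 {'x': 28, 'y': 8}
# 8 1 {'x': 28, 'y': 8}
```

Every tree shape gives the same answer. The fan-in only trades the number of levels against how much each intermediate node has to merge.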
>>
>
>
