drill-dev mailing list archives

From Sudheesh Katkam <skat...@maprtech.com>
Subject Re: Probabilistic data structures in Drill
Date Mon, 02 May 2016 17:13:51 GMT
There is a pending pull request [1] to support table statistics. This includes using HyperLogLog
to estimate number of distinct values, etc. I do not know further details.

Thank you,
Sudheesh

[1] https://github.com/apache/drill/pull/425
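
For readers unfamiliar with HyperLogLog, here is a minimal sketch of how such a structure estimates the number of distinct values. It is not taken from the pull request above or from Drill itself: the class name, the parameter p, and the choice of SHA-1 as the hash are all assumptions made for illustration. The merge step is the relevant part for the multi-level aggregation question quoted below: partial sketches built independently can be combined without revisiting the data.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical, not Drill's code)."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers; the alpha constant used in
        # estimate() assumes m >= 128, i.e. p >= 7.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # Derive a 64-bit hash from SHA-1 (an arbitrary choice for this sketch).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                 # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Register-wise maximum: the result equals the sketch that would have
        # been built from the union of both inputs.
        if self.p != other.p:
            raise ValueError("sketches must use the same p")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b)
                            for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Two partial sketches over overlapping inputs, as two workers might build.
part_a = HyperLogLog()
part_b = HyperLogLog()
for i in range(0, 60000):
    part_a.add(i)
for i in range(40000, 100000):
    part_b.add(i)

combined = part_a.merge(part_b)
print(round(combined.estimate()))  # an estimate of the 100000 distinct values
```

With p = 12 the sketch keeps 4096 small registers, and the expected relative error is about 1.04 / sqrt(4096), or roughly 1.6%. Because merging is a register-wise maximum, the order and grouping of merges does not matter.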

> On May 1, 2016, at 7:26 PM, Edmon Begoli <ebegoli@gmail.com> wrote:
> 
> Yes, I am preparing a research seminar, and I am doing a survey of the uses
> of probabilistic and synopsis data structures in post-Hadoop "Big Data"
> technologies.
> 
> On Sun, May 1, 2016 at 8:34 PM, Julian Hyde <jhyde@apache.org> wrote:
> 
>> Drill also makes use of hash tables and hash partitioning.
>> 
>> I’m not sure what the purpose of your question was. Are you carrying out a
>> survey?
>> 
>> Julian
>> 
>> 
>>> On May 1, 2016, at 5:22 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>> 
>>> Drill doesn't use any such data structures in itself. The emphasis has been
>>> on being correct first with the option of introducing approximations later.
>>> 
>>> That said, you can definitely define aggregators yourself. Last I checked,
>>> however, user-defined aggregators are single level ... that means that
>>> everything that gets aggregated has to go through a single function, which
>>> definitely limits scalability. This was several months ago, though, so
>>> things may have improved by now.
>>> 
>>> Perhaps somebody can comment on whether multi-level user-defined
>>> aggregators are possible?
>>> 
>>> 
>>> 
>>> On Sat, Apr 30, 2016 at 8:32 AM, Edmon Begoli <ebegoli@gmail.com> wrote:
>>> 
>>>> Is Drill using any of the probabilistic data structures [1], and if so -
>>>> which ones and how?
>>>> 
>>>> Thank you,
>>>> Edmon
>>>> 
>>>> 1. Probabilistic Data Structures -
>>>> https://en.m.wikipedia.org/wiki/Category:Probabilistic_data_structures
>>>> 
>> 
>> 

