drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Darshan Singh <darshan.m...@gmail.com>
Subject Re: Simple query on 150 billion records
Date Tue, 05 Apr 2016 13:18:27 GMT
Hi,

How much data  you got from this query

create table ANALYSIS_RESULT as (
select Int32Field1 as SECONDS from hdfs.`/data/` where Int32Field2=123456
or Int32Field2=4567898);

As per your email you said single record.Also, in this query you used
Int32Field1 as Seconds whereas in the first query it was just seconds.Are
these same fields or do you have some sort for conversion for these fields
in first query.

A plan would be grateful as well.

Thanks

On Mon, Apr 4, 2016 at 3:09 PM, François Méthot <fmethot78@gmail.com> wrote:

> Hi,
>
>   Querying 150 Billion records spread over ~21 000 parquets stored in hdfs
> on 13 nodes (6 cores each, Max Dir. Mem: 32GB, Max Heap 8 GB).
>
> Is their a known issue or drill limitation that would explain why the first
> query below can't return the expected single row and aggregation ?
>
> create table ANALYSIS_RESULT as (
> select to_date(to_timestamp((SECONDS)), count(1)
> from hdfs.`/data/
> where Int32Field2=123456 or Int32Field2=4567898
> group by to_date(to_timestamp((SECONDS)));
>
> After *20 hours*, SYSTEM ERROR: Foreman Exception: One more more nodes lost
> connectivity during query.
>
>
> If we do the query in 2 steps:
> create table ANALYSIS_RESULT as (
> select Int32Field1 as SECONDS from hdfs.`/data/` where Int32Field2=123456
> or Int32Field2=4567898);
>
> result was returned in *43 minutes* ( a single record ).
>
> select to_date(to_timestamp((SECONDS)), count(1)
> from ANALYSIS_RESULT
> group by to_date(to_timestamp((SECONDS));
>
> Aggregation of that single record is of course done in  < 1 second.
>    2016-04-04          1
>
>
>
> I also tried
> select to_date(to_timestamp((SECONDS)), count(1)  from (
> select Int32Field1 as SECONDS
> from hdfs.`/data/`
> where Int32Field2=123456 or Int32Field2=4567898)
> group by o_date(to_timestamp((SECONDS))
>
> Same thing: After *21 hours*, SYSTEM ERROR: Foreman Exception: One more
> more nodes lost connectivity during query.
>
>
> Thanks for your help
> Francois
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message