Return-Path: X-Original-To: apmail-drill-dev-archive@www.apache.org Delivered-To: apmail-drill-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7DD1919D01 for ; Tue, 5 Apr 2016 13:18:31 +0000 (UTC) Received: (qmail 19485 invoked by uid 500); 5 Apr 2016 13:18:31 -0000 Delivered-To: apmail-drill-dev-archive@drill.apache.org Received: (qmail 19436 invoked by uid 500); 5 Apr 2016 13:18:31 -0000 Mailing-List: contact dev-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list dev@drill.apache.org Received: (qmail 19419 invoked by uid 99); 5 Apr 2016 13:18:30 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Apr 2016 13:18:30 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 5FF40180361 for ; Tue, 5 Apr 2016 13:18:30 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.179 X-Spam-Level: * X-Spam-Status: No, score=1.179 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx2-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id 0kuO-VGmcOEz for ; Tue, 5 Apr 2016 13:18:28 +0000 (UTC) Received: from mail-wm0-f41.google.com (mail-wm0-f41.google.com [74.125.82.41]) by mx2-lw-eu.apache.org (ASF Mail Server at mx2-lw-eu.apache.org) with ESMTPS id BA7AF5F47D for ; Tue, 5 Apr 2016 13:18:27 +0000 (UTC) Received: by mail-wm0-f41.google.com with SMTP id f198so31812606wme.0 for ; Tue, 05 Apr 2016 06:18:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to; bh=uLLNFjF2rY1Tom804WUtI1gdYGqqJCDlvHedBt7uj+c=; b=OL/Fg+Q+0pcEM+xAkiO2FXJ0M8IjAVa8NJxdjm+ImPMuHmxV2j3QsgSxl2iGS5FXvl K2koJHM9MyjqAObuheilLxZKYRnOxjMrG7sAOj7RJQ7UM0AALiv24T0PklsE+idjz24M 80RjzzMOkkeZYn0Juz4sNrMHdgpuLC1HXmH8w35juaoVEKH5Uk6hVqFofxCjHKsRNTa1 3WfTg3yhjY6GQWXl82qOrzb2zoWtF8GeOCn4fiIPrCI4Iw9uV8G6PKYrcfPpqRRaOYjN DnudGB37BDsH0WusYUnXdBmv3+20a84zs2mfQwqHvZfw+MbQ/uxE6UR70JDeR4RftGKV 4yhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to; bh=uLLNFjF2rY1Tom804WUtI1gdYGqqJCDlvHedBt7uj+c=; b=HP+aT9QK/k2ttFsd/8DuT7XpptL4s8eq0NTS875zVAHT1Qcmw17qvJWz/g+igSXHEQ v+lbTALyGOF2BgjH1Vq/ew9zgWgza+PetxAMCMWdPuVTWb2YwaWRmEGAMm3MDlKC71wi EVyo5GJUonTgjiIV3AkcpN5u6Qi/Lh2pK2767DyY7snNDSFwAIr85XUh2aRbzG2AOR3B dzeF+g9O6W3hL1EM028AN7avWzyCmtq7//Zi+/blH4Kw54NgFXg+oj2lGelfcDQk0oes vd/gDd7uATE50WPt9m5YRv2iQFXLZb1zpnbGhCn36HOtPEF27mp9grSHugTYK3sO7hCc JeHA== X-Gm-Message-State: AD7BkJLJ2KD76eUrdluTJ6vCVsyhK6ZQK73P7QFDBGZ8fKEGaJZWgictVFu0rU2nq06pL+khbdImK8wSpOImLQ== MIME-Version: 1.0 X-Received: by 10.194.92.107 with SMTP id cl11mr30527946wjb.21.1459862307348; Tue, 05 Apr 2016 06:18:27 -0700 (PDT) Received: by 10.194.112.129 with HTTP; Tue, 5 Apr 2016 06:18:27 -0700 (PDT) In-Reply-To: References: Date: Tue, 5 Apr 2016 14:18:27 +0100 Message-ID: Subject: Re: Simple query on 150 billion records From: Darshan Singh To: dev@drill.apache.org Content-Type: multipart/alternative; boundary=047d7bf0d2d44026dc052fbcaefa --047d7bf0d2d44026dc052fbcaefa Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi, How much data you got from this query create table ANALYSIS_RESULT as ( select Int32Field1 as SECONDS from hdfs.`/data/` where Int32Field2=3D123456 or Int32Field2=3D4567898); As per your email you said single record.Also, in this query you used Int32Field1 as Seconds whereas in the first query it was just seconds.Are these same fields or do you have some sort for conversion for these fields in first query. A plan would be grateful as well. Thanks On Mon, Apr 4, 2016 at 3:09 PM, Fran=C3=A7ois M=C3=A9thot wrote: > Hi, > > Querying 150 Billion records spread over ~21 000 parquets stored in hdf= s > on 13 nodes (6 cores each, Max Dir. Mem: 32GB, Max Heap 8 GB). > > Is their a known issue or drill limitation that would explain why the fir= st > query below can't return the expected single row and aggregation ? > > create table ANALYSIS_RESULT as ( > select to_date(to_timestamp((SECONDS)), count(1) > from hdfs.`/data/ > where Int32Field2=3D123456 or Int32Field2=3D4567898 > group by to_date(to_timestamp((SECONDS))); > > After *20 hours*, SYSTEM ERROR: Foreman Exception: One more more nodes lo= st > connectivity during query. > > > If we do the query in 2 steps: > create table ANALYSIS_RESULT as ( > select Int32Field1 as SECONDS from hdfs.`/data/` where Int32Field2=3D1234= 56 > or Int32Field2=3D4567898); > > result was returned in *43 minutes* ( a single record ). > > select to_date(to_timestamp((SECONDS)), count(1) > from ANALYSIS_RESULT > group by to_date(to_timestamp((SECONDS)); > > Aggregation of that single record is of course done in < 1 second. > 2016-04-04 1 > > > > I also tried > select to_date(to_timestamp((SECONDS)), count(1) from ( > select Int32Field1 as SECONDS > from hdfs.`/data/` > where Int32Field2=3D123456 or Int32Field2=3D4567898) > group by o_date(to_timestamp((SECONDS)) > > Same thing: After *21 hours*, SYSTEM ERROR: Foreman Exception: One more > more nodes lost connectivity during query. > > > Thanks for your help > Francois > --047d7bf0d2d44026dc052fbcaefa--