From: Todd Lipcon
Date: Mon, 12 Dec 2016 12:51:14 +0700
Subject: Re: performance issue involving "insert as select"
To: user@kudu.apache.org

Hi Rotem,

On Thu, Dec 8, 2016 at 3:25 AM, Rotem Gabay wrote:

> Hi, I have a small cluster on which I tried to run some performance tests
> on Kudu. In order to populate some data I ran a simple "insert as
> select" from a simple HDFS table, which took 10 minutes to finish. I then
> tried to duplicate the same data by doing another insert as select from
> the Kudu table to itself (insert into kudu_tbl select * from kudu_tbl);
> this insert took more than 2 hours to complete. Is there any reasonable
> explanation?

One interesting aspect of current releases of Kudu is that Impala queries don't operate with snapshot consistency. In the case that you are writing into the same table that you are reading from, it's actually possible that the query reads its own results.

Put another way, one fragment of the query may be writing into a tablet while another fragment is still reading that tablet. Without snapshot consistency, this can create a sort of "infinite loop" of inserts. While usually not infinite, it can end up producing far more rows than you expected.

We're working on addressing this in upcoming releases. In the meantime, it's probably best to generate your data in a different fashion rather than inserting into the same table that you're reading from.

Hope that helps. Let us know if the explanation doesn't seem to match up with what you're seeing.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
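
The "query reads its own results" effect described in the message can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Kudu or Impala code: the function `insert_as_select`, the key-ordered scan, and the random keys given to the copied rows are all hypothetical simplifications chosen to make the effect visible. With a snapshot, the scan sees only the rows that existed when it started; without one, any row written ahead of the scan position is read later and copied again.

```python
import heapq
import random


def insert_as_select(keys, snapshot, seed=0):
    """Toy model of `insert into t select * from t` over a key-ordered scan.

    Returns the number of rows inserted. Hypothetical model, not Kudu code.
    """
    rng = random.Random(seed)
    to_scan = list(keys)  # rows the scan has not reached yet
    heapq.heapify(to_scan)
    inserted = 0
    while to_scan:
        position = heapq.heappop(to_scan)  # scan reads the next row in key order
        new_key = rng.random()             # writer stores the copy under a fresh key
        inserted += 1
        # Without a snapshot, a row written ahead of the scan position is
        # picked up later by the same scan and copied again.
        if not snapshot and new_key > position:
            heapq.heappush(to_scan, new_key)
    return inserted


rng = random.Random(42)
original = [rng.random() for _ in range(1000)]
with_snapshot = insert_as_select(original, snapshot=True)
without_snapshot = insert_as_select(original, snapshot=False)
print("inserted with snapshot:   ", with_snapshot)
print("inserted without snapshot:", without_snapshot)
```

In this model the snapshot run inserts exactly one row per original row, while the run without a snapshot inserts more because some copies are scanned and copied again. The loop still terminates because each re-read row must land ahead of the current scan position, which matches the "usually not infinite" remark above.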