Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 24FCE160C05 for ; Wed, 3 Jan 2018 19:45:12 +0100 (CET) Received: (qmail 74385 invoked by uid 500); 3 Jan 2018 18:45:12 -0000 Mailing-List: contact user-help@kudu.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@kudu.apache.org Delivered-To: mailing list user@kudu.apache.org Received: (qmail 74374 invoked by uid 99); 3 Jan 2018 18:45:12 -0000 Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Jan 2018 18:45:12 +0000 Received: from mail-wr0-f179.google.com (mail-wr0-f179.google.com [209.85.128.179]) by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id 965091A0140 for ; Wed, 3 Jan 2018 18:45:11 +0000 (UTC) Received: by mail-wr0-f179.google.com with SMTP id p6so91864wrd.0 for ; Wed, 03 Jan 2018 10:45:11 -0800 (PST) X-Gm-Message-State: AKGB3mKNwArTcbknZeRvBpcWeqw+dBEPWxvGyuWA94D783Wi38U16TJQ DqVzc1YtWhmUPxP6nQtP8Ap/Y+/0q3NuZD6lmgg= X-Google-Smtp-Source: ACJfBotJmA1IG4yf6FwGQYaW24Wg9/MTkq13/54lBVJjfmS+Lel/Ok4EHUWkrRlBkJT18l5wDDALEeqEybb/A5KLRd4= X-Received: by 10.223.157.147 with SMTP id p19mr2309023wre.134.1515005110138; Wed, 03 Jan 2018 10:45:10 -0800 (PST) MIME-Version: 1.0 Received: by 10.223.134.109 with HTTP; Wed, 3 Jan 2018 10:45:09 -0800 (PST) In-Reply-To: References: From: Jean-Daniel Cryans Date: Wed, 3 Jan 2018 10:45:09 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: first and second run 2x query time difference To: user@kudu.apache.org Content-Type: multipart/alternative; boundary="f403043a2e206c534b0561e39d51" archived-at: Wed, 03 Jan 2018 18:45:14 -0000 --f403043a2e206c534b0561e39d51 Content-Type: text/plain; charset="UTF-8" Hey Boris, Thanks for reporting back with results! On Wed, Jan 3, 2018 at 10:38 AM, Boris Tyukin wrote: > so it was the page cache that makes this difference. we did a series of > tests either restarting Kudu only, Impala only or both and resetting or not > touching page cache. > > as for Kudu failures after restart, it was a sequence of services that > need to be started before Kudu. If we start Kudu after HDFS, everything is > fine. Data is intact > Is it possible that Kudu is sharing disks with ZK? > > thanks again for your help, J-D > > On Sat, Dec 16, 2017 at 4:05 PM, Jean-Daniel Cryans > wrote: > >> I'm more thinking in terms of the startup IO having some impact on the >> co-located services, but we really need to know what "went down" means. >> >> On Sat, Dec 16, 2017 at 12:50 PM, Boris Tyukin >> wrote: >> >>> yep it is really weird since Kudu does not use neither one. I'll get >>> with him on Monday to gather more details >>> >>> On Sat, Dec 16, 2017 at 3:28 PM, Jean-Daniel Cryans >> > wrote: >>> >>>> Hi Boris, >>>> >>>> How exactly did HDFS and ZK go down? A Kudu restart is fairly >>>> IO-intensive but I don't know how that can cause things like DataNodes to >>>> fail. >>>> >>>> J-D >>>> >>>> On Sat, Dec 16, 2017 at 11:45 AM, Boris Tyukin >>>> wrote: >>>> >>>>> well our admin had fun two days - it was the first time we restarted >>>>> Kudu on our DEV cluster and it did not go well. He is still troubleshooting >>>>> what happened but after Kudu restart zookeeper and HDFS went down after 3-4 >>>>> minutes. If we disable Kudu, all is well. No error in Kudu logs...I will >>>>> have more details next week so not asking for help as I do not know all the >>>>> details. What is obvious thought is that it has to do something with Kudu :) >>>>> >>>>> On Thu, Dec 14, 2017 at 9:40 AM, Boris Tyukin >>>>> wrote: >>>>> >>>>>> thanks for your suggestions, J-D, I am sure you are right more often >>>>>> than that! :)) >>>>>> >>>>>> I will report back with our results. So far I am really impressed >>>>>> with Kudu - we have been benchmarking ingest and egress throughput and our >>>>>> typical queries runtime. The biggest pain so far is lack of support for >>>>>> decimals >>>>>> >>>>>> On Wed, Dec 13, 2017 at 5:07 PM, Jean-Daniel Cryans < >>>>>> jdcryans@apache.org> wrote: >>>>>> >>>>>>> On Wed, Dec 13, 2017 at 11:30 AM, Boris Tyukin < >>>>>>> boris@boristyukin.com> wrote: >>>>>>> >>>>>>>> thanks J-D! we are going to try that and see how it impacts the >>>>>>>> runtime. >>>>>>>> >>>>>>>> is there any way to load this metadata upfront? a lot of our >>>>>>>> queries are adhoc in nature but they will be hitting the same tables with >>>>>>>> different predicates and join patterns though. >>>>>>>> >>>>>>> >>>>>>> You could use Impala to compute all the stats of all the tables >>>>>>> after each Kudu restart. Actually, do try that, restart Kudu then compute >>>>>>> stats and see how fast it scans. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> I am curious why this metadata does not survive restarts though. We >>>>>>>> are going to run our benchmarks again and this time restart Kudu and Impala. >>>>>>>> >>>>>>> >>>>>>> It's in the tserver memory, it can't survive a restart. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> I just ran another query first time which hits 2 large tables and >>>>>>>> these tables have been scanned by the previous query and this time I do not >>>>>>>> see any difference in query time before the first and second time - I guess >>>>>>>> this confirms your statement about " first time ever scanning the >>>>>>>> table since a Kudu restart" and collecting metadata. >>>>>>>> >>>>>>> >>>>>>> Maybe, I've been known to be right once or twice a year :) >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Dec 13, 2017 at 11:18 AM, Jean-Daniel Cryans < >>>>>>>> jdcryans@apache.org> wrote: >>>>>>>> >>>>>>>>> Hi Boris, >>>>>>>>> >>>>>>>>> Given that we don't have much data we can use here, I'll have to >>>>>>>>> extrapolate. As an aside though, this is yet another example where we need >>>>>>>>> more Kudu-side metrics in the query profile. >>>>>>>>> >>>>>>>>> So, Kudu lazily loads a bunch of metadata and that can really >>>>>>>>> affect scan times. If this was your first time ever scanning the table >>>>>>>>> since a Kudu restart, it's very possible that that's where that time was >>>>>>>>> spent. There's also the page cache in the OS that might now be populated. >>>>>>>>> You could do something like "sync; echo 3 > /proc/sys/vm/drop_caches" on >>>>>>>>> all the machines and run the query 2 times again, without restarting Kudu, >>>>>>>>> to understand the effect of the page cache itself. There's currently now >>>>>>>>> way to purge the cached metadata in Kudu though. >>>>>>>>> >>>>>>>>> Hope this helps a bit, >>>>>>>>> >>>>>>>>> J-D >>>>>>>>> >>>>>>>>> On Wed, Dec 13, 2017 at 8:07 AM, Boris Tyukin < >>>>>>>>> boris@boristyukin.com> wrote: >>>>>>>>> >>>>>>>>>> Hi guys, >>>>>>>>>> >>>>>>>>>> I am doing some benchmarks with Kudu and Impala/Parquet and hope >>>>>>>>>> to share it soon but there is one thing that bugs me. This is perhaps >>>>>>>>>> Impala question but since I am using Kudu with Impala I am going to try and >>>>>>>>>> ask anyway. >>>>>>>>>> >>>>>>>>>> One of my queries takes 120 seconds to run the very first time. >>>>>>>>>> It joins one large 5B row table with a bunch of smaller tables and then >>>>>>>>>> stores result in Impala/parquet (not Kudu). >>>>>>>>>> >>>>>>>>>> Now if I run it second and third time, it only takes 60 seconds. >>>>>>>>>> Can someone explain why? Is there any settings to decrease this gap? >>>>>>>>>> >>>>>>>>>> I've compared query profiles in CM and the only thing that was >>>>>>>>>> very different is scan against Kudu table (the large one): >>>>>>>>>> >>>>>>>>>> *************************** >>>>>>>>>> first time: >>>>>>>>>> *************************** >>>>>>>>>> KUDU_SCAN_NODE (id=0) (47.68s) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> - BytesRead: *0 B* >>>>>>>>>> - InactiveTotalTime: *0ns* >>>>>>>>>> - KuduRemoteScanTokens: *0* >>>>>>>>>> - NumScannerThreadsStarted: *20* >>>>>>>>>> - PeakMemoryUsage: *35.8 MiB* >>>>>>>>>> - RowsRead: *693,502,241* >>>>>>>>>> - RowsReturned: *693,502,241* >>>>>>>>>> - RowsReturnedRate: *14643448 per second* >>>>>>>>>> - ScanRangesComplete: *20* >>>>>>>>>> - ScannerThreadsInvoluntaryContextSwitches: *1,341* >>>>>>>>>> - ScannerThreadsTotalWallClockTime: *36.2m* >>>>>>>>>> - MaterializeTupleTime(*): *47.57s* >>>>>>>>>> - ScannerThreadsSysTime: *31.42s* >>>>>>>>>> - ScannerThreadsUserTime: *1.7m* >>>>>>>>>> - ScannerThreadsVoluntaryContextSwitches: *96,855* >>>>>>>>>> - TotalKuduScanRoundTrips: *52,308* >>>>>>>>>> - TotalReadThroughput: *0 B/s* >>>>>>>>>> - TotalTime: *47.68s* >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> *************************** >>>>>>>>>> second time: >>>>>>>>>> *************************** >>>>>>>>>> KUDU_SCAN_NODE (id=0) (4.28s) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> - BytesRead: *0 B* >>>>>>>>>> - InactiveTotalTime: *0ns* >>>>>>>>>> - KuduRemoteScanTokens: *0* >>>>>>>>>> - NumScannerThreadsStarted: *20* >>>>>>>>>> - PeakMemoryUsage: *37.9 MiB* >>>>>>>>>> - RowsRead: *693,502,241* >>>>>>>>>> - RowsReturned: *693,502,241* >>>>>>>>>> - RowsReturnedRate: *173481534 per second* >>>>>>>>>> - ScanRangesComplete: *20* >>>>>>>>>> - ScannerThreadsInvoluntaryContextSwitches: *1,451* >>>>>>>>>> - ScannerThreadsTotalWallClockTime: *19.5m* >>>>>>>>>> - MaterializeTupleTime(*): *4.20s* >>>>>>>>>> - ScannerThreadsSysTime: *38.22s* >>>>>>>>>> - ScannerThreadsUserTime: *1.7m* >>>>>>>>>> - ScannerThreadsVoluntaryContextSwitches: *480,870* >>>>>>>>>> - TotalKuduScanRoundTrips: *52,142* >>>>>>>>>> - TotalReadThroughput: *0 B/s* >>>>>>>>>> - TotalTime: *4.28s* >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > --f403043a2e206c534b0561e39d51 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hey Boris,

Thanks for reporting back wi= th results!

= On Wed, Jan 3, 2018 at 10:38 AM, Boris Tyukin <boris@boristyukin.com> wrote:
so = it was the page cache that makes this difference. we did a series of tests = either restarting Kudu only, Impala only or both and resetting or not touch= ing page cache.

as for Kudu failures after restart, it w= as a sequence of services that need to be started before Kudu. If we start = Kudu after HDFS, everything is fine. Data is intact

Is it possible that Kudu is sharing disks with ZK?
=C2=A0
thanks again for your help, J-D

On Sat, Dec 16, 2017 at 4:05 PM, Jean-Daniel Cryans &l= t;jdcryans@apache.= org> wrote:
I'm more thinking in terms of the startup IO having some impact on th= e co-located services, but we really need to know what "went down"= ; means.

On Sat, Dec 16, 2017 at 12:50 PM, Boris Tyukin <boris@boristyukin.= com> wrote:
yep it is really weird since Kudu does not use neither one. I'll get = with him on Monday to gather more details

On Sat, Dec 16, 2017 at 3:28 PM, Jean-Daniel Cryans <jdcryans@apac= he.org> wrote:
Hi Boris,

How exactly did HDFS and ZK go down? A Ku= du restart is fairly IO-intensive but I don't know how that can cause t= hings like DataNodes to fail.

J-D

On Sat, Dec 16, 2017 at 11= :45 AM, Boris Tyukin <boris@boristyukin.com> wrote:
<= blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px= #ccc solid;padding-left:1ex">
well our admin had fun two d= ays - it was the first time we restarted Kudu on our DEV cluster and it did= not go well. He is still troubleshooting what happened but after Kudu rest= art zookeeper and HDFS went down after 3-4 minutes. If we disable Kudu, all= is well. No error in Kudu logs...I will have more details next week so not= asking for help as I do not know all the details. What is obvious thought = is that it has to do something with Kudu :)

On Thu, Dec 14, 2017 at 9:40 AM, Boris Tyukin <bor= is@boristyukin.com> wrote:
=
thanks for your suggestions, J-D, I am sure you are right = more often than that! :))

I will report back with our re= sults. So far I am really impressed with Kudu - we have been benchmarking i= ngest and egress throughput and our typical queries runtime. The biggest pa= in so far is lack of support for decimals

On Wed, De= c 13, 2017 at 5:07 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
On Wed, Dec 13, 2017 at 1= 1:30 AM, Boris Tyukin <boris@boristyukin.com> wrote:
=
thanks=C2=A0J-D! we are going to try that and see how it impacts the = runtime.=C2=A0

is there any way to load this metada= ta upfront? a lot of our queries are adhoc=C2=A0in nature but they will be = hitting the same tables with different predicates and join patterns though.= =C2=A0

You could = use Impala to compute all the stats of all the tables after each Kudu resta= rt. Actually, do try that, restart Kudu then compute stats and see how fast= it scans.
=C2=A0

I am curious why this metadata does not survi= ve restarts though. We are going to run our benchmarks again and this time = restart Kudu and Impala.

It's in the tserver memory, it can't survive a restart.
=C2=A0

I just ran another query first time which hits 2 large tabl= es and these tables have been scanned by the previous query and this time I= do not see any difference in query time before the first and second time -= I guess this confirms your statement about "=C2=A0first time eve= r scanning the table since a Kudu restart" and collecting metadata.

Maybe, I've bee= n known to be right once or twice a year :)
=C2=A0


On Wed, Dec 13, 2017 at 11:18 AM, Jean-Daniel Cryans <jdcryans@ap= ache.org> wrote:
Hi Boris,

Given that we don't have much dat= a we can use here, I'll have to extrapolate. As an aside though, this i= s yet another example where we need more Kudu-side metrics in the query pro= file.

So, Kudu lazily loads a bunch of metadata an= d that can really affect scan times. If this was your first time ever scann= ing the table since a Kudu restart, it's very possible that that's = where that time was spent. There's also the page cache in the OS that m= ight now be populated. You could do something like "sync; echo 3 > = /proc/sys/vm/drop_caches" on all the machines and run the query 2 time= s again, without restarting Kudu, to understand the effect of the page cach= e itself. There's currently now way to purge the cached metadata in Kud= u though.

Hope this helps a bit,

J-D

On Wed, Dec 13, 2017 at 8:07 AM, Boris Tyukin = <boris@borist= yukin.com> wrote:
Hi guys,

I am doing some benchmarks with Kudu a= nd Impala/Parquet and hope to share it soon but there is one thing that bug= s me. This is perhaps Impala question but since I am using Kudu with Impala= I am going to try and ask anyway.

One of my queri= es takes 120 seconds to run the very first time. It joins one large 5B row = table with a bunch of smaller tables and then stores result in Impala/parqu= et (not Kudu).=C2=A0

Now if I run it second and th= ird time, it only takes 60 seconds. Can someone explain why? Is there any s= ettings to decrease this gap?

I've compared qu= ery profiles in CM and the only thing that was very different is scan again= st Kudu table (the large one):

*******************= ********
first time:
***************************
<= div>

KU= DU_SCAN_NODE (id=3D0) (47.68s)

    • BytesRead:= =C2=A00 B
    • InactiveTotal= Time:=C2=A00ns
    • KuduRemo= teScanTokens:=C2=A00
    • Nu= mScannerThreadsStarted:=C2=A020
    • PeakMemoryUsage:=C2=A035.8 MiB
    • RowsRead:=C2=A0693,502,241
    • RowsReturned:=C2=A0693,502,241
    • <= li style=3D"line-height:20px">RowsReturnedRate:=C2=A014643448 per s= econd
    • ScanRangesComplete:=C2=A0= 20
    • ScannerThreadsInvolu= ntaryContextSwitches:=C2=A01,341
    • ScannerThreadsTotalWallClockTime:=C2=A036.2m
      • MaterializeTupleTime(*):=C2=A047.57s
      • ScannerThreadsSy= sTime:=C2=A031.42s
      • Scan= nerThreadsUserTime:=C2=A01.7m
    • ScannerThreadsVoluntaryContextSwitches:=C2=A0= 96,855
    • TotalKuduScanRoundTrips:= =C2=A052,308
    • Total= ReadThroughput:=C2=A00 B/s
    • TotalTime:=C2=A047.68s

    =
    ***************************
    second time:
    ****= ***********************

    KUDU_SCAN_NODE (id=3D0) (4.28s)

      • BytesRead:=C2= =A00 B
      • InactiveTotalTim= e:=C2=A00ns
      • KuduRemoteS= canTokens:=C2=A00
      • NumSc= annerThreadsStarted:=C2=A020
      • PeakMemoryUsage:=C2=A037.9 MiB
      • RowsRead:=C2=A0693,502,241
      • RowsReturned:=C2=A0693,502,241
      • RowsReturnedRate:=C2=A0173481534 per seco= nd
      • ScanRangesComplete:=C2=A020
      • ScannerThreadsInvolunta= ryContextSwitches:=C2=A01,451
      • ScannerThreadsTotalWallClockTime:=C2=A019.5m
        • MaterializeTupleTime(*):=C2=A0= 4.20s
        • ScannerThreadsSysTime:=C2= =A038.22s
        • ScannerThread= sUserTime:=C2=A01.7m
      • ScannerThreadsVoluntaryContextSwitches:=C2=A0480,870
      • TotalKuduScanRoundTrips:=C2=A052,142
      • TotalReadThro= ughput:=C2=A00 B/s
      • Tota= lTime:=C2=A04.28s













      --f403043a2e206c534b0561e39d51--