From: Dmitriy Pavlov
Date: Wed, 19 Sep 2018 00:09:44 +0300
Subject: Re: Cache scan efficiency
To: dev@ignite.apache.org

Hi,

I totally support the idea of cache
preload. IMO it can be expanded: we can iterate over the local
partitions of the cache group and preload each one. But such methods
should be clearly documented, so that a user can be aware of when they
are beneficial (e.g. if the RAM region is big enough, etc.).

Sincerely,
Dmitriy Pavlov

On Tue, Sep 18, 2018 at 21:36 Denis Magda wrote:

> Folks,
>
> Since we're adding a method that would preload a certain partition,
> can we add one which will preload the whole cache? Ignite persistence
> users I've been working with look puzzled once they realize there is
> no way to warm up RAM after a restart. There are use cases that
> require this.
>
> Can the current optimizations be expanded to the cache preloading use
> case?
>
> --
> Denis
>
> On Tue, Sep 18, 2018 at 3:58 AM Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > Summing up, I suggest adding a new public
> > method IgniteCache.preloadPartition(partId).
> >
> > I will start preparing a PR for IGNITE-8873 if no more objections
> > follow.
> >
> > On Tue, Sep 18, 2018 at 10:50 Alexey Goncharuk <
> > alexey.goncharuk@gmail.com> wrote:
> >
> > > Dmitriy,
> > >
> > > In my understanding, the proper fix for the scan query looks like
> > > a big change, and it is unlikely that we can include it in Ignite
> > > 2.7. On the other hand, the method suggested by Alexei is quite
> > > simple and definitely fits Ignite 2.7, which will provide a
> > > better user experience. Even with a proper scan query
> > > implemented, this method can be useful in some specific
> > > scenarios, so we will not have to deprecate it.
> > >
> > > --AG
> > >
> > > On Mon, Sep 17, 2018 at 19:15 Dmitriy Pavlov wrote:
> > >
> > > > As I understood it, this is not a hack; it is an advanced
> > > > feature for warming up the partition.
> > > > We can build warm-up of the overall cache by calling warm-up
> > > > on each of its partitions. Users often ask about this feature
> > > > and are not confident with our lazy loading.
> > > >
> > > > Please correct me if I misunderstood the idea.
> > > >
> > > > On Mon, Sep 17, 2018 at 18:37 Dmitriy Setrakyan <
> > > > dsetrakyan@apache.org> wrote:
> > > >
> > > > > I would rather fix the scan than hack the scan. Is there any
> > > > > technical reason for hacking it now instead of fixing it
> > > > > properly? Can some of the experts in this thread provide an
> > > > > estimate of the complexity and the difference in work that
> > > > > would be required for each approach?
> > > > >
> > > > > D.
> > > > >
> > > > > On Mon, Sep 17, 2018 at 4:42 PM Alexey Goncharuk <
> > > > > alexey.goncharuk@gmail.com> wrote:
> > > > >
> > > > > > I think it would be beneficial for some Ignite users if we
> > > > > > added such a partition warmup method to the public API. The
> > > > > > method should be well documented and state that it may
> > > > > > invalidate the existing page cache. It will be a very
> > > > > > effective instrument until we add the proper scan ability
> > > > > > that Vladimir was referring to.
> > > > > >
> > > > > > On Mon, Sep 17, 2018 at 13:05 Maxim Muzafarov <
> > > > > > maxmuzaf@gmail.com> wrote:
> > > > > >
> > > > > > > Folks,
> > > > > > >
> > > > > > > Such warming up can be an effective technique for
> > > > > > > performing calculations which require large cache data
> > > > > > > reads, but I think it is a single narrow use case among
> > > > > > > all Ignite store usages. Like all other powerful
> > > > > > > techniques, we should use it wisely.
> > > > > > > In the general case, I think we should consider the
> > > > > > > other techniques mentioned by Vladimir, and maybe create
> > > > > > > something like `global statistics of cache data usage` to
> > > > > > > choose the best technique in each case.
> > > > > > >
> > > > > > > For instance, it is not obvious what would take longer:
> > > > > > > one multi-block read or 50 single-block reads issued
> > > > > > > sequentially. It strongly depends on the hardware under
> > > > > > > the hood, and might depend on the workload's system
> > > > > > > resources (CPU-intensive calculations and I/O access) as
> > > > > > > well. But such `statistics` would help us choose the
> > > > > > > right way.
> > > > > > >
> > > > > > > On Sun, 16 Sep 2018 at 23:59 Dmitriy Pavlov <
> > > > > > > dpavlov.spb@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Alexei,
> > > > > > > >
> > > > > > > > I did not find any PRs associated with the ticket, to
> > > > > > > > check the code changes behind this idea. Are there any
> > > > > > > > PRs?
> > > > > > > >
> > > > > > > > If we create some forward scan of pages, it should be
> > > > > > > > a very intelligent algorithm taking a lot of parameters
> > > > > > > > into account (how much RAM is free, how probable it is
> > > > > > > > that we will need the next page, etc.). We had a
> > > > > > > > private talk about such an idea some time ago.
> > > > > > > >
> > > > > > > > In my experience, Linux systems already do such forward
> > > > > > > > reading of file data (for file descriptors flagged as
> > > > > > > > sequential), but some prefetching of data at the
> > > > > > > > application level may be useful for O_DIRECT file
> > > > > > > > descriptors.
> > > > > > > >
> > > > > > > > And one more concern from me is about selecting the
> > > > > > > > right place in the system to do such a prefetch.
> > > > > > > >
> > > > > > > > Sincerely,
> > > > > > > > Dmitriy Pavlov
> > > > > > > >
> > > > > > > > On Sun, Sep 16, 2018 at 19:54 Vladimir Ozerov <
> > > > > > > > vozerov@gridgain.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Alex,
> > > > > > > > >
> > > > > > > > > It is good that you observed a speedup, but I do not
> > > > > > > > > think this solution works for the product in the
> > > > > > > > > general case. The amount of RAM is limited, and even
> > > > > > > > > a single partition may need more space than the RAM
> > > > > > > > > available. Moving a lot of pages into page memory for
> > > > > > > > > a scan means that you evict a lot of other pages,
> > > > > > > > > which will ultimately lead to bad performance of
> > > > > > > > > subsequent queries and defeat the LRU algorithms,
> > > > > > > > > which are of great importance for good database
> > > > > > > > > performance.
> > > > > > > > >
> > > > > > > > > Database vendors choose another approach: skip the
> > > > > > > > > B-trees, iterate directly over data pages, read them
> > > > > > > > > in a multi-block fashion, and use a separate scan
> > > > > > > > > buffer to avoid excessive evictions of other hot
> > > > > > > > > pages. A corresponding ticket for SQL exists [1],
> > > > > > > > > but the idea is common to all parts of the system
> > > > > > > > > requiring scans.
> > > > > > > > >
> > > > > > > > > As for the proposed solution, it might be a good
> > > > > > > > > idea to add a special API to "warm up" a partition,
> > > > > > > > > with a clear explanation of the pros (fast scan after
> > > > > > > > > warmup) and cons (slowdown of any other operations).
> > > > > > > > > But I think we should not make this approach part of
> > > > > > > > > normal scans.
> > > > > > > > >
> > > > > > > > > Vladimir.
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-6057
> > > > > > > > >
> > > > > > > > > On Sun, Sep 16, 2018 at 6:44 PM Alexei Scherbakov <
> > > > > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Igniters,
> > > > > > > > > >
> > > > > > > > > > My use case involves a scenario where it is
> > > > > > > > > > necessary to iterate over a large (many TBs)
> > > > > > > > > > persistent cache, doing some calculation on the
> > > > > > > > > > data read.
> > > > > > > > > >
> > > > > > > > > > The basic solution is to iterate over the cache
> > > > > > > > > > using a ScanQuery.
> > > > > > > > > >
> > > > > > > > > > This turns out to be slow, because iteration over
> > > > > > > > > > the cache involves a lot of random disk access for
> > > > > > > > > > reading data pages referenced by links from leaf
> > > > > > > > > > pages.
> > > > > > > > > >
> > > > > > > > > > This is especially true when data is stored on
> > > > > > > > > > disks with slow random access, like SAS disks. In
> > > > > > > > > > my case, on a modern SAS disk array, the reading
> > > > > > > > > > speed was several MB/sec, while the sequential read
> > > > > > > > > > speed in a perf test was about a GB/sec.
> > > > > > > > > >
> > > > > > > > > > I was able to fix the issue by using a ScanQuery
> > > > > > > > > > with an explicit partition set, and running simple
> > > > > > > > > > warmup code before each partition scan.
> > > > > > > > > >
> > > > > > > > > > The code pins cold pages in memory in sequential
> > > > > > > > > > order, thus eliminating random disk access. The
> > > > > > > > > > speedup was around x100.
> > > > > > > > > >
> > > > > > > > > > I suggest adding this improvement to the product's
> > > > > > > > > > core by always sequentially preloading pages for
> > > > > > > > > > all internal partition iterations (cache iterators,
> > > > > > > > > > scan queries, SQL queries with a scan plan) if the
> > > > > > > > > > partition is cold (i.e. has a low number of pinned
> > > > > > > > > > pages).
> > > > > > > > > >
> > > > > > > > > > This should also speed up rebalancing from cold
> > > > > > > > > > partitions.
> > > > > > > > > >
> > > > > > > > > > Ignite JIRA ticket: [1]
> > > > > > > > > >
> > > > > > > > > > Thoughts?
> > > > > > > > > >
> > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-8873
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best regards,
> > > > > > > > > > Alexei Scherbakov
> > > > > > >
> > > > > > > --
> > > > > > > Maxim Muzafarov
> >
> > --
> > Best regards,
> > Alexei Scherbakov
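[Editor's note appended to the archived thread] The core of the warm-up
trick Alexei describes, reading a partition's pages once in sequential
order so that a later link-chasing scan is served from memory rather
than from random disk seeks, can be sketched outside Ignite with plain
files. Everything below is a hypothetical, self-contained illustration,
not Ignite code: the `WarmupSketch` class, the page size, and the temp
"partition" file are invented for the example, and the OS page cache
stands in for Ignite's page memory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class WarmupSketch {
    static final int PAGE_SIZE = 4096; // typical data-page size

    /** One sequential pass over the file; on spinning disks this is
     *  the cheap way to pull every page into the OS page cache. */
    static long sequentialWarmup(Path file) throws IOException {
        long checksum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer page = ByteBuffer.allocate(PAGE_SIZE);
            while (ch.read(page) > 0) {
                page.flip();
                while (page.hasRemaining())
                    checksum += page.get() & 0xFF; // touch every byte
                page.clear();
            }
        }
        return checksum;
    }

    /** Visits the same pages in random order, mimicking the random
     *  disk access of following links from leaf pages during a scan. */
    static long randomOrderScan(Path file) throws IOException {
        long checksum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int pages = (int) (ch.size() / PAGE_SIZE);
            List<Integer> order = new ArrayList<>();
            for (int i = 0; i < pages; i++) order.add(i);
            Collections.shuffle(order, new Random(1)); // random visit order
            ByteBuffer page = ByteBuffer.allocate(PAGE_SIZE);
            for (int p : order) {
                page.clear();
                ch.read(page, (long) p * PAGE_SIZE); // positional read
                page.flip();
                while (page.hasRemaining())
                    checksum += page.get() & 0xFF;
            }
        }
        return checksum;
    }

    public static void main(String[] args) throws IOException {
        // Build a small fake "partition file" of 8 pages.
        Path part = Files.createTempFile("partition", ".bin");
        byte[] data = new byte[PAGE_SIZE * 8];
        new Random(42).nextBytes(data);
        Files.write(part, data);

        // Both orders read the same bytes; after the sequential pass a
        // random-order scan is served from the page cache, not disk.
        long seq = sequentialWarmup(part);
        long rnd = randomOrderScan(part);
        System.out.println("checksums match: " + (seq == rnd));
        Files.delete(part);
    }
}
```

On a file small enough to cache, both methods finish instantly; timing
them against a large, uncached file on a spinning disk is where the
sequential pass pulls far ahead, which is the effect the thread reports
as a roughly x100 speedup.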