Return-Path: X-Original-To: apmail-asterixdb-dev-archive@minotaur.apache.org Delivered-To: apmail-asterixdb-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id DEE09182D9 for ; Tue, 11 Aug 2015 20:13:04 +0000 (UTC) Received: (qmail 23520 invoked by uid 500); 11 Aug 2015 20:12:55 -0000 Delivered-To: apmail-asterixdb-dev-archive@asterixdb.apache.org Received: (qmail 23460 invoked by uid 500); 11 Aug 2015 20:12:55 -0000 Mailing-List: contact dev-help@asterixdb.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@asterixdb.incubator.apache.org Delivered-To: mailing list dev@asterixdb.incubator.apache.org Received: (qmail 23448 invoked by uid 99); 11 Aug 2015 20:12:55 -0000 Received: from Unknown (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 11 Aug 2015 20:12:55 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id A3FC018193D for ; Tue, 11 Aug 2015 20:12:54 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.88 X-Spam-Level: ** X-Spam-Status: No, score=2.88 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=3, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-us-east.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id Y0NeAdwKrHbD for ; Tue, 11 Aug 2015 20:12:47 +0000 (UTC) Received: from mail-yk0-f180.google.com (mail-yk0-f180.google.com [209.85.160.180]) by mx1-us-east.apache.org (ASF Mail Server at mx1-us-east.apache.org) with ESMTPS id 5E43B42B7D for ; Tue, 11 Aug 2015 20:12:47 +0000 (UTC) Received: by ykdz80 with SMTP id z80so65407819ykd.2 for ; Tue, 11 Aug 2015 13:12:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=ppduGdCWlqZeXPs32DDvO0g9RAs6QCJGMoAQdL6Z2uM=; b=TDBj6EjFfMorW/y2hHZRTzQVEBqMJT7GpEQuxms8/Yq02WBn0TJZ2l2wiQj2SiUKvh 3JzXhA/yFOgCNZXRp+hOL55BWgZKZ7SmHjAkqWEsWX25w7vWFErpChQfWp3HSVjx0wB8 jlI7DZRCOKc/vnTN4zRqy1A4po1pM4P5j5Tt4dhQ8s1UIuZVhG5EEwFVIwI4CsLuvzuc nv1+9lBalTZtBLweu/RYGQ2cK+XXr22F9A/ocBwp5Wx0QJM0b7qDeMkpLt0MOnmexbCy dYbqiVF+qkbxG+v21FvN9SfchbcJpAuC517qHsM5oaXR/lPu0NR9qPLe6ZtbEYuMjEQY 3KgA== MIME-Version: 1.0 X-Received: by 10.170.36.66 with SMTP id 63mr23227319yke.31.1439323960895; Tue, 11 Aug 2015 13:12:40 -0700 (PDT) Received: by 10.37.203.209 with HTTP; Tue, 11 Aug 2015 13:12:40 -0700 (PDT) In-Reply-To: <20150811195106.5A80A240F4A@unhygienix.ics.uci.edu> References: <20150811195106.5A80A240F4A@unhygienix.ics.uci.edu> Date: Tue, 11 Aug 2015 13:12:40 -0700 Message-ID: Subject: Re: Change in hyracks[master]: Make LSM bulkload append-only and write-once. From: Yingyi Bu To: dev@asterixdb.incubator.apache.org, Young-Seok Kim Cc: Ian Maxon , Jenkins , Murtadha Hubail Content-Type: multipart/alternative; boundary=001a11378a40680236051d0eb917 --001a11378a40680236051d0eb917 Content-Type: text/plain; charset=UTF-8 I just came across this change. With this change, does the old B-Tree (non-LSM) still work? I do use non-LSM B-Trees in Pregelix. Best, Yingyi On Tue, Aug 11, 2015 at 12:51 PM, Young-Seok Kim (Code Review) < do-not-reply@asterix-gerrit.ics.uci.edu> wrote: > Young-Seok Kim has posted comments on this change. > > Change subject: Make LSM bulkload append-only and write-once. > ...................................................................... > > > Patch Set 17: > > (26 comments) > > I added some comments. > Will continue reviewing. > > > https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-bloomfilter/src/main/java/edu/uci/ics/hyracks/storage/am/bloomfilter/impls/BloomFilter.java > File > hyracks/hyracks-storage-am-bloomfilter/src/main/java/edu/uci/ics/hyracks/storage/am/bloomfilter/impls/BloomFilter.java: > > Line 230: page.acquireWriteLatch(); > Latch doesn't need to be acquired since bloomfilter is created by a single > either flush or merge thread and the filter is not shown to any other > threads until the operation is over. If we consider a situation where > concurrent reader or writer during the bloomfilter building, we may > introduce a flag which says whether it's built in exclusive manner or not. > > More importantly, in general, there should be very careful confiscated > pages cleanup operations whenever there are exception during bloomfilter > building operation. Otherwise, any exception during the building will make > those confiscated pages unavailable pages in the buffer cache. For example, > when the following initPage throws an exception in whatever reason, all > confiscated pages so far should be returned back to the buffercache. > > > Line 234: page.releaseWriteLatch(false); > Latch doesn't need to be acquired since bloomfilter is created by a single > either flush or merge thread and the filter is not shown to any other > threads until the operation is over. > > > Line 279: page.acquireWriteLatch(); > All these latch/unlatch things are unnecessary since confiscated pages > seem to be accessed exclusively anyway, right? > > > > https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IFreePageManagerFactory.java > File > hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IFreePageManagerFactory.java: > > Line 19: public interface IFreePageManagerFactory { > Should it be named as MetadataManagerFactory? and the method name should > be changed accordingly as well? > > > > https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IMetaDataManager.java > File > hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IMetaDataManager.java: > > Line 20: public interface IMetaDataManager { > It will be better to add general description of this interface such as the > general role/purpose of this interface. > > > > https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/impls/AbstractTreeIndex.java > File > hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/impls/AbstractTreeIndex.java: > > Line 104: int numPages = bufferCache.getNumPagesOfFile(fileId); > It's good to have description of what the file looks like when this > create() is called such as the first page is meta page whose page number is > -1 and so on. According to such description, what's the possible numbers > for this numPages? I'm not sure how numPages can be greater than 2 below. > > > Line 115: //initEmptyTree(); > Why is this commented out instead of removing the line? > > > Line 185: wasActivated = true; > Why there are two flags? Good to have the description of why two flags are > needed. In which situation, wasActivated is true and isActive is false? > > > Line 215: bufferCache.deleteFile(fileId, true); > why should the pages in deleted file flushed? > > > Line 282: return freePageManager; > Maybe better to change the variable name to meta(data)PageManager? > > > Line 366: bufferCache.returnPage(frontierPage); > what if returnpage throws exception? > > > Line 367: continue; > continue is unnecessary. > > > Line 413: > nodeFrontiers.get(i).page.releaseWriteLatch(false); > again, why latch should be acquired? > > > Line 424: frontier.page = bufferCache.confiscatePage(-1); > why -1? > > > > https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/AsyncFIFOPageQueueManager.java > File > hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/AsyncFIFOPageQueueManager.java: > > Line 100: protected LinkedBlockingQueue queue = new > LinkedBlockingQueue(); > LinkedBlockingQueue is not thread-safe. I don't see queue is accessed in a > thread-safe manner. > > > Line 101: volatile Thread writerThread; > why is writerThread declared as volatile? > > > Line 116: return new PageQueue(bufferCache, writer); > Why should PageQueue be created more than once? Since there is only one > writer, a global queue can be used for this purpose. If this is for > reducing competing among threads, then this PageQueue seems to be used in > exclusive manner by one thread. If this is the case, then > ConcurrentLinkedQueue and ConcurrentHashMap member variables in the > PageQueue don't have to be "Concurrent", right? > > > Line 124: queue.wait(100l); > Will it be better to use wait() and notify() pair instead of checking > queue repeatedly? What's the benefit of waiting with the time? > Who gives notification to the caller of this wait on queue object? > > > Line 154: lowWater.wait(100l); > Will it be better to use wait() and notify() pair instead of checking > queue repeatedly? What's the benefit of waiting with the time? > > > Line 167: while (!haltWriter) { > What's gonna happen in the following case? when there are pages in the > queue, haltWriter can be set to true by destroyQueue(). Then, how the rest > of the pages in the queue are written to disk? It seems that destroyQueue > should set the haltWriter to true when the queue was empty. Am I missing > something here? > > > Line 178: page.acquireReadLatch(); > why is readLatch acquired, not writeLatch? Also, no other threads will try > to read this page anyway. So, the latch is even needed? > > > > https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/BufferCache.java > File > hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/BufferCache.java: > > Line 612: if (curPage >= > pageReplacementStrategy.getNumPages()) { > pageReplacementStrategy.getNumPages() in master branch was modified by > Yingyi. So, this should be modified considering that modification. > Probably, this getNumPages() call can be outside of the sync block. > > > Line 940: if (victim != null) { > Is it possible to factor out this if clause if this logic is similar to > the what findPage() does? if not, what's the difference that couldn't allow > the refactoring? > > > Line 1016: ((CachedPage) returnPage).virtual.set(true); > what's the purpose of this set(true) call? > > > Line 1028: } > The following code also looks identical to what findPage does. > > > Line 1056: public void returnPage(ICachedPage page, boolean reinsert) { > We need to carefully think about the default value of reinsert flag for > merge operation. Flush might be ok, but considering the size of components > resulted from merge will be very big and the pages in the merge component > less likely to be accessed right after merge. Nonetheless, if those pages > go and compete to get the bucket lock, that might introduce unnecessary > contention against readers and flusher threads. It's good to see the effect > of changing the reinsert flag to false for merge operations with a > concurrent feed job and concurrent queries. > > > -- > To view, visit https://asterix-gerrit.ics.uci.edu/255 > To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings > > Gerrit-MessageType: comment > Gerrit-Change-Id: I80fb891b5310252143854a336b591bf3f8cd4ba7 > Gerrit-PatchSet: 17 > Gerrit-Project: hyracks > Gerrit-Branch: master > Gerrit-Owner: Ian Maxon > Gerrit-Reviewer: Ian Maxon > Gerrit-Reviewer: Jenkins > Gerrit-Reviewer: Murtadha Hubail > Gerrit-Reviewer: Young-Seok Kim > Gerrit-HasComments: Yes > --001a11378a40680236051d0eb917--