Mailing-List: contact dev-help@asterixdb.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@asterixdb.incubator.apache.org
MIME-Version: 1.0
In-Reply-To: <20150811195106.5A80A240F4A@unhygienix.ics.uci.edu>
References: 
 <gerrit.1430782107564.I80fb891b5310252143854a336b591bf3f8cd4ba7@asterix-gerrit.ics.uci.edu>
	<20150811195106.5A80A240F4A@unhygienix.ics.uci.edu>
Date: Tue, 11 Aug 2015 13:12:40 -0700
Message-ID: 
 <CALjEsF4nrYwVB6GB8yWnJd5gT9brPSr7xbaxhfyX9RfW37sCrQ@mail.gmail.com>
Subject: Re: Change in hyracks[master]: Make LSM bulkload append-only and
 write-once.
From: Yingyi Bu <buyingyi@gmail.com>
To: dev@asterixdb.incubator.apache.org, Young-Seok Kim <kisskys@gmail.com>
Cc: Ian Maxon <imaxon@apache.org>,
 Jenkins <jenkins@fulliautomatix.ics.uci.edu>,
	Murtadha Hubail <hubailmor@gmail.com>
Content-Type: multipart/alternative; boundary=001a11378a40680236051d0eb917

--001a11378a40680236051d0eb917
Content-Type: text/plain; charset=UTF-8

I just came across this change.

With this change, does the old B-Tree (non-LSM) still work?
I do use non-LSM B-Trees in Pregelix.

Best,
Yingyi


On Tue, Aug 11, 2015 at 12:51 PM, Young-Seok Kim (Code Review) <
do-not-reply@asterix-gerrit.ics.uci.edu> wrote:

> Young-Seok Kim has posted comments on this change.
>
> Change subject: Make LSM bulkload append-only and write-once.
> ......................................................................
>
>
> Patch Set 17:
>
> (26 comments)
>
> I added some comments.
> Will continue reviewing.
>
>
> https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-bloomfilter/src/main/java/edu/uci/ics/hyracks/storage/am/bloomfilter/impls/BloomFilter.java
> File
> hyracks/hyracks-storage-am-bloomfilter/src/main/java/edu/uci/ics/hyracks/storage/am/bloomfilter/impls/BloomFilter.java:
>
> Line 230:                 page.acquireWriteLatch();
> Latch doesn't need to be acquired since bloomfilter is created by a single
> either flush or merge thread and the filter is not shown to any other
> threads until the operation is over. If we consider a situation where
> concurrent reader or writer during the bloomfilter building, we may
> introduce a flag which says whether it's built in exclusive manner or not.
>
> More importantly, in general, there should be very careful confiscated
> pages cleanup operations whenever there are exception during bloomfilter
> building operation. Otherwise, any exception during the building will make
> those confiscated pages unavailable pages in the buffer cache. For example,
> when the following initPage throws an exception in whatever reason, all
> confiscated pages so far should be returned back to the buffercache.
>
>
> Line 234:                     page.releaseWriteLatch(false);
> Latch doesn't need to be acquired since bloomfilter is created by a single
> either flush or merge thread and the filter is not shown to any other
> threads until the operation is over.
>
>
> Line 279:                 page.acquireWriteLatch();
> All these latch/unlatch things are unnecessary since confiscated pages
> seem to be accessed exclusively anyway, right?
>
>
>
> https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IFreePageManagerFactory.java
> File
> hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IFreePageManagerFactory.java:
>
> Line 19: public interface IFreePageManagerFactory {
> Should it be named as MetadataManagerFactory? and the method name should
> be changed accordingly as well?
>
>
>
> https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IMetaDataManager.java
> File
> hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/IMetaDataManager.java:
>
> Line 20: public interface IMetaDataManager {
> It will be better to add general description of this interface such as the
> general role/purpose of this interface.
>
>
>
> https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/impls/AbstractTreeIndex.java
> File
> hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/impls/AbstractTreeIndex.java:
>
> Line 104:             int numPages = bufferCache.getNumPagesOfFile(fileId);
> It's good to have description of what the file looks like when this
> create() is called such as the first page is meta page whose page number is
> -1 and so on. According to such description, what's the possible numbers
> for this numPages? I'm not sure how numPages can be greater than 2 below.
>
>
> Line 115:             //initEmptyTree();
> Why is this commented out instead of removing the line?
>
>
> Line 185:         wasActivated = true;
> Why there are two flags? Good to have the description of why two flags are
> needed. In which situation, wasActivated is true and isActive is false?
>
>
> Line 215:         bufferCache.deleteFile(fileId, true);
> why should the pages in deleted file flushed?
>
>
> Line 282:         return freePageManager;
> Maybe better to change the variable name to meta(data)PageManager?
>
>
> Line 366:                     bufferCache.returnPage(frontierPage);
> what if returnpage throws exception?
>
>
> Line 367:                     continue;
> continue is unnecessary.
>
>
> Line 413:
>  nodeFrontiers.get(i).page.releaseWriteLatch(false);
> again, why latch should be acquired?
>
>
> Line 424:             frontier.page = bufferCache.confiscatePage(-1);
> why -1?
>
>
>
> https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/AsyncFIFOPageQueueManager.java
> File
> hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/AsyncFIFOPageQueueManager.java:
>
> Line 100:     protected LinkedBlockingQueue<QueueEntry> queue = new
> LinkedBlockingQueue<QueueEntry>();
> LinkedBlockingQueue is not thread-safe. I don't see queue is accessed in a
> thread-safe manner.
>
>
> Line 101:     volatile Thread writerThread;
> why is writerThread declared as volatile?
>
>
> Line 116:         return new PageQueue(bufferCache, writer);
> Why should PageQueue be created more than once? Since there is only one
> writer, a global queue can be used for this purpose. If this is for
> reducing competing among threads, then this PageQueue seems to be used in
> exclusive manner by one thread. If this is the case, then
> ConcurrentLinkedQueue and ConcurrentHashMap member variables in the
> PageQueue don't have to be "Concurrent", right?
>
>
> Line 124:                         queue.wait(100l);
> Will it be better to use wait() and notify() pair instead of checking
> queue repeatedly? What's the benefit of waiting with the time?
> Who gives notification to the caller of this wait on queue object?
>
>
> Line 154:                     lowWater.wait(100l);
> Will it be better to use wait() and notify() pair instead of checking
> queue repeatedly? What's the benefit of waiting with the time?
>
>
> Line 167:         while (!haltWriter) {
> What's gonna happen in the following case? when there are pages in the
> queue, haltWriter can be set to true by destroyQueue(). Then, how the rest
> of the pages in the queue are written to disk? It seems that destroyQueue
> should set the haltWriter to true when the queue was empty. Am I missing
> something here?
>
>
> Line 178:                 page.acquireReadLatch();
> why is readLatch acquired, not writeLatch? Also, no other threads will try
> to read this page anyway. So, the latch is even needed?
>
>
>
> https://asterix-gerrit.ics.uci.edu/#/c/255/17/hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/BufferCache.java
> File
> hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/BufferCache.java:
>
> Line 612:                             if (curPage >=
> pageReplacementStrategy.getNumPages()) {
> pageReplacementStrategy.getNumPages() in master branch was modified by
> Yingyi. So, this should be modified considering that modification.
> Probably, this getNumPages() call can be outside of the sync block.
>
>
> Line 940:             if (victim != null) {
> Is it possible to factor out this if clause if this logic is similar to
> the what findPage() does? if not, what's the difference that couldn't allow
> the refactoring?
>
>
> Line 1016:                 ((CachedPage) returnPage).virtual.set(true);
> what's the purpose of this set(true) call?
>
>
> Line 1028:             }
> The following code also looks identical to what findPage does.
>
>
> Line 1056:     public void returnPage(ICachedPage page, boolean reinsert) {
> We need to carefully think about the default value of reinsert flag for
> merge operation. Flush might be ok, but considering the size of components
> resulted from merge will be very big and the pages in the merge component
> less likely to be accessed right after merge. Nonetheless, if those pages
> go and compete to get the bucket lock, that might introduce unnecessary
> contention against readers and flusher threads. It's good to see the effect
> of changing the reinsert flag to false for merge operations  with a
> concurrent  feed job and concurrent queries.
>
>
> --
> To view, visit https://asterix-gerrit.ics.uci.edu/255
> To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I80fb891b5310252143854a336b591bf3f8cd4ba7
> Gerrit-PatchSet: 17
> Gerrit-Project: hyracks
> Gerrit-Branch: master
> Gerrit-Owner: Ian Maxon <imaxon@apache.org>
> Gerrit-Reviewer: Ian Maxon <imaxon@apache.org>
> Gerrit-Reviewer: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
> Gerrit-Reviewer: Murtadha Hubail <hubailmor@gmail.com>
> Gerrit-Reviewer: Young-Seok Kim <kisskys@gmail.com>
> Gerrit-HasComments: Yes
>

--001a11378a40680236051d0eb917--