asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Young-Seok Kim (Code Review)" <>
Subject Change in hyracks[master]: Make LSM bulkload append-only and write-once.
Date Tue, 11 Aug 2015 19:51:06 GMT
Young-Seok Kim has posted comments on this change.

Change subject: Make LSM bulkload append-only and write-once.

Patch Set 17:


I added some comments. 
Will continue reviewing.
File hyracks/hyracks-storage-am-bloomfilter/src/main/java/edu/uci/ics/hyracks/storage/am/bloomfilter/impls/

Line 230:                 page.acquireWriteLatch();
Latch doesn't need to be acquired since bloomfilter is created by a single either flush or
merge thread and the filter is not shown to any other threads until the operation is over.
If we consider a situation where concurrent reader or writer during the bloomfilter building,
we may introduce a flag which says whether it's built in exclusive manner or not.

More importantly, in general, there should be very careful confiscated pages cleanup operations
whenever there are exception during bloomfilter building operation. Otherwise, any exception
during the building will make those confiscated pages unavailable pages in the buffer cache.
For example, when the following initPage throws an exception in whatever reason, all confiscated
pages so far should be returned back to the buffercache.

Line 234:                     page.releaseWriteLatch(false);
Latch doesn't need to be acquired since bloomfilter is created by a single either flush or
merge thread and the filter is not shown to any other threads until the operation is over.

Line 279:                 page.acquireWriteLatch();
All these latch/unlatch things are unnecessary since confiscated pages seem to be accessed
exclusively anyway, right?
File hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/

Line 19: public interface IFreePageManagerFactory {
Should it be named as MetadataManagerFactory? and the method name should be changed accordingly
as well?
File hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/api/

Line 20: public interface IMetaDataManager {
It will be better to add general description of this interface such as the general role/purpose
of this interface.
File hyracks/hyracks-storage-am-common/src/main/java/edu/uci/ics/hyracks/storage/am/common/impls/

Line 104:             int numPages = bufferCache.getNumPagesOfFile(fileId);
It's good to have description of what the file looks like when this create() is called such
as the first page is meta page whose page number is -1 and so on. According to such description,
what's the possible numbers for this numPages? I'm not sure how numPages can be greater than
2 below.

Line 115:             //initEmptyTree();
Why is this commented out instead of removing the line?

Line 185:         wasActivated = true;
Why there are two flags? Good to have the description of why two flags are needed. In which
situation, wasActivated is true and isActive is false?

Line 215:         bufferCache.deleteFile(fileId, true);
why should the pages in deleted file flushed?

Line 282:         return freePageManager;
Maybe better to change the variable name to meta(data)PageManager?

Line 366:                     bufferCache.returnPage(frontierPage);
what if returnpage throws exception?

Line 367:                     continue;
continue is unnecessary.

Line 413:                             nodeFrontiers.get(i).page.releaseWriteLatch(false);
again, why latch should be acquired?

Line 424:    = bufferCache.confiscatePage(-1);
why -1?
File hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/

Line 100:     protected LinkedBlockingQueue<QueueEntry> queue = new LinkedBlockingQueue<QueueEntry>();
LinkedBlockingQueue is not thread-safe. I don't see queue is accessed in a thread-safe manner.

Line 101:     volatile Thread writerThread;
why is writerThread declared as volatile?

Line 116:         return new PageQueue(bufferCache, writer);
Why should PageQueue be created more than once? Since there is only one writer, a global queue
can be used for this purpose. If this is for reducing competing among threads, then this PageQueue
seems to be used in exclusive manner by one thread. If this is the case, then ConcurrentLinkedQueue
and ConcurrentHashMap member variables in the PageQueue don't have to be "Concurrent", right?

Line 124:                         queue.wait(100l);
Will it be better to use wait() and notify() pair instead of checking queue repeatedly? What's
the benefit of waiting with the time?
Who gives notification to the caller of this wait on queue object?

Line 154:                     lowWater.wait(100l);
Will it be better to use wait() and notify() pair instead of checking queue repeatedly? What's
the benefit of waiting with the time?

Line 167:         while (!haltWriter) {
What's gonna happen in the following case? when there are pages in the queue, haltWriter can
be set to true by destroyQueue(). Then, how the rest of the pages in the queue are written
to disk? It seems that destroyQueue should set the haltWriter to true when the queue was empty.
Am I missing something here?

Line 178:                 page.acquireReadLatch();
why is readLatch acquired, not writeLatch? Also, no other threads will try to read this page
anyway. So, the latch is even needed?
File hyracks/hyracks-storage-common/src/main/java/edu/uci/ics/hyracks/storage/common/buffercache/

Line 612:                             if (curPage >= pageReplacementStrategy.getNumPages())
pageReplacementStrategy.getNumPages() in master branch was modified by Yingyi. So, this should
be modified considering that modification. Probably, this getNumPages() call can be outside
of the sync block.

Line 940:             if (victim != null) {
Is it possible to factor out this if clause if this logic is similar to the what findPage()
does? if not, what's the difference that couldn't allow the refactoring?

Line 1016:                 ((CachedPage) returnPage).virtual.set(true);
what's the purpose of this set(true) call?

Line 1028:             }
The following code also looks identical to what findPage does.

Line 1056:     public void returnPage(ICachedPage page, boolean reinsert) {
We need to carefully think about the default value of reinsert flag for merge operation. Flush
might be ok, but considering the size of components resulted from merge will be very big and
the pages in the merge component less likely to be accessed right after merge. Nonetheless,
if those pages go and compete to get the bucket lock, that might introduce unnecessary contention
against readers and flusher threads. It's good to see the effect of changing the reinsert
flag to false for merge operations  with a concurrent  feed job and concurrent queries.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I80fb891b5310252143854a336b591bf3f8cd4ba7
Gerrit-PatchSet: 17
Gerrit-Project: hyracks
Gerrit-Branch: master
Gerrit-Owner: Ian Maxon <>
Gerrit-Reviewer: Ian Maxon <>
Gerrit-Reviewer: Jenkins <>
Gerrit-Reviewer: Murtadha Hubail <>
Gerrit-Reviewer: Young-Seok Kim <>
Gerrit-HasComments: Yes

View raw message