impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dan Hecht (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3202,IMPALA-2079: rework scratch file I/O
Date Mon, 19 Dec 2016 23:13:21 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3202,IMPALA-2079: rework scratch file I/O

Patch Set 16:


Flushing out some more comments since you've updated the patchset. Will continue on the newer
File be/src/runtime/

Line 651:       status = RepinInMemBlock(block);
why do we need to do this since the block wasn't in memory in this case?
actually, what case is this? why don't we have to read the block back from disk in this case?

Line 669:     if (!status.ok()) goto error;
if the read failed, why don't we have to still CloseWrite()?

Line 691:     if (block->valid_data_len() == 0) {
when is the 0 len case used?

Line 697:   }
why do we need this block since we've already called WaitForWrite()?

Line 751:   // Workaround to avoid adding a special code path for the 0-byte edge case.
what is that special case? can we not issue writes of zero length to diskiomgr? will bufferpool
need this special case?
File be/src/runtime/buffered-block-mgr.h:

PS16, Line 403: evicted from
              :   /// memory
I'm confused by the name "InMemBlock" given this says it handles blocks that aren't in memory.
File be/src/runtime/

Line 49
also not sure if you meant to include this?
File be/src/runtime/tmp-file-mgr-internal.h:

Line 28: /// can be allocated and files removed using AllocateSpace() and Remove().
let's clarify that File should only used internally TmpFileMgr

PS16, Line 41: offset
.. to the location of the newly allocated file space?

PS16, Line 55:  it is a remote FS, disk IDs are chosen round-robin.
does this preserve existing behavior for temporary files? for data files, we have a special
queue for these.
and is the round robin disk IDs only over the local disk queues or does it also include the
remote disk queues?

PS16, Line 65: created
when i first read the past tense, I was confused as to how we set this "const" to the value
we chose when creating the directory.

Line 83:   const int disk_id_;
do we need to keep DeviceId around, or could we just use disk_ids?
File be/src/runtime/

PS16, Line 150: const DeviceId&
sometimes we pass as const&, other times by value. Is there a reason for the inconsistency?

PS16, Line 209: active_tmp_devices

PS16, Line 394: See logs for previous "
              :       "errors that may have caused this."
given that we now accumulate errors, why say this?

PS16, Line 431: could block other
              :   // threads.
that's always true. is this trying to say that the Read() is synchronous and so other threads
could become blocked on the I/O?
File be/src/runtime/tmp-file-mgr.h:

Line 76:   /// It is used as a handle for external classes to identify devices.
do external classes still care about DeviceId's, now that File's aren't exposed?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message