impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dan Hecht (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3202,IMPALA-2079: rework scratch file I/O
Date Sat, 17 Dec 2016 00:49:37 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3202,IMPALA-2079: rework scratch file I/O

Patch Set 16:


Some more comments. Will need to get back to:
some of
File be/src/runtime/disk-io-mgr.h:

Line 606:     /// Change the file and offset of this write range. Data and callbacks are unchanged.
presumably this is only legal before AddWriteRange() is called?

PS16, Line 610: File data can be over-written by calling SetData() and AddWriteRange().
I think that's what this was trying to say. would be good to clarify that as well.
File be/src/runtime/

Line 252:   return disk_id % file_group_->io_mgr_->num_local_disks();
is this right? can't the scratch dir be on NFS or Isilon, for example. Anyway, you're probably
just bringing this code over from somewhere else, so don't have to address it now but maybe

Line 261:   if (bytes_allocated_ > 0) FileSystemUtil::RemovePaths(vector<string>(1,
is the bytes_allocated_ guard still correct given that AllocateSpace() no longer does the
file creation? actually, where does CreateFile() happen now?

Line 340:   return Status::OK();
how about just inlining this code into CreateFiles(); this extra abstrction seems to be a
net loss in readability.

Line 417:       bind(mem_fn(&FileGroup::WriteComplete), this, tmp_handle.get(), _1);
i think we've started favoring using lambdas for this, but i don't feel strongly about it.
File be/src/runtime/tmp-file-mgr.h:

Line 57: /// the file range backing it can be recycled for a different WriteHandle. The file
> That is TODO for the next patch - I didn't want to tackle recycling of vari
Even without variable-length blocks, doesn't the consumer of this class need to know that
block_size is the maximum write length? is that explained somewhere?

Line 69: /// temporarily blacklist devices that show I/O errors.
> boost::function is really a function pointer plus a data payload, so whoeve
Okay, I see in BufferedBlockMgr.  I'm fine with either way.
File be/src/runtime/tmp-file-mgr.h:

Line 44: /// files are allocated and recovery from certain I/O errors. I/O is done via DiskIoMgr.
it's worth also mentioning that this handles encryption/compression

PS16, Line 79: boost
doesn't c++11 provide this now? any reason to not start using it rather than boost?

PS16, Line 101: scratch
temporary? or say scratch in the first sentence.

PS16, Line 101: .
of this file group.

PS16, Line 103: 'buffer' must be at least 1 byte long
why do we have this requirement?

PS16, Line 107: ,
or is canceled?

PS16, Line 111: .
what if it's canceled? does the write callback still happen?

Line 125:     /// decompressing as necessary.
so you either call this or CloseWrite(), correct? i.e. once you call it, the handle is closed
(I guess the unique_ptr is specifying that). it's a little confusing since it is inconsistent
with our usual cancel & close lifecycles.

PS16, Line 148: query instance

PS16, Line 155: File** new_file = nullptr
it doesn't look like we use the new_file parameter, so let's delete it.
also, how about we rename this "AddNewFile()" since that's also an important side effect --
the file is added to the filegroup by this method. And helps differentiate with TmpFileMgr::NewFile()
which only creates the File.
see also comments in the definition.

PS16, Line 155: unique_id
is this different than the FileGroup's unique_id_? if so, since there is only one file per
device (i.e. directory), why do we need it?

PS16, Line 162: range
range is kinda overloaded (i.e. usually means scan- or write-range), so maybe say "temporary
file range".

PS16, Line 165: calls

PS16, Line 166: marks the write complete
what does that mean?

Line 171:     /// and
line break glitch

Line 226:     /// Can be read without lock after AllocateSpace() succeeds at least once.

PS16, Line 243: scratch
do we always mean the same thing when we say "scratch" and "temporary" or is this a subtle
difference? it'd be nice to have a single terminology, or at least stick to something like
"temporary file" and "scratch space" (which is within temporary files).

Line 250:   /// is either in-flight or has completed and the data is in the file.
couldn't it also have failed?

PS16, Line 268: asynchrously

Line 272:     std::string DebugString();
nit: linebreak

PS16, Line 322: while the write is in-flight
after the write completes (either successfully or not), what are the rules for the fields
below? and at what point exactly is a write no longer "in-flight" w.r.t. this locking?

PS16, Line 326: block

Line 328:     RuntimeState* state_;
why does this need to be protected by the lock?

PS16, Line 367:  query instance
will that still be still true?

PS16, Line 373: unique_id
what is this? is it different than file_group->unique_id_?
File common/thrift/ImpalaInternalService.thrift:

Line 67:   2: optional i32 max_errors = 100
is this meant to be part of this change?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message