impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Skye Wanderman-Milne (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-2548: Codegen Tuple::MaterializeExprs() and use in TopN node
Date Tue, 29 Mar 2016 02:09:01 GMT
Skye Wanderman-Milne has posted comments on this change.

Change subject: IMPALA-2548: Codegen Tuple::MaterializeExprs() and use in TopN node
......................................................................


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/1901/10/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 395:   static const char* LLVM_CLASS_NAME;
> does everybody need access to this? if not, make private and add friends
Can you explain the underlying principle here? This is used by codegen functions to get the
TupleDescriptor LLVM type (and similarly we have a LLVM_CLASS_NAME for every class that needs
to be referenced in an IRBuilder codegen'd function). According to https://google.github.io/styleguide/cppguide.html#Friends,
"Friends extend, but do not break, the encapsulation boundary of a class". My understanding
is that LLVM_CLASS_NAME users don't qualify since they're not implementing TupleDescriptor
functionality, but I'm not super familiar with friend class best practices.


http://gerrit.cloudera.org:8080/#/c/1901/10/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

Line 208:     SlotDescriptor* slot_desc = desc.slots()[i];
> instead of generating ir by hand below, could you do this by 
I implemented a quick prototype of the interpreted functionality so you can see what it looks
like before I implement the rest and clean up what I don't need anymore. Here's how it works:

1. The loop body is broken out into a function for each type. We need this for cross-compiling
so we can use the specific ExprContext::Get*Val() methods instead of the generic ExprContext::GetValue()
method. This is done with a macro. So far I've only plumbed through int and string, but this
would be extended to all types. This also requires having a RawValue::Write() function for
each AnyVal subclass.

2. MaterializeExprs() has a switch statement in the for loop that calls the correct body function.

3. (NYI) I'm not sure how to write a generic codegen helper function to manually call the
body function with the correct arguments, but maybe you can help me think of a way. I'm currently
planning to write a new CodegenMaterializeExprs() function that loops over all the slot descriptors,
converts the relevant slot parameters to constant LLVM values, and generates a call to the
correct loop body function passing in the constant params. We'll have to generate constant
ColumnType and Tuple::NullIndicatorOffset structs to pass in, but luckily I already have a
patch to do ColumnType and NullIndicator should be similar. This function will also generate
the initial memset call, which is the only non-loop-body work, so we don't have to cross-compile
MaterializeExprs(). This function will also replace the Get*Val calls with the codegen'd versions
of these functions.

What do you think? Even though this makes the C++ code more verbose, I think it's an improvement
since we'll end up cross-compiling the AnyVal::Write() calls instead of reimplementing them
using the IRBuilder in CodegenAnyVal::WriteToSlot(). However, I'm not sure this will make
CodegenMaterializeExprs() any simpler, and it may actually get a little longer and more nuanced.

Feel free to grab me if you wanna discuss in person.


http://gerrit.cloudera.org:8080/#/c/1901/10/be/src/runtime/tuple.h
File be/src/runtime/tuple.h:

Line 123:   template <bool collect_string_vals, bool no_pool>
> what is the point of templatizing this? if there's a codegen'd version of i
It's to differentiate the pool vs. the no pool function calls for when we replace the function
calls during codegen. Having the template param gives us two different symbols for the two
different cases. I'll expand the comment to explain this more.


Line 143:   static const char* MATERIALIZE_EXPRS_SYMBOL;
> move to private (and make whoever needs this a friend, hopefully that'll be
See my other comment, I think this case is similar.


-- 
To view, visit http://gerrit.cloudera.org:8080/1901
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib422a8d50303c21c6a228675157bf867e8619444
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <skye@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Skye Wanderman-Milne <skye@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message