impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3931: arbitrary fixed-size uda intermediate types
Date Wed, 16 Aug 2017 00:11:41 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3931: arbitrary fixed-size uda intermediate types
......................................................................


Patch Set 12:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/7526/12/be/src/codegen/codegen-anyval.cc
File be/src/codegen/codegen-anyval.cc:

PS12, Line 555: the StringVal is already set up
> I'm not sure I follow this comment given that this function is setting the 
Done


http://gerrit.cloudera.org:8080/#/c/7526/12/be/src/codegen/codegen-anyval.h
File be/src/codegen/codegen-anyval.h:

PS12, Line 52: TYPE_FIXED_UDA_INTERMEDIATE
> why isn't this a byte array if that's how it is represented? I guess we wan
It could be a *Val which has a ptr member only (various places assume that the intermediate
value is a subclass of AnyVal). StringVal is convenient and consistent with how CHAR(n) is
passed around.

I thought about changing that but it wasn't clear to me that it significantly improved the
code so I thought I'd start with the smaller delta. It does require changing the interface
of a lot of the aggregate functions and updating the corresponding symbols everywhere to match.


PS12, Line 193: matching
              :   /// native type
> isn't the native type for intermediate just char array?  By "native" do we 
Yes it is, that would be another example of a native type along with TimestampValue or StringValue.

I agree that the "Native" terminology seems to be specific to this class. Elsewhere it's "raw
value" or "slot". Happy to rename to whatever seems clearer.


PS12, Line 195: raw_val_ptr
> is this a pointer to the slot itself? if so, why not just specify it like t
I thought about this more and I don't think it even makes sense to call StoreToNativePtr()
with FIXED_UDA_INTERMEDIATE, since the caller has to understand the memory management and
should know that the buffer was already written to.

I think this is a reason to separate FIXED_UDA_INTERMEDIATE from CHAR - the memory management
scheme seems somewhat specific to this use case and it's not clear that the requirements will
always be the same as CHAR.


http://gerrit.cloudera.org:8080/#/c/7526/12/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

PS12, Line 240: the
> delete
Done


http://gerrit.cloudera.org:8080/#/c/7526/12/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 888
> it might be a bit confusing that now StringVal's can have different managem
Yeah I think it's potentially confusing. At a minimum we'd probably want to revisit before
exposing it to UDAs.

If I was to do it I'd probably do it as a follow-on patch since this already touches many
lines of code.


http://gerrit.cloudera.org:8080/#/c/7526/12/be/src/udf/udf.h
File be/src/udf/udf.h:

PS12, Line 81: TYPE_FIXED_BUFFER
> should we rename that to TYPE_CHAR, or would that break compat?
It seems better to keep them separate. They're not the same data type conceptually. The semantics
might be the same for the intermediate value case, but I find it easier to reason about if
we separate the types then share the implementation in cases where the requirements are the
same.


PS12, Line 346: For UDAs that need a complex data structure as the intermediate state, the
              : /// intermediate type should be string and the UDA can cast the ptr to the
structure
              : /// it is using.
              : ///
              : /// Memory Management: For allocations that are not returned to Impala, the
UDA should use
              : /// the FunctionContext::Allocate()/Free() methods. In general, Allocate()
is called in
              : /// Init(), and then Free() must be called in both Serialize() and Finalize(),
since
              : /// either of these functions may be called to clean up the state. For StringVal
              : /// allocations returned to Impala (e.g. returned by UdaSerialize()), the
UDA should
              : /// allocate the result via StringVal(FunctionContext*, int) ctor or the function
              : /// StringVal::CopyFrom(FunctionContext*, const uint8_t*, size_t) and Impala
will
              : /// automatically handle freeing it.
> does any of that need updating to account for the fact that these new Strin
It seems confusing to document something that actual UDAs can't use. The cleanest solution
I can think of is to add a new intermediate type in udf-internal.h and document the semantics
there.


http://gerrit.cloudera.org:8080/#/c/7526/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

Line 57:   private static final int RANK_INTERMEDIATE_SIZE = 16;
> are these constants verified for consistency in any way?
We have DCHECKs in the backend for dst->len, so we'd find out as soon as we ran the function.


-- 
To view, visit http://gerrit.cloudera.org:8080/7526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ife90cf27989f98ffb5ef5c39f1e09ce92e8cb87c
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message