hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <>
Subject [jira] [Commented] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations
Date Mon, 09 Dec 2013 20:30:08 GMT


Xuefu Zhang commented on HIVE-5356:

Before this patch was committed, integer-integer division vectorized. Now it does not. This
is a performance regression and also a functional regression for "EXPLAIN". This may have
been caught by the vectorization tests (see test output in comment above on about 3 Nov),
but maybe it was not clear to the developers of this patch because vectorization is pretty
new. If a vectorization test .q.out file contains in EXPLAIN output the string "Vectorized
execution: true" then the plan vectorizes. It is important that future patches not regress
this behavior for performance reasons. I would like to see any regressions to vectorization
be fixed before patches are applied, ideally, or else have some discussion and consensus.

While this patch may prevent vectorization for int/int, I don't think we should put
implementation ahead of functionality, as has happened over and over again. I also
disagree with the label "functional regression", for obvious reasons. Rather, I think functionality
prevails over implementation. A feature with wrong functionality is as bad as, if not worse
than, one with bad performance. Having said this, I still support vectorization, but I would not use
this to kill anything that might impact vectorization.

> Move arithmatic UDFs to generic UDF implementations
> ---------------------------------------------------
>                 Key: HIVE-5356
>                 URL:
>             Project: Hive
>          Issue Type: Task
>          Components: UDF
>    Affects Versions: 0.11.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.13.0
>         Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, HIVE-5356.11.patch, HIVE-5356.12.patch,
HIVE-5356.2.patch, HIVE-5356.3.patch, HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch,
HIVE-5356.7.patch, HIVE-5356.8.patch, HIVE-5356.9.patch
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are implemented
as old-style UDFs, and Java reflection is used to determine the return type TypeInfos/ObjectInspectors,
based on the return type of the evaluate() method chosen for the expression. This works fine
for types that don't have type params.
> The Hive decimal type participates in these operations just like int or double. Unlike
double or int, however, decimal has precision and scale, which cannot be determined by
just looking at the return type (decimal) of the UDF evaluate() method, even though the operands
have certain precision/scale. With the default of "decimal" without precision/scale,
(10, 0) will be the type params. This is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be implemented as
GenericUDFs, which allow returning an ObjectInspector from the initialize() method. The object
inspectors returned can carry type params, from which the "exact" return type can be determined.
> It's worth mentioning that, for user UDFs implemented in a non-generic way, if the return
type of the chosen evaluate() method is decimal, the return type actually has (10,0) as precision/scale,
which might not be desirable. This needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit the scope
of review. The remaining ones will be covered under HIVE-5706.
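
To illustrate the difference described in the issue, here is a minimal, self-contained Java sketch. It does not use Hive's actual classes: DecimalType, OldStylePlus, and GenericStylePlus are hypothetical stand-ins for a type descriptor, an old-style UDF, and a generic-style UDF. The precision/scale rule used for addition is an assumed SQL-style rule chosen for illustration; it is not stated in this JIRA.

```java
import java.lang.reflect.Method;
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalTypeSketch {

    /** Hypothetical type descriptor that carries type params (precision, scale). */
    static final class DecimalType {
        final int precision;
        final int scale;

        DecimalType(int precision, int scale) {
            this.precision = precision;
            this.scale = scale;
        }

        @Override
        public String toString() {
            return "decimal(" + precision + "," + scale + ")";
        }
    }

    /** Old-style UDF: the only type information is the evaluate() signature. */
    static final class OldStylePlus {
        public BigDecimal evaluate(BigDecimal a, BigDecimal b) {
            return a.add(b);
        }
    }

    /** What reflection can see: the Java return class, with no precision/scale. */
    static Class<?> returnTypeByReflection() {
        try {
            Method m = OldStylePlus.class.getMethod(
                    "evaluate", BigDecimal.class, BigDecimal.class);
            return m.getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Generic-style UDF: initialize() sees operand types and returns the result type. */
    static final class GenericStylePlus {
        private DecimalType resultType;

        DecimalType initialize(DecimalType left, DecimalType right) {
            // Assumed rule for addition (not taken from the JIRA):
            // keep the larger scale, keep the larger integer part, add one carry digit.
            int scale = Math.max(left.scale, right.scale);
            int intDigits = Math.max(left.precision - left.scale,
                                     right.precision - right.scale);
            resultType = new DecimalType(intDigits + scale + 1, scale);
            return resultType;
        }

        /** Must be called after initialize(), which fixes the result scale. */
        BigDecimal evaluate(BigDecimal a, BigDecimal b) {
            return a.add(b).setScale(resultType.scale, RoundingMode.HALF_UP);
        }
    }

    public static void main(String[] args) {
        // Reflection yields only the class name: BigDecimal
        System.out.println(returnTypeByReflection().getSimpleName());

        GenericStylePlus plus = new GenericStylePlus();
        DecimalType t = plus.initialize(new DecimalType(5, 2), new DecimalType(7, 4));
        System.out.println(t); // decimal(8,4)
        System.out.println(plus.evaluate(new BigDecimal("123.45"),
                                         new BigDecimal("1.2345"))); // 124.6845
    }
}
```

The point of the sketch is only the shape of the two approaches: the reflective path has nothing but the return class to work with, so a default such as (10, 0) has to be assumed, whereas the initialize() path can derive the result type from the operand types before any row is evaluated.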

This message was sent by Atlassian JIRA
