hive-dev mailing list archives

From "Eric Hanson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5397) VectorizedRowBatch member variables are public.
Date Sat, 02 Nov 2013 00:12:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811796#comment-13811796 ]

Eric Hanson commented on HIVE-5397:
-----------------------------------

Hi Brock,

I'm in favor of encapsulation for most code. But this is different because this is a low-level
performance enhancement project that has some research behind it. The theory behind the vectorized
query execution technique that we use was published in this paper:

Peter Boncz et al., MonetDB/X100: Hyper-Pipelining Query Execution, Proceedings of the CIDR
Conference, 2005. http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=C26BD72358252F6A301DA1FF6E37D44B?doi=10.1.1.324.9516&rep=rep1&type=pdf

Please see the performance numbers in the paper.

State-of-the-art query execution systems, like the ones in Microsoft SQL Server, Vectorwise,
Vertica, and ParAccel/Redshift (in no particular order), all use this strategy or something
like it. It's well known in the industry that this is a place where being architecture-conscious
pays big dividends. That requires some violation of encapsulation.

It is possible that the compiler will inline some function calls for us in the inner loops
of the vector "for" loops, but for the most primitive operations, like arithmetic and
comparisons, relying on the compiler is too much of a risk in most cases.
Arguably, using put/get methods to access columns, rather than the array access we use in
our VectorExpression subclasses, would probably not lose much performance. But we already decided
to use array access to get columns, and it is used in hundreds of places in the code. I think
it is a reasonable choice, and it is not necessary to change it.
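
To make the trade-off concrete, here is a minimal Java sketch of the two access styles. All
names in it (VectorAccessSketch, LongColumn, RowBatch, addDirect, addViaAccessors) are
hypothetical stand-ins for illustration and are not Hive's actual classes or signatures. It
only shows the shape of the two inner loops; it makes no claim about their relative speed.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-ins for illustration; not Hive's real classes.
public class VectorAccessSketch {

    /** A column of long values held in a public primitive array. */
    public static final class LongColumn {
        public final long[] vector;

        public LongColumn(int capacity) {
            vector = new long[capacity];
        }
    }

    /** A batch of rows stored column-wise, with public members. */
    public static final class RowBatch {
        public final LongColumn[] cols;
        public int size; // number of valid rows in the batch

        public RowBatch(int numCols, int capacity) {
            cols = new LongColumn[numCols];
            for (int i = 0; i < numCols; i++) {
                cols[i] = new LongColumn(capacity);
            }
        }

        // Encapsulated alternative: one method call per value read or written.
        public long get(int col, int row) {
            return cols[col].vector[row];
        }

        public void put(int col, int row, long value) {
            cols[col].vector[row] = value;
        }
    }

    /** Direct array access: the inner loop touches only local primitive arrays. */
    public static void addDirect(RowBatch batch, int left, int right, int out) {
        long[] a = batch.cols[left].vector;
        long[] b = batch.cols[right].vector;
        long[] result = batch.cols[out].vector;
        int n = batch.size;
        for (int i = 0; i < n; i++) {
            result[i] = a[i] + b[i];
        }
    }

    /** Accessor-based version: stays tight only if the JIT inlines get/put. */
    public static void addViaAccessors(RowBatch batch, int left, int right, int out) {
        for (int i = 0; i < batch.size; i++) {
            batch.put(out, i, batch.get(left, i) + batch.get(right, i));
        }
    }

    public static void main(String[] args) {
        RowBatch batch = new RowBatch(4, 1024);
        batch.size = 3;
        for (int i = 0; i < batch.size; i++) {
            batch.cols[0].vector[i] = i + 1;        // 1, 2, 3
            batch.cols[1].vector[i] = 10 * (i + 1); // 10, 20, 30
        }
        addDirect(batch, 0, 1, 2);
        addViaAccessors(batch, 0, 1, 3);
        // Both styles compute the same sums; only the access path differs.
        System.out.println(Arrays.toString(Arrays.copyOf(batch.cols[2].vector, batch.size)));
        System.out.println(Arrays.toString(Arrays.copyOf(batch.cols[3].vector, batch.size)));
    }
}
```

In the direct version the loop body is plain array arithmetic no matter what the compiler
decides. The accessor version depends on the JIT inlining get and put to reach the same form.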

-Eric




> VectorizedRowBatch member variables are public.
> -----------------------------------------------
>
>                 Key: HIVE-5397
>                 URL: https://issues.apache.org/jira/browse/HIVE-5397
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>
> VectorizedRowBatch exposes members as public to avoid method call overheads. The alternative
> is to rely on the JIT to inline the methods.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
