Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Wed, 13 Mar 2013 19:58:14 +0000 (UTC)
From: "Jitendra Nath Pandey (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12636846.1363204593772.435921.1363204694805@arcas>
In-Reply-To: <JIRA.12636846.1363204593772@arcas>
References: <JIRA.12636846.1363204593772@arcas>
Subject: [jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601558#comment-13601558 ] 

Jitendra Nath Pandey commented on HIVE-4160:
--------------------------------------------

    This work will be done incrementally, in multiple phases, with no regression in the current system. We will publish a design/scope document very soon.
    The main idea behind the proposal is to transform the execution engine so that it processes a batch of rows at a time instead of a single row. A row batch will consist of column vectors, and each operator will process a whole column vector at a time. Each column vector will consist of one or more arrays of primitive types wherever possible.
    Expressions will be implemented for the various data types using pre-compiled templates, and the appropriate expressions will be added to the operators based on the data types involved.
    The file formats will implement a vectorized iterator interface to provide vectorized input to the operator tree.
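
To make the idea concrete, here is a rough sketch of how a row batch, a type-specialized expression, and a vectorized reader could fit together. All class and method names (LongColumnVector, RowBatch, VectorExpression, LongColAddLongCol, VectorizedReader, ArrayBackedReader, VectorizedSketch) are hypothetical and chosen for illustration only; the design document may define these differently. The sketch assumes long-typed columns and ignores nulls and filtering.

```java
// Hypothetical sketch: none of these names are taken from the Hive code base.

// A column vector backed by a primitive array, so the inner loop avoids
// per-row object creation and virtual calls.
final class LongColumnVector {
    final long[] values;
    LongColumnVector(int capacity) { values = new long[capacity]; }
}

// A row batch is a set of column vectors plus the number of valid rows.
final class RowBatch {
    final LongColumnVector[] columns;
    int size;
    RowBatch(int numColumns, int capacity) {
        columns = new LongColumnVector[numColumns];
        for (int c = 0; c < numColumns; c++) {
            columns[c] = new LongColumnVector(capacity);
        }
    }
}

// An expression is evaluated once per batch rather than once per row.
interface VectorExpression {
    void evaluate(RowBatch batch);
}

// One type-specialized expression, the kind of class a template might
// generate per data type: out = left + right for long columns.
final class LongColAddLongCol implements VectorExpression {
    private final int left, right, out;
    LongColAddLongCol(int left, int right, int out) {
        this.left = left; this.right = right; this.out = out;
    }
    public void evaluate(RowBatch batch) {
        long[] a = batch.columns[left].values;
        long[] b = batch.columns[right].values;
        long[] r = batch.columns[out].values;
        for (int i = 0; i < batch.size; i++) {  // tight loop over primitives
            r[i] = a[i] + b[i];
        }
    }
}

// Vectorized iterator that a file format could implement.
interface VectorizedReader {
    boolean nextBatch(RowBatch batch);  // false when no rows remain
}

// Stand-in for a file format: serves batches from in-memory columns.
final class ArrayBackedReader implements VectorizedReader {
    private final long[][] source;  // source[column][row]
    private int position = 0;
    ArrayBackedReader(long[][] source) { this.source = source; }
    public boolean nextBatch(RowBatch batch) {
        int remaining = source[0].length - position;
        int n = Math.min(remaining, batch.columns[0].values.length);
        for (int c = 0; c < source.length; c++) {
            System.arraycopy(source[c], position, batch.columns[c].values, 0, n);
        }
        position += n;
        batch.size = n;
        return n > 0;
    }
}

final class VectorizedSketch {
    // Reads two long columns batch by batch, adds them into a third
    // column, and returns the sum of all results.
    static long run(long[][] source, int capacity) {
        VectorizedReader reader = new ArrayBackedReader(source);
        RowBatch batch = new RowBatch(3, capacity);  // 2 inputs + 1 output
        VectorExpression add = new LongColAddLongCol(0, 1, 2);
        long total = 0;
        while (reader.nextBatch(batch)) {
            add.evaluate(batch);
            long[] out = batch.columns[2].values;
            for (int i = 0; i < batch.size; i++) {
                total += out[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[][] source = { {1, 2, 3, 4, 5}, {10, 20, 30, 40, 50} };
        System.out.println(run(source, 2));
    }
}
```

The point of the sketch is only that the per-row work collapses into a loop over primitive arrays, with the type dispatch decided once when the expression is chosen rather than inside the inner loop.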

                
> Vectorized Query Execution in Hive
> ----------------------------------
>
>                 Key: HIVE-4160
>                 URL: https://issues.apache.org/jira/browse/HIVE-4160
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>
>   Hive's query execution engine currently processes one row at a time. A single row of data goes through all the operators before the next row can be processed. This mode of processing is very inefficient in terms of CPU usage. Research has demonstrated that it yields very low instructions per cycle [MonetDB]. In addition, Hive currently relies heavily on lazy deserialization, and data columns go through a layer of object inspectors that identify the column type, deserialize the data, and determine the appropriate expression routines in the inner loop. These layers of virtual method calls further slow down processing.
> Reference: http://www-db.cs.wisc.edu/cidr/cidr2005/papers/P19.pdf
