Date: Wed, 13 Mar 2013 19:58:14 +0000 (UTC)
From: "Jitendra Nath Pandey (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive

[ https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601558#comment-13601558 ]

Jitendra Nath Pandey commented on HIVE-4160:
--------------------------------------------

This will be incremental work, delivered in multiple phases with no regression in the current system. We will publish a design/scope document very soon.

The main idea behind the proposal is to transform the execution engine so that it processes a batch of rows at a time instead of a single row.
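To make the row-versus-batch contrast concrete, here is a minimal Java sketch. The names (RowVsBatch, RowPredicate, countRowAtATime, countBatchAtATime) are hypothetical and are not Hive APIs. It counts values above a threshold twice: once row at a time through a generic per-row interface over boxed values, and once batch at a time over a primitive long[] column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these names are Hive APIs.
public class RowVsBatch {

    // Row-at-a-time model: each row is an Object[] of boxed values, and the
    // predicate is invoked through an interface once per row.
    interface RowPredicate {
        boolean test(Object[] row);
    }

    static int countRowAtATime(List<Object[]> rows, RowPredicate pred) {
        int count = 0;
        for (Object[] row : rows) {
            if (pred.test(row)) { // one interface call per row
                count++;
            }
        }
        return count;
    }

    // Batch-at-a-time model: a single call handles the whole batch, and the
    // inner loop reads a primitive long[] directly, with no per-row dispatch.
    static int countBatchAtATime(long[] column, int size, long threshold) {
        int count = 0;
        for (int i = 0; i < size; i++) {
            if (column[i] > threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] values = {5, 12, 7, 30, 1, 18};
        List<Object[]> rows = new ArrayList<>();
        for (long v : values) {
            rows.add(new Object[] {v}); // autoboxed to Long
        }
        // The lambda unboxes the Long on every row.
        int rowCount = countRowAtATime(rows, row -> (Long) row[0] > 10L);
        int batchCount = countBatchAtATime(values, values.length, 10L);
        System.out.println(rowCount + " " + batchCount); // prints "3 3"
    }
}
```

Both approaches return the same count. The row-at-a-time version pays for an interface call and an unboxing on every row. The batch version's inner loop touches only a primitive array.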

The row batch will consist of column vectors, and each operator will process an entire column vector at a time. Wherever possible, a column vector will consist of array(s) of primitive types. Expressions will be implemented for the various data types using pre-compiled templates, and the appropriate expressions will be added to the operators based on the data types involved. File formats will implement a vectorized iterator interface to provide vectorized input to the operator tree.

> Vectorized Query Execution in Hive
> ----------------------------------
>
>                 Key: HIVE-4160
>                 URL: https://issues.apache.org/jira/browse/HIVE-4160
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>
> The Hive query execution engine currently processes one row at a time: a single row of data passes through all the operators before the next row can be processed. This mode of processing is very inefficient in terms of CPU usage, and research has demonstrated that it yields very low instructions per cycle [MonetDB]. In addition, Hive currently relies heavily on lazy deserialization. Data columns go through a layer of object inspectors that identify the column type, deserialize the data, and determine the appropriate expression routines in the inner loop. These layers of virtual method calls slow processing down further.
> Reference: http://www-db.cs.wisc.edu/cidr/cidr2005/papers/P19.pdf
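The design in the comment combines three pieces: row batches made of primitive column vectors, per-type expressions generated from templates, and a vectorized iterator supplied by the file format. The small, self-contained Java sketch below shows how these pieces could fit together. All class names (RowBatch, LongColumnVec, VectorExpression, LongColAddLongScalar, BatchIterator, ArrayBatchIterator) are hypothetical and are not Hive APIs. Only a long column type is shown, and the batch capacity is arbitrary. The hand-written LongColAddLongScalar stands in for one of the classes a template generator might emit per type/operator combination.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed design; none of these names are Hive APIs.
public class VectorizedSketch {

    /** A column vector backed by a primitive array (here: 64-bit integers). */
    static final class LongColumnVec {
        final long[] values;
        LongColumnVec(int capacity) { values = new long[capacity]; }
    }

    /** A row batch: a set of column vectors plus the number of valid rows. */
    static final class RowBatch {
        final LongColumnVec[] cols;
        int size;
        RowBatch(int numCols, int capacity) {
            cols = new LongColumnVec[numCols];
            for (int c = 0; c < numCols; c++) cols[c] = new LongColumnVec(capacity);
        }
    }

    /** An expression evaluates over a whole batch, not a single row. */
    interface VectorExpression {
        void evaluate(RowBatch batch);
    }

    /**
     * One type-specialized expression: out[i] = in[i] + scalar on long columns.
     * A template would produce one such class per type/operator pair; this
     * one is written by hand for illustration.
     */
    static final class LongColAddLongScalar implements VectorExpression {
        private final int inCol, outCol;
        private final long scalar;
        LongColAddLongScalar(int inCol, long scalar, int outCol) {
            this.inCol = inCol; this.scalar = scalar; this.outCol = outCol;
        }
        @Override
        public void evaluate(RowBatch batch) {
            long[] in = batch.cols[inCol].values;
            long[] out = batch.cols[outCol].values;
            for (int i = 0; i < batch.size; i++) {
                out[i] = in[i] + scalar; // tight loop over primitives, no per-row dispatch
            }
        }
    }

    /** What a file format might implement to feed batches to the operator tree. */
    interface BatchIterator {
        /** Fills {@code batch}; returns false once the input is exhausted. */
        boolean next(RowBatch batch);
    }

    /** Toy in-memory "file format" that emits one long column in batches. */
    static final class ArrayBatchIterator implements BatchIterator {
        private final long[] data;
        private int pos = 0;
        ArrayBatchIterator(long[] data) { this.data = data; }
        @Override
        public boolean next(RowBatch batch) {
            if (pos >= data.length) return false;
            int n = Math.min(batch.cols[0].values.length, data.length - pos);
            System.arraycopy(data, pos, batch.cols[0].values, 0, n);
            batch.size = n;
            pos += n;
            return true;
        }
    }

    /** Runs the expression over every batch and collects output column 1. */
    static List<Long> run(long[] input, int batchCapacity, long addend) {
        RowBatch batch = new RowBatch(2, batchCapacity);
        VectorExpression expr = new LongColAddLongScalar(0, addend, 1);
        BatchIterator it = new ArrayBatchIterator(input);
        List<Long> result = new ArrayList<>();
        while (it.next(batch)) {
            expr.evaluate(batch); // one virtual call per batch, not per row
            for (int i = 0; i < batch.size; i++) result.add(batch.cols[1].values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(new long[] {1, 2, 3, 4, 5}, 2, 10)); // [11, 12, 13, 14, 15]
    }
}
```

With input {1, 2, 3, 4, 5} and a batch capacity of 2, the iterator yields three batches and run returns [11, 12, 13, 14, 15]. In this sketch, virtual dispatch (evaluate, next) happens once per batch. The per-row work is a plain loop over primitive arrays, which avoids the per-row object-inspector and virtual-call overhead described in the issue.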