Date: Sun, 1 Dec 2013 20:18:35 +0000 (UTC)
From: "Teddy Choi (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-5761) Implement vectorized support for the DATE data type

    [ https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836108#comment-13836108 ]

Teddy Choi commented on HIVE-5761:
----------------------------------

I wrote a draft version.

{quote}
DATE shall be implemented within a LongColumnVector. HIVE-4055 represents a DATE value as the number of days since the epoch. A vectorized DATE representation will contain this number and an optional cached parse result.
A read operation result, or the result of a complex date function such as date_add or date_sub, will have an empty cache. The first simple date function applied to a value, such as year, month or day, will cache its parse result. Subsequent simple functions will then reuse the cache to avoid repeated parsing. Its effect on performance will be small, since java.util.Date calculates all fields at once and caches their results.

The first 32-bit set will represent the number of days since the epoch as a signed integer. Its range is roughly from BC 2^31/365 - 1970 to AD 2^31/365 + 1970. A comparison between vectorized DATE values should consider only their first sets.

The following 32-bit set will represent the cached parse result: cached state (1 bit; 0 for not cached, 1 for cached), era (1 bit; 0 for AD, 1 for BC), year (unsigned 21-bit integer), month (unsigned 4-bit integer) and day of month (unsigned 5-bit integer). A value without a cache will have only zero bits after its first set. The parsed year, month and day of month values will each start from 1, so that they represent the exact number. The year range is from BC 2^21 to AD 2^21, which is narrower than the range of the first set. If a date is not in this range, its cached state will remain false (0). The value 0xFFFFFFFF00000000L shall be reserved for future use to indicate data outside the standard range.
{quote}

> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>
>                 Key: HIVE-5761
>                 URL: https://issues.apache.org/jira/browse/HIVE-5761
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>            Assignee: Teddy Choi
>
> Add support to allow queries referencing DATE columns and expression results to run efficiently in vectorized mode. This should re-use the code for the integer/timestamp types to the extent possible and beneficial. Include unit tests and end-to-end tests.
> Consider re-using or extending existing end-to-end tests for vectorized integer and/or timestamp operations.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
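
The bit layout proposed in the draft can be sketched in Java roughly as follows. This is only an illustration under several assumptions, not code from Hive or from any HIVE-5761 patch: the class PackedDate and all of its method names are hypothetical, the "first set" is taken to be the high 32 bits of the long, and the cache fields are assumed to be packed in the order listed in the draft, most significant bit first. It also parses with java.time.LocalDate (proleptic ISO calendar) rather than java.util.Date, so results for very old dates may differ from what Hive would produce.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;

/** Hypothetical helper illustrating the draft layout; not part of Hive. */
final class PackedDate {
    // Assumed layout of the low 32 bits, most significant bit first:
    // cached (1) | era (1) | year (21) | month (4) | day of month (5)
    static final long CACHED_BIT = 1L << 31;
    static final long BC_BIT = 1L << 30;
    static final int YEAR_SHIFT = 9;
    static final int MONTH_SHIFT = 5;
    static final int MAX_YEAR = (1 << 21) - 1; // largest unsigned 21-bit value

    /** Packs a day count with an empty cache (low 32 bits all zero). */
    static long fromDays(int days) {
        return ((long) days) << 32;
    }

    /** Reads the signed day count from the high 32 bits. */
    static int days(long packed) {
        return (int) (packed >> 32);
    }

    static boolean isCached(long packed) {
        return (packed & CACHED_BIT) != 0;
    }

    /** Fills the cache if the year fits in 21 bits; otherwise returns the value unchanged. */
    static long withCache(long packed) {
        if (isCached(packed)) {
            return packed;
        }
        LocalDate d = LocalDate.ofEpochDay(days(packed));
        int yearOfEra = d.get(ChronoField.YEAR_OF_ERA);
        if (yearOfEra > MAX_YEAR) {
            return packed; // outside the cacheable range: cached bit stays 0
        }
        long era = d.get(ChronoField.ERA) == 0 ? BC_BIT : 0L; // ISO era 0 is BCE
        return packed | CACHED_BIT | era
                | ((long) yearOfEra << YEAR_SHIFT)
                | ((long) d.getMonthValue() << MONTH_SHIFT)
                | d.getDayOfMonth();
    }

    static boolean isBc(long packed) { return (packed & BC_BIT) != 0; }
    static int year(long packed) { return (int) ((packed >>> YEAR_SHIFT) & MAX_YEAR); }
    static int month(long packed) { return (int) ((packed >>> MONTH_SHIFT) & 0xF); }
    static int day(long packed) { return (int) (packed & 0x1F); }

    /** Compares two values by their day counts only, ignoring any cache bits. */
    static int compare(long a, long b) {
        return Integer.compare(days(a), days(b));
    }

    public static void main(String[] args) {
        long raw = fromDays((int) LocalDate.of(2013, 12, 1).toEpochDay());
        long cached = withCache(raw);
        System.out.println(year(cached) + "-" + month(cached) + "-" + day(cached));
        System.out.println(compare(raw, cached));
    }
}
```

Under the high-bits assumption used here, fromDays(-1) produces exactly 0xFFFFFFFF00000000L, so the reserved value would coincide with an uncached 1969-12-31. The sketch does not try to resolve that; it only makes the overlap visible.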