Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Reply-To: dev@hive.apache.org
Date: Thu, 17 Mar 2016 23:47:33 +0000 (UTC)
From: "Matt McCline (JIRA)"
To: dev@hive.apache.org
Subject: [jira] [Created] (HIVE-13306) Better Decimal vectorization

Matt McCline created HIVE-13306:
-----------------------------------

             Summary: Better Decimal vectorization
                 Key: HIVE-13306
                 URL: https://issues.apache.org/jira/browse/HIVE-13306
             Project: Hive
          Issue Type: Bug
          Components: Hive
            Reporter: Matt McCline
            Priority: Critical

Decimal Vectorization Requirements

•	Today, the LongColumnVector, DoubleColumnVector, BytesColumnVector, TimestampColumnVector classes store the data as primitive Java data types long, double, or byte arrays for efficiency.
•	DecimalColumnVector is different: it has an array of Object references to HiveDecimal objects.
•	The HiveDecimal object uses an internal BigDecimal object for its implementation. Further, BigDecimal itself uses an internal BigInteger object for its implementation, and BigInteger uses an int array. That is 4 objects in total.
•	And HiveDecimal is an immutable object, which means arithmetic and other operations produce a new HiveDecimal object with 3 new objects underneath.
•	A major reason Vectorization is fast is that the ColumnVector classes, except DecimalColumnVector, do not have to allocate additional memory per row. This avoids the memory fragmentation and pressure on the Java Garbage Collector that DecimalColumnVector can generate. It is very significant.
•	What can be done with DecimalColumnVector to make it much more efficient?
    o	Design several new decimal classes that allow the caller to manage the decimal storage.
    o	If it takes N int values to store a decimal (e.g. N=1..5), then a new DecimalColumnVector would have an int[] of length N*1024 (where 1024 is the default column vector size).
    o	Why store a decimal in separate int values?
        •	Java does not support 128 bit integers.
        •	Java does not support unsigned integers.
        •	Multiplying decimals that are each represented in a long needs twice the storage (i.e. 128 bits) for the result, so the parts have to be represented in 32 bit integers.
        •	But since Java has no unsigned integers, multiplication can really only be done on N-1 bits, i.e. 31 bits, of each int.
        •	So, 5 ints are needed to store a decimal of 38 digits.
    o	It makes sense to have just one algorithm for decimals rather than one for HiveDecimal and another for DecimalColumnVector. So, make HiveDecimal store N int values, too.
    o	A lower level primitive decimal class would accept decimals stored as int arrays and produce results into int arrays.
It would be used by HiveDecimal and DecimalColumnVector.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
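
The int-array storage proposed in the issue can be illustrated with a minimal Java sketch. Everything named here (FlatDecimalSketch, limbsFor, add, multiplySmall) is hypothetical and is not part of Hive; the sketch only assumes non-negative unscaled values stored as 31-bit "limbs", least significant first, in one shared int[] of N * rowCount entries. BigInteger appears only in the helpers that load and read values for checking, not in the arithmetic itself.

```java
import java.math.BigInteger;

/** Hypothetical sketch: decimal magnitudes packed into one caller-managed int[]. */
class FlatDecimalSketch {
    static final int LIMB_BITS = 31;                    // leave each int's sign bit clear
    static final int LIMB_MASK = (1 << LIMB_BITS) - 1;  // 0x7FFFFFFF

    /** Number of 31-bit limbs needed for an unscaled value with the given digit count. */
    static int limbsFor(int precision) {
        int bits = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE).bitLength();
        return (bits + LIMB_BITS - 1) / LIMB_BITS;
    }

    final int limbs;    // N ints per value
    final int[] words;  // N * rowCount ints, allocated once for the whole vector

    FlatDecimalSketch(int precision, int rowCount) {
        this.limbs = limbsFor(precision);
        this.words = new int[limbs * rowCount];
    }

    /** Helper for checking only: load a non-negative unscaled value into a row. */
    void set(int row, BigInteger unscaled) {
        BigInteger mask = BigInteger.valueOf(LIMB_MASK);
        for (int i = 0; i < limbs; i++) {
            words[row * limbs + i] = unscaled.shiftRight(i * LIMB_BITS).and(mask).intValue();
        }
    }

    /** Helper for checking only: read a row back as a BigInteger. */
    BigInteger get(int row) {
        BigInteger value = BigInteger.ZERO;
        for (int i = limbs - 1; i >= 0; i--) {
            value = value.shiftLeft(LIMB_BITS).or(BigInteger.valueOf(words[row * limbs + i]));
        }
        return value;
    }

    /** result = a + b with no per-row allocation; returns false if a carry leaves the top limb. */
    boolean add(int a, int b, int result) {
        int carry = 0;
        for (int i = 0; i < limbs; i++) {
            long sum = (long) words[a * limbs + i] + words[b * limbs + i] + carry;
            words[result * limbs + i] = (int) (sum & LIMB_MASK);
            carry = (int) (sum >>> LIMB_BITS);
        }
        return carry == 0;
    }

    /** result = a * factor for 0 <= factor <= LIMB_MASK; each partial product fits a signed long. */
    boolean multiplySmall(int a, int factor, int result) {
        long carry = 0;
        for (int i = 0; i < limbs; i++) {
            long product = (long) words[a * limbs + i] * factor + carry;
            words[result * limbs + i] = (int) (product & LIMB_MASK);
            carry = product >>> LIMB_BITS;
        }
        return carry == 0;
    }

    public static void main(String[] args) {
        FlatDecimalSketch vector = new FlatDecimalSketch(38, 1024);
        vector.set(0, BigInteger.TEN.pow(37));
        vector.set(1, BigInteger.valueOf(12345));
        vector.add(0, 1, 2);
        System.out.println("limbs per value: " + vector.limbs);
        System.out.println("row 2: " + vector.get(2));
    }
}
```

Under these assumptions, a 38-digit value needs 127 bits, which is more than four 31-bit limbs (124 bits) can hold, so limbsFor(38) yields the 5 ints mentioned in the issue. The sketch omits sign, scale, precision checks, and full limb-by-limb multiplication, all of which a real implementation would have to handle.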