Date: Tue, 24 Oct 2017 20:12:00 +0000 (UTC)
From: "Matt McCline (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Comment Edited] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine


    [ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217539#comment-16217539 ]

Matt McCline edited comment on HIVE-17433 at 10/24/17 8:11 PM:
---------------------------------------------------------------

Known Wrong Vectorization Results on Master:

HIVE-17893: Vectorization: Wrong results for vector_udf3.q
HIVE-17892: Vectorization: Wrong results for vectorized_timestamp_funcs.q
HIVE-17890: Vectorization: Wrong results for vectorized_case.q
HIVE-17889: Vectorization: Wrong results for vectorization_15.q
HIVE-17863: Vectorization: Two Q files produce wrong PTF query results
HIVE-17123: Vectorization: Wrong results for vector_groupby_cube1.q
HIVE-16919: Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?
HIVE-17895: Vectorization: Wrong results for schema_evol_text_vec_table.q (LLAP)
HIVE-17894: Vectorization: Wrong results for dynpart_sort_opt_vectorization.q (LLAP)


was (Author: mmccline):
    Known Wrong Vectorization Results on Master:

HIVE-17893: Vectorization: Wrong results for vector_udf3.q
HIVE-17892: Vectorization: Wrong results for vectorized_timestamp_funcs.q
HIVE-17890: Vectorization: Wrong results for vectorized_case.q
HIVE-17889: Vectorization: Wrong results for vectorization_15.q
HIVE-17863: Vectorization: Two Q files produce wrong PTF query results
HIVE-17123: Vectorization: Wrong results for vector_groupby_cube1.q


> Vectorization: Support Decimal64 in Hive Query Engine
> -----------------------------------------------------
>
>                 Key: HIVE-17433
>                 URL: https://issues.apache.org/jira/browse/HIVE-17433
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-17433.03.patch, HIVE-17433.04.patch, HIVE-17433.05.patch
>
>
> Provide partial support for Decimal64 within Hive. By partial I mean that our current decimal has a large surface area of features (rounding, multiply, divide, remainder, power, big precision, and many more) but only a small number have been identified as being performance hotspots.
> Those are small-precision decimals (precision <= 18) that fit within a 64-bit long, which we are calling Decimal64. Just as we optimize row-mode execution engine hotspots by selectively adding new vectorization code, we can treat the current decimal as the full-featured one and add additional Decimal64 optimizations where query benchmarks really show it helps.
> This change creates a Decimal64ColumnVector.
> This change currently detects small decimals within Hive for the vectorized text input format and uses some new Decimal64 vectorized classes for comparison, addition, and later perhaps a few GroupBy aggregations like sum, avg, min, max.
> The patch also supports a new annotation that can mark a VectorizedInputFormat as supporting Decimal64 (it is called DECIMAL_64). So, in separate work, other formats such as ORC, PARQUET, etc. can be handled in later JIRAs so they participate in the Decimal64 performance optimization.
> The idea is when you annotate your input format with:
> @VectorizedInputFormatSupports(supports = {DECIMAL_64})
> the Vectorizer in Hive will plan usage of Decimal64ColumnVector instead of DecimalColumnVector. Upon seeing Decimal64ColumnVector being used, the input format can fill that column vector with decimal64 longs instead of the HiveDecimalWritable objects of DecimalColumnVector.
> There will be a Hive configuration variable hive.vectorized.input.format.supports.enabled that holds a string list of supported features. The default will start as "decimal_64". It can be turned off to allow for performance comparisons and testing.
> The query SELECT * FROM DECIMAL_6_1_txt where key - 100BD < 200BD ORDER BY key, value
> will have a vectorized explain plan looking like:
> ...
>             Filter Operator
>               Filter Vectorization:
>                   className: VectorFilterOperator
>                   native: true
>                   predicateExpression: FilterDecimal64ColLessDecimal64Scalar(col 2, val 20000000)(children: Decimal64ColSubtractDecimal64Scalar(col 0, val 10000000, outputDecimal64AbsMax 99999999999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean
>               predicate: ((key - 100) < 200) (type: boolean)
> ...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
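
The scalar values in the explain plan above can be checked by hand with a small sketch. It assumes that a Decimal64 value is the decimal's unscaled value held in a long at the column's scale, which is consistent with the plan: at scale 5, 100 becomes 10000000 and 200 becomes 20000000, and 99999999999 is 10^11 - 1, the largest magnitude for precision 11. The class and method names below (Decimal64Sketch, toScaledLong, absMax, passesFilter) are hypothetical and written only for illustration; this is not Hive's implementation.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative only: hypothetical helper names, not Hive classes.
public class Decimal64Sketch {
    // Every decimal with precision <= 18 has an unscaled value that fits in a signed 64-bit long.
    static final int MAX_DECIMAL64_PRECISION = 18;

    // Largest absolute unscaled value for a precision: 10^precision - 1.
    static long absMax(int precision) {
        if (precision < 1 || precision > MAX_DECIMAL64_PRECISION) {
            throw new IllegalArgumentException("precision must be in 1..18");
        }
        long power = 1;
        for (int i = 0; i < precision; i++) {
            power *= 10;
        }
        return power - 1;
    }

    // Converts a decimal literal to its unscaled long at the given scale.
    // Throws ArithmeticException if the literal would need rounding or does not fit in a long.
    static long toScaledLong(String literal, int scale) {
        return new BigDecimal(literal)
                .setScale(scale, RoundingMode.UNNECESSARY)
                .unscaledValue()
                .longValueExact();
    }

    // Evaluates (key - 100) < 200 using long arithmetic only, all values at the same scale.
    static boolean passesFilter(long scaledKey, int scale) {
        long difference = scaledKey - toScaledLong("100", scale);
        return difference < toScaledLong("200", scale);
    }

    public static void main(String[] args) {
        int scale = 5; // scale shown in the plan's decimal(11,5)
        System.out.println(toScaledLong("100", scale));
        System.out.println(toScaledLong("200", scale));
        System.out.println(absMax(11));
        System.out.println(passesFilter(toScaledLong("250.5", scale), scale));
    }
}
```

Because both operands share one scale, the subtraction and the comparison need no per-row decimal objects, which is the kind of saving the description attributes to Decimal64.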