Date: Mon, 20 Mar 2017 17:57:42 +0000 (UTC)
From: "Naveen Gangam (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-16257) Intermittent issue with incorrect resultset with Spark

[ https://issues.apache.org/jira/browse/HIVE-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933187#comment-15933187 ]

Naveen Gangam commented on HIVE-16257:
--------------------------------------

[~xuefuz] [~szehon] Any clues on where this could be originating? When the problem does occur, the incorrect column value always seems to match a value from another row, as shown in the issue description. I have ruled out a Beeline display issue because the problem is reproducible from the Hive CLI too. Although it is not reproducible with spark-shell, I have not ruled out a Spark issue, because the set of transformations used by spark-shell could differ from the transformations used by Hive. What code should we instrument to confirm or eliminate Hive as the source of the problem? Any help appreciated.
Thank you

> Intermittent issue with incorrect resultset with Spark
> ------------------------------------------------------
>
>                 Key: HIVE-16257
>                 URL: https://issues.apache.org/jira/browse/HIVE-16257
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.0
>            Reporter: Naveen Gangam
>
> This issue is highly intermittent and only seems to occur with the Spark engine when the query has a GROUP BY clause. The following is the test case.
> {code}
> drop table if exists test_hos_sample;
> create table test_hos_sample (name string, val1 decimal(18,2), val2 decimal(20,3));
> insert into test_hos_sample values ('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);
> set hive.execution.engine=spark;
> select name, val1,val2 from test_hos_sample group by name, val1, val2;
> {code}
> Expected Results:
> {code}
> name    val1    val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   101.12  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> Incorrect results once in a while:
> {code}
> name    val1    val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   104.52  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> 1) Not reproducible with HoMR.
> 2) Not an issue when running from spark-shell.
> 3) Not reproducible when the column data type is string or double. Only reproducible with decimal data types. It also works fine for decimal data types if you cast the decimal to string on read and cast it back to decimal on select.
> 4) Occurs with both Parquet and text file formats. (Haven't tried other formats.)
> 5) Occurs both when the table data is inside an encryption zone and when it is outside.
> 6) Even in clusters where this is reproducible, it occurs only about once in 20 or more runs.
> 7) Occurs with both Beeline and Hive CLI.
> 8) Reproducible only when there is a GROUP BY clause.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
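
Since the failure shows up only about once in 20 or more runs, it may help to check each result set automatically rather than by eye. The following is a minimal sketch, not part of the original report: it compares fetched rows against the expected results quoted above and, for each wrong cell, lists which other rows carry that value in the same column (the "matches a value from another row" pattern described in the comment). The names `EXPECTED` and `find_swapped_values` are hypothetical, and the sketch assumes the rows have already been fetched by whatever client is in use and converted to `(name, Decimal, Decimal)` tuples.

```python
from decimal import Decimal

# Expected rows, copied from the "Expected Results" block of the report.
# Keyed by name; values are (val1, val2).
EXPECTED = {
    "test1": (Decimal("101.12"), Decimal("102.123")),
    "test2": (Decimal("102.12"), Decimal("103.234")),
    "test3": (Decimal("103.52"), Decimal("102.345")),
    "test4": (Decimal("104.52"), Decimal("104.456")),
    "test5": (Decimal("105.52"), Decimal("105.567")),
}


def find_swapped_values(rows, expected=EXPECTED):
    """Return one finding per wrong cell in `rows`.

    Each finding is (name, column_index, got, donors), where column_index
    is 0 for val1 and 1 for val2, and donors is the sorted list of other
    names whose expected value in that column equals the wrong value.
    An empty donors list means the value came from nowhere in the table.
    """
    findings = []
    for name, *values in rows:
        want = expected[name]
        for col, (got, exp) in enumerate(zip(values, want)):
            if got != exp:
                donors = sorted(
                    other
                    for other, vals in expected.items()
                    if other != name and vals[col] == got
                )
                findings.append((name, col, got, donors))
    return findings


# The incorrect result set quoted in the report.
bad_rows = [
    ("test5", Decimal("105.52"), Decimal("105.567")),
    ("test3", Decimal("103.52"), Decimal("102.345")),
    ("test1", Decimal("104.52"), Decimal("102.123")),
    ("test4", Decimal("104.52"), Decimal("104.456")),
    ("test2", Decimal("102.12"), Decimal("103.234")),
]

for finding in find_swapped_values(bad_rows):
    print(finding)
```

Run against the incorrect results from the report, this flags the `val1` cell of `test1` and names `test4` as the row holding the same value. Calling it in a loop around the test query would show whether the wrong value always comes from another row of the same table.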