Date: Thu, 16 Nov 2017 06:33:00 +0000 (UTC)
From: Maciej Bryński (JIRA)
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-16996) Hive ACID delta files not seen

[ https://issues.apache.org/jira/browse/SPARK-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254821#comment-16254821 ]

Maciej Bryński commented on SPARK-16996:
----------------------------------------

After some research, I think Spark 2.2 in HDP is using an older Hive library than the other parts of the stack. I'll try with vanilla Spark.

> Hive ACID delta files not seen
> ------------------------------
>
> Key: SPARK-16996
> URL: https://issues.apache.org/jira/browse/SPARK-16996
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2, 1.6.3, 2.1.2, 2.2.0
> Environment: Hive 1.2.1, Spark 1.5.2
> Reporter: Benjamin BONNET
> Priority: Critical
>
> spark-sql seems not to see data stored as delta files in an ACID Hive table.
> Actually I encountered the same problem as described here: http://stackoverflow.com/questions/35955666/spark-sql-is-not-returning-records-for-hive-transactional-tables-on-hdp
> For example, create an ACID table with the Hive CLI and insert a row:
> {code}
> set hive.support.concurrency=true;
> set hive.enforce.bucketing=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.compactor.initiator.on=true;
> set hive.compactor.worker.threads=1;
> CREATE TABLE deltas(cle string, valeur string) CLUSTERED BY (cle) INTO 1 BUCKETS
> ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO deltas VALUES("a","a");
> {code}
> Then run a query with the spark-sql CLI:
> {code}
> SELECT * FROM deltas;
> {code}
> That query returns no result, and there are no errors in the logs.
> If you go to HDFS to inspect the table files, you find only deltas:
> {code}
> ~> hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 1 items
> drwxr-x---   - me hdfs          0 2016-08-10 14:03 /apps/hive/warehouse/deltas/delta_0020943_0020943
> {code}
> Then, if you run a compaction on that table (in the Hive CLI):
> {code}
> ALTER TABLE deltas COMPACT 'MAJOR';
> {code}
> As a result, the delta will be compacted into a base file:
> {code}
> ~> hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 1 items
> drwxrwxrwx   - me hdfs          0 2016-08-10 15:25 /apps/hive/warehouse/deltas/base_0020943
> {code}
> Go back to spark-sql, and the same query now returns a result:
> {code}
> SELECT * FROM deltas;
> a       a
> Time taken: 0.477 seconds, Fetched 1 row(s)
> {code}
> But the next time you insert into the Hive table:
> {code}
> INSERT INTO deltas VALUES("b","b");
> {code}
> spark-sql will immediately see the change:
> {code}
> SELECT * FROM deltas;
> a       a
> b       b
> Time taken: 0.122 seconds, Fetched 2 row(s)
> {code}
> Yet there was no further compaction; spark-sql "sees" the base AND the delta file:
> {code}
> ~> hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 2 items
> drwxrwxrwx   - valdata hdfs          0 2016-08-10 15:25 /apps/hive/warehouse/deltas/base_0020943
> drwxr-x---   - valdata hdfs          0 2016-08-10 15:31 /apps/hive/warehouse/deltas/delta_0020956_0020956
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
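[Editor's note] The directory listings the reporter inspects follow Hive's ACID layout: each transaction writes an uncompacted delta_<txn>_<txn> directory, and a major compaction rewrites the deltas into a single base_<txn> directory (which is what the affected Spark versions could read). The classification logic implied by those listings can be sketched as follows; `classify` is a hypothetical helper for illustration, not part of Hive or Spark:

```python
# Sketch: classify Hive ACID warehouse subdirectories by name, mirroring
# the delta_*/base_* naming seen in the `hdfs dfs -ls` output above.
import re

def classify(dirname: str) -> str:
    """Return 'base', 'delta', or 'other' for an ACID table subdirectory name."""
    if re.fullmatch(r"base_\d+", dirname):
        return "base"          # produced by a major compaction
    if re.fullmatch(r"delta_\d+_\d+", dirname):
        return "delta"         # uncompacted transactional writes
    return "other"

# Names taken from the listings in the report:
for d in ["base_0020943", "delta_0020956_0020956"]:
    print(d, "->", classify(d))
```

A reader reproducing the issue can use a check like this to confirm whether a table directory still contains uncompacted deltas that an older Hive client would miss.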