spark-issues mailing list archives

From "Benjamin BONNET (JIRA)" <>
Subject [jira] [Created] (SPARK-16996) Hive ACID delta files not seen
Date Wed, 10 Aug 2016 13:37:20 GMT
Benjamin BONNET created SPARK-16996:

             Summary: Hive ACID delta files not seen
                 Key: SPARK-16996
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.2
         Environment: Hive 1.2.1, Spark 1.5.2
            Reporter: Benjamin BONNET
            Priority: Critical

spark-sql does not seem to see data stored as delta files in a Hive ACID table.

Actually I encountered the same problem as described here :

For example, create an ACID table with HiveCLI and insert a row :

set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;
CREATE TABLE deltas(cle string,valeur string) CLUSTERED BY (cle) INTO 1 BUCKETS
    TBLPROPERTIES ('transactional'='true');

INSERT INTO deltas VALUES("a","a");
Then run a query with the spark-sql CLI :
SELECT * FROM deltas;
That query returns no rows, and there are no errors in the logs.
If you inspect the table files in HDFS, you find only a delta directory :
~>hdfs dfs -ls /apps/hive/warehouse/deltas
Found 1 items
drwxr-x---   - me hdfs          0 2016-08-10 14:03 /apps/hive/warehouse/deltas/delta_0020943_0020943
Then run a compaction on that table (in HiveCLI).
As a result, the delta is compacted into a base file :
~>hdfs dfs -ls /apps/hive/warehouse/deltas
Found 1 items
drwxrwxrwx   - me hdfs          0 2016-08-10 15:25 /apps/hive/warehouse/deltas/base_0020943
Go back to spark-sql, and the same query now returns a result :
SELECT * FROM deltas;
a       a
Time taken: 0.477 seconds, Fetched 1 row(s)
But the next time you insert into the Hive table :
INSERT INTO deltas VALUES("b","b");
spark-sql immediately sees the change :
SELECT * FROM deltas;
a       a
b       b
Time taken: 0.122 seconds, Fetched 2 row(s)
No further compaction has run, yet spark-sql "sees" both the base AND the delta directory :
~> hdfs dfs -ls /apps/hive/warehouse/deltas
Found 2 items
drwxrwxrwx   - valdata hdfs          0 2016-08-10 15:25 /apps/hive/warehouse/deltas/base_0020943
drwxr-x---   - valdata hdfs          0 2016-08-10 15:31 /apps/hive/warehouse/deltas/delta_0020956_0020956
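
The listings above suggest a pattern: spark-sql returned rows only once a base_ directory existed next to the deltas. To check a table directory quickly, the sketch below summarizes an `hdfs dfs -ls` listing by counting base and delta directories. The function name classify_acid_dirs is hypothetical (it is not part of Hive or Spark), and the name patterns are assumptions taken from the directory names shown in this report.

```shell
#!/bin/sh
# Hypothetical helper: summarize an ACID table directory listing.
# Input : output of `hdfs dfs -ls <table dir>` on stdin.
# Output: one line of the form "base=<count> delta=<count>".
# Assumption: directories are named base_<txn> and delta_<min>_<max>,
# as in the listings of this report.
classify_acid_dirs() {
    awk '
        {
            # The path is the last field of each listing line; keep its final component.
            n = split($NF, parts, "/")
            name = parts[n]
            if (name ~ /^base_[0-9]+$/) base++
            else if (name ~ /^delta_[0-9]+_[0-9]+$/) delta++
        }
        END { printf "base=%d delta=%d\n", base, delta }
    '
}

# Usage (requires an HDFS client on the PATH):
#   hdfs dfs -ls /apps/hive/warehouse/deltas | classify_acid_dirs
```

With the listings in this report, a result of base=0 with a non-zero delta count corresponds to the state in which the spark-sql query returned nothing.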
