carbondata-issues mailing list archives

From "Ravindra Pesala (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CARBONDATA-1345) outdated tablemeta cache cause operation failed in multiple session
Date Mon, 18 Sep 2017 08:46:00 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala resolved CARBONDATA-1345.
-----------------------------------------
    Resolution: Fixed

> outdated tablemeta cache cause operation failed in multiple session
> -------------------------------------------------------------------
>
>                 Key: CARBONDATA-1345
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1345
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>            Priority: Minor
>             Fix For: 1.2.0
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> # Scenario
> ## Steps to reproduce
> Start two spark-beeline clients as two separate sessions, then run the following steps in the corresponding session:
> (SESSION1)
> 1. create table T_Carbn01(Active_status String,Item_type_cd INT,Qty_day_avg INT,Qty_total
INT,Sell_price BIGINT,Sell_pricep DOUBLE,Discount_price DOUBLE,Profit DECIMAL(3,2),Item_code
String,Item_name String,Outlet_name String,Update_time TIMESTAMP,Create_date String)STORED
BY 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='128');
> 2. LOAD DATA INPATH 'hdfs://hacluster/user/Ram/T_Hive1.csv' INTO table T_Carbn01 options
('DELIMITER'=',', 'QUOTECHAR'='\','BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORDS_ACTION'='REDIRECT',
'FILEHEADER'='Active_status,Item_type_cd,Qty_day_avg,Qty_total,Sell_price,Sell_pricep,Discount_price,Profit,Item_code,Item_name,Outlet_name,Update_time,Create_date');
> (SESSION2):
> 1. update t_carbn01 set(Active_status) = ('TRUE') where Item_type_cd = 41;
> (SESSION1):
> 1. Drop table t_carbn01;
> 2. create table T_Carbn01(Active_status String,Item_type_cd INT,Qty_day_avg INT,Qty_total
INT,Sell_price BIGINT,Sell_pricep DOUBLE,Discount_price DOUBLE,Profit DECIMAL(3,2),Item_code
String,Item_name String,Outlet_name String,Update_time TIMESTAMP,Create_date String)STORED
BY 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='128');
> 3. LOAD DATA INPATH 'hdfs://hacluster/user/Ram/T_Hive1.csv' INTO table T_Carbn01 options
('DELIMITER'=',', 'QUOTECHAR'='\','BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORDS_ACTION'='REDIRECT',
'FILEHEADER'='Active_status,Item_type_cd,Qty_day_avg,Qty_total,Sell_price,Sell_pricep,Discount_price,Profit,Item_code,Item_name,Outlet_name,Update_time,Create_date');
> (SESSION2):
> 1. update t_carbn01 set(Active_status) = ('TRUE') where Item_type_cd = 41;
> ## Outputs
> The error message is as follows:
> ```
> Error: java.lang.RuntimeException: Update operation failed. Job aborted due to stage
failure: Task 0 in stage 14.0 failed 4 times, most recent failure: Lost task 0.3 in stage
14.0 (TID 29, master, executor 2): java.io.IOException: java.io.IOException: Dictionary file
does not exist: hdfs://user/hive/warehouse/carbon.store/default/t_carbn01/Metadata/ddfb3bc8-2fea-41fe-a4ff-18588df41aec.dictmeta
>     at org.apache.carbondata.core.cache.dictionary.ForwardDictionaryCache.getAll(ForwardDictionaryCache.java:146)
>     at org.apache.spark.sql.DictionaryLoader.loadDictionary(CarbonDictionaryDecoder.scala:686)
>     at org.apache.spark.sql.DictionaryLoader.getDictionary(CarbonDictionaryDecoder.scala:703)
>     at org.apache.spark.sql.ForwardDictionaryWrapper.getDictionaryValueForKeyInBytes(CarbonDictionaryDecoder.scala:654)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:378)
>     at org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:132)
>     at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
>     at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1041)
>     at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1032)
>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:972)
>     at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1032)
>     at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:715)
>     at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> ```
> ## Input data
> Sample input data:
> ```
> TRUE,2,423,3046340,200000000003454300, 121.5,4.99,2.44,SE3423ee,asfdsffdfg,EtryTRWT,2012-01-12
03:14:05.123456729,2012-01-20
> TRUE,3,453,3003445,200000000000003450, 121.5,4.99,2.44,SE3423ee,asfdsffdfg,ERTEerWT,2012-01-13
03:24:05.123456739,2012-01-20
> TRUE,4,4350,3044364,200000000000000000, 121.5,4.99,2.44,SE3423ee,asfdsffdfg,ERTtryWT,2012-01-14
23:03:05.123456749,2012-01-20
> TRUE,114,4520,30000430,200000000004300000, 121.5,4.99,2.44,RE3423ee,asfdsffdfg,4RTETRWT,2012-01-01
23:02:05.123456819,2012-01-20
> FALSE,123,454,30000040,200000000000000000, 121.5,4.99,2.44,RE3423ee,asfrewerfg,6RTETRWT,2012-01-02
23:04:05.123456829,2012-01-20
> TRUE,11,4530,3000040,200000000000000000, 121.5,4.99,2.44,SE3423ee,asfdsffder,TRTETRWT,2012-01-03
05:04:05.123456839,2012-01-20
> TRUE,14,4590,3000400,200000000000000000, 121.5,4.99,2.44,ASD423ee,asfertfdfg,HRTETRWT,2012-01-04
05:06:05.123456849,2012-01-20
> FALSE,41,4250,00000,200000000000000000, 121.5,4.99,2.44,SAD423ee,asrtsffdfg,HRTETRWT,2012-01-05
05:07:05.123456859,2012-01-20
> TRUE,13,4510,30400,200000000000000000, 121.5,4.99,2.44,DE3423ee,asfrtffdfg,YHTETRWT,2012-01-06
06:08:05.123456869,2012-01-20
> ```
> # Analysis
> The error message says that the dictmeta file does not exist.
> This file was generated during the first load operation in SESSION1, and the tablemeta was cached in SESSION2 when SESSION2 ran its first update. After the drop, create, and load operations in SESSION1, the old dictionary files were deleted and new dictionary files were generated. When SESSION2 runs the update again, it still uses the outdated tablemeta from its cache, which refers to the deleted dictmeta file, and this causes the error.
> To solve this problem, we should refresh the tableMeta cache when the corresponding table schema has been updated.
> # Solution
> Refresh the tablemeta cache when the table schema has changed.
> Since HiveSessionState.lookupRelation is slow (especially in concurrent query scenarios), do not call this method when the table schema has not changed.
> # Notes
> I've tested the scenario in my environment and it is OK.
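
The idea in the Solution section above can be illustrated with a minimal, self-contained sketch. This is not CarbonData's actual code: `Store`, `SessionCache`, `TableMeta`, and the `schemaModifiedTime` field are hypothetical names, and using a schema modification timestamp as the cheap "has the schema changed?" signal is an assumption made for illustration. The sketch only shows the pattern: a per-session cache compares a cheap change marker before deciding whether to pay for a full (slow) metadata lookup.

```java
import java.util.HashMap;
import java.util.Map;

final class TableMetaCacheSketch {

    /** Hypothetical stand-in for cached table metadata. */
    static final class TableMeta {
        final String dictionaryId;     // which dictionary files this metadata points at
        final long schemaModifiedTime; // schema timestamp observed when the entry was built

        TableMeta(String dictionaryId, long schemaModifiedTime) {
            this.dictionaryId = dictionaryId;
            this.schemaModifiedTime = schemaModifiedTime;
        }
    }

    /** Hypothetical stand-in for the shared store that every session reads from. */
    static final class Store {
        private final Map<String, TableMeta> tables = new HashMap<>();
        private long clock = 0;
        int fullLookups = 0; // counts the expensive lookups

        void createTable(String name, String dictionaryId) {
            tables.put(name, new TableMeta(dictionaryId, ++clock));
        }

        void dropTable(String name) {
            tables.remove(name);
        }

        /** Cheap check (assumed): the current schema timestamp, or -1 if the table is gone. */
        long schemaModifiedTime(String name) {
            TableMeta meta = tables.get(name);
            return meta == null ? -1L : meta.schemaModifiedTime;
        }

        /** Expensive lookup, playing the role of the slow relation lookup. */
        TableMeta lookup(String name) {
            fullLookups++;
            return tables.get(name);
        }
    }

    /** Per-session cache that refreshes an entry only when the schema has changed. */
    static final class SessionCache {
        private final Store store;
        private final Map<String, TableMeta> cache = new HashMap<>();

        SessionCache(Store store) {
            this.store = store;
        }

        TableMeta get(String name) {
            TableMeta cached = cache.get(name);
            long current = store.schemaModifiedTime(name);
            if (cached == null || cached.schemaModifiedTime != current) {
                // Entry is missing or stale: do the full lookup and replace it.
                cached = store.lookup(name);
                if (cached == null) {
                    cache.remove(name);
                } else {
                    cache.put(name, cached);
                }
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        SessionCache session2 = new SessionCache(store);

        store.createTable("t_carbn01", "dict-A");                   // SESSION1: create + load
        System.out.println(session2.get("t_carbn01").dictionaryId); // prints dict-A
        session2.get("t_carbn01");                                  // schema unchanged: no full lookup

        store.dropTable("t_carbn01");                               // SESSION1: drop
        store.createTable("t_carbn01", "dict-B");                   // SESSION1: create + load again
        System.out.println(session2.get("t_carbn01").dictionaryId); // prints dict-B
        System.out.println(store.fullLookups);                      // prints 2
    }
}
```

Without the timestamp comparison in `SessionCache.get`, the second session would keep returning the entry that points at `dict-A`, which mirrors the "Dictionary file does not exist" failure in the report. With it, the expensive lookup runs only twice across three reads.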



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
