atlas-dev mailing list archives

From "Suma Shivaprasad (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ATLAS-626) Hive temporary table metadata is captured in atlas.
Date Mon, 18 Apr 2016 21:31:25 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246592#comment-15246592 ]

Suma Shivaprasad commented on ATLAS-626:
----------------------------------------

This patch can be applied only after ATLAS-583, because it is based on that patch.

> Hive temporary table metadata is captured in atlas.
> ---------------------------------------------------
>
>                 Key: ATLAS-626
>                 URL: https://issues.apache.org/jira/browse/ATLAS-626
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 0.7-incubating
>            Reporter: Ayub Khan
>            Assignee: Suma Shivaprasad
>             Fix For: 0.7-incubating
>
>         Attachments: ATLAS-626.patch
>
>
> As part of HIVE-7090, Hive supports session-level temporary tables and manages their life cycle.
> These temporary tables are used to run additional queries and are cleaned up at the end of the session.
> Inserting data into a table creates such a temporary table, whose metadata is synced to Atlas. Once the session expires, the table is cleaned up in Hive but still exists in Atlas.
> 1. What is the use case for storing the metadata of this temporary table? Are temporary tables important?
> 2. Impact: the number of metadata objects may grow if insert operations are frequent in production.
> Lineage snapshot link: https://monosnap.com/file/DtnPZA85Ug0Q27arTOqY1FnhbDQdXU
> {noformat}
> 0: jdbc:hive2://localhost:10000/default> show tables;
> +-----------+--+
> | tab_name  |
> +-----------+--+
> | abc12312  |
> | h3        |
> | h5        |
> +-----------+--+
> 3 rows selected (0.231 seconds)
> 0: jdbc:hive2://localhost:10000/default> insert into table default.h5 values ( "efg1", "abc1", 1231, 123121);
> INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
> INFO  : number of splits:1
> INFO  : Submitting tokens for job: job_local737234864_0006
> INFO  : The url to track the job: http://localhost:8080/
> INFO  : Job running in-process (local Hadoop)
> INFO  : 2016-04-04 15:21:20,381 Stage-1 map = 100%,  reduce = 0%
> INFO  : Ended Job = job_local737234864_0006
> INFO  : Stage-4 is selected by condition resolver.
> INFO  : Stage-3 is filtered out by condition resolver.
> INFO  : Stage-5 is filtered out by condition resolver.
> INFO  : Moving data to: hdfs://localhost:9000/user/hive/warehouse/h5/.hive-staging_hive_2016-04-04_15-21-17_057_941878903735098303-9/-ext-10000 from hdfs://localhost:9000/user/hive/warehouse/h5/.hive-staging_hive_2016-04-04_15-21-17_057_941878903735098303-9/-ext-10002
> INFO  : Loading data to table default.h5 from hdfs://localhost:9000/user/hive/warehouse/h5/.hive-staging_hive_2016-04-04_15-21-17_057_941878903735098303-9/-ext-10000
> INFO  : Table default.h5 stats: [numFiles=5, numRows=5, totalSize=106, rawDataSize=101]
> No rows affected (3.72 seconds)
> 0: jdbc:hive2://localhost:10000/default> show tables;
> +------------------------+--+
> |        tab_name        |
> +------------------------+--+
> | abc12312               |
> | h3                     |
> | h5                     |
> | values__tmp__table__1  |
> +------------------------+--+
> 4 rows selected (0.196 seconds)
> 0: jdbc:hive2://localhost:10000/default> describe extended values__tmp__table__1;
> +-----------------------------+------------+----------+--+
> |          col_name           | data_type  | comment  |
> +-----------------------------+------------+----------+--+
> | tmp_values_col1             | string     |          |
> | tmp_values_col2             | string     |          |
> | tmp_values_col3             | string     |          |
> | tmp_values_col4             | string     |          |
> |                             | NULL       | NULL     |
> | Detailed Table Information  | Table(tableName:values__tmp__table__1, dbName:default, owner:apathan, createTime:1459763477, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:tmp_values_col1, type:string, comment:), FieldSchema(name:tmp_values_col2, type:string, comment:), FieldSchema(name:tmp_values_col3, type:string, comment:), FieldSchema(name:tmp_values_col4, type:string, comment:)], location:hdfs://localhost:9000/tmp/hive/apathan/d461c5c3-931a-4aa3-9124-997b75f10c11/_tmp_space.db/Values__Tmp__Table__1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null), temporary:true)  |          |
> +-----------------------------+------------+----------+--+
> 6 rows selected (0.174 seconds)
> 0: jdbc:hive2://localhost:10000/default>
> {noformat}
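
The output quoted above shows two markers that identify the table as temporary: the name values__tmp__table__1 and the trailing temporary:true flag in the "Detailed Table Information" string. As an illustration only, here is a minimal Python sketch of how a metadata sync step might use those markers to skip such tables. The function names (is_temporary_table, tables_to_sync) are hypothetical, the name pattern is inferred from this single example, and this is not the approach taken in ATLAS-626.patch.

```python
import re

# Hypothetical helpers for illustration; not part of Atlas or Hive.
# Name pattern inferred from the single example "values__tmp__table__1".
_TMP_NAME = re.compile(r"^values__tmp__table__\d+$", re.IGNORECASE)
_TABLE_NAME = re.compile(r"tableName:([^,\s)]+)")


def is_temporary_table(detailed_info: str) -> bool:
    """Return True if a 'Detailed Table Information' string looks temporary."""
    # The describe output above ends with "temporary:true)".
    if re.search(r"\btemporary:true\b", detailed_info):
        return True
    # Fall back to the table name when the flag is absent.
    match = _TABLE_NAME.search(detailed_info)
    return bool(match and _TMP_NAME.match(match.group(1)))


def tables_to_sync(descriptions: dict) -> list:
    """Keep only the table names whose description is not temporary."""
    return [name for name, info in descriptions.items()
            if not is_temporary_table(info)]


if __name__ == "__main__":
    # Shortened description strings; the one for h5 is made up for the demo.
    descriptions = {
        "h5": "Table(tableName:h5, dbName:default, tableType:MANAGED_TABLE)",
        "values__tmp__table__1": (
            "Table(tableName:values__tmp__table__1, dbName:default, "
            "tableType:MANAGED_TABLE, temporary:true)"
        ),
    }
    print(tables_to_sync(descriptions))
```

Checking the flag first and the name second keeps the filter working for temporary tables that a user creates under an arbitrary name, as long as the flag is present in the description.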



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
