hive-issues mailing list archives

From "Sahil Takiar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified
Date Thu, 23 Mar 2017 23:48:41 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939454#comment-15939454 ]

Sahil Takiar commented on HIVE-15396:
-------------------------------------

[~pxiong] can't we take the location, create a {{FileSystem}} object, and then run {{fs.exists()}}?
If the location exists, don't set up stats; if it doesn't exist, set up full stats.

There is no guarantee that other processes won't write data into the location, but then
again there is no guarantee that other processes won't write into {{hive.metastore.warehouse.dir}} either.
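
For illustration, the check proposed above might look roughly like the sketch below. It uses {{java.nio.file}} as a stand-in so that it runs without Hadoop on the classpath; in Hive the check would go through Hadoop's {{FileSystem.exists(Path)}} instead. The class and method names ({{InitialStatsSketch}}, {{initialStats}}) are hypothetical and are not taken from the attached patch; the parameter names are the ones shown in the {{describe formatted}} output for the default-location table.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: java.nio.file stands in for Hadoop's FileSystem API here.
class InitialStatsSketch {

    // Hypothetical helper: decides which basic stats a newly created table starts with.
    static Map<String, String> initialStats(Path location) {
        Map<String, String> params = new LinkedHashMap<>();
        if (Files.exists(location)) {
            // The location already exists and may hold data, so do not claim any stats.
            return params;
        }
        // The location does not exist yet, so the table starts empty: set up full stats.
        params.put("COLUMN_STATS_ACCURATE", "{\"BASIC_STATS\":\"true\"}");
        params.put("numFiles", "0");
        params.put("numRows", "0");
        params.put("rawDataSize", "0");
        params.put("totalSize", "0");
        return params;
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("existing-table");
        Path missing = existing.resolve("not-created-yet");
        System.out.println("existing location: " + initialStats(existing));
        System.out.println("missing location:  " + initialStats(missing));
    }
}
```

As noted above, this check is only best effort: another process could still write into the location after the table has been created.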

> Basic Stats are not collected when for managed tables with LOCATION specified
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-15396
>                 URL: https://issues.apache.org/jira/browse/HIVE-15396
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15396.1.patch
>
>
> Basic stats are not collected when a managed table is created with a specified {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:10000> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:10000> describe formatted hdfs_1;
> +-------------------------------+------------------------------------------------------------+-----------------------------+
> |           col_name            |                         data_type                          |           comment           |
> +-------------------------------+------------------------------------------------------------+-----------------------------+
> | # col_name                    | data_type                                                  | comment                     |
> |                               | NULL                                                       | NULL                        |
> | col                           | int                                                        |                             |
> |                               | NULL                                                       | NULL                        |
> | # Detailed Table Information  | NULL                                                       | NULL                        |
> | Database:                     | default                                                    | NULL                        |
> | Owner:                        | anonymous                                                  | NULL                        |
> | CreateTime:                   | Wed Mar 22 18:09:19 PDT 2017                               | NULL                        |
> | LastAccessTime:               | UNKNOWN                                                    | NULL                        |
> | Retention:                    | 0                                                          | NULL                        |
> | Location:                     | file:/warehouse/hdfs_1                                     | NULL                        |
> | Table Type:                   | MANAGED_TABLE                                              | NULL                        |
> | Table Parameters:             | NULL                                                       | NULL                        |
> |                               | COLUMN_STATS_ACCURATE                                      | {\"BASIC_STATS\":\"true\"}  |
> |                               | numFiles                                                   | 0                           |
> |                               | numRows                                                    | 0                           |
> |                               | rawDataSize                                                | 0                           |
> |                               | totalSize                                                  | 0                           |
> |                               | transient_lastDdlTime                                      | 1490231359                  |
> |                               | NULL                                                       | NULL                        |
> | # Storage Information         | NULL                                                       | NULL                        |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe         | NULL                        |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat                   | NULL                        |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                        |
> | Compressed:                   | No                                                         | NULL                        |
> | Num Buckets:                  | -1                                                         | NULL                        |
> | Bucket Columns:               | []                                                         | NULL                        |
> | Sort Columns:                 | []                                                         | NULL                        |
> | Storage Desc Params:          | NULL                                                       | NULL                        |
> |                               | serialization.format                                       | 1                           |
> +-------------------------------+------------------------------------------------------------+-----------------------------+
> 0: jdbc:hive2://localhost:10000> create table s3_1 (col int) location 's3a://[bucket]/test-tables/s3-1';
> 0: jdbc:hive2://localhost:10000> describe formatted s3_1;
> +-------------------------------+------------------------------------------------------------+-----------------------+
> |           col_name            |                         data_type                          |        comment        |
> +-------------------------------+------------------------------------------------------------+-----------------------+
> | # col_name                    | data_type                                                  | comment               |
> |                               | NULL                                                       | NULL                  |
> | col                           | int                                                        |                       |
> |                               | NULL                                                       | NULL                  |
> | # Detailed Table Information  | NULL                                                       | NULL                  |
> | Database:                     | default                                                    | NULL                  |
> | Owner:                        | anonymous                                                  | NULL                  |
> | CreateTime:                   | Wed Mar 22 18:10:01 PDT 2017                               | NULL                  |
> | LastAccessTime:               | UNKNOWN                                                    | NULL                  |
> | Retention:                    | 0                                                          | NULL                  |
> | Location:                     | s3a://[bucket]/test-tables/s3-1                            | NULL                  |
> | Table Type:                   | MANAGED_TABLE                                              | NULL                  |
> | Table Parameters:             | NULL                                                       | NULL                  |
> |                               | transient_lastDdlTime                                      | 1490231401            |
> |                               | NULL                                                       | NULL                  |
> | # Storage Information         | NULL                                                       | NULL                  |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe         | NULL                  |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat                   | NULL                  |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                  |
> | Compressed:                   | No                                                         | NULL                  |
> | Num Buckets:                  | -1                                                         | NULL                  |
> | Bucket Columns:               | []                                                         | NULL                  |
> | Sort Columns:                 | []                                                         | NULL                  |
> | Storage Desc Params:          | NULL                                                       | NULL                  |
> |                               | serialization.format                                       | 1                     |
> +-------------------------------+------------------------------------------------------------+-----------------------+
> {code}
> There are no stats defined in the describe for the s3 table. Furthermore, when inserting into the s3 table the {{numRows}} stats are not collected for the s3 table.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
