hive-issues mailing list archives

From "Sahil Takiar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified
Date Mon, 24 Apr 2017 20:31:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981840#comment-15981840 ]

Sahil Takiar commented on HIVE-15396:
-------------------------------------

[~pxiong] I wanted to check whether we can still get this patch in. Let me know what you think of the
most recent patch. To summarize:

* The patch adds basic stats collection for tables with a {{LOCATION}} specified, but only
if the specified location is empty and the table is not an external table
* This should be useful when running on blobstores such as S3, where users commonly specify
an explicit {{LOCATION}} clause
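
The condition in the first bullet can be sketched as a small predicate. This is an illustration only, not code from the patch: the class name {{BasicStatsGuard}}, the method {{shouldCollectBasicStats}}, and both parameters are hypothetical names, and how the patch actually decides that a location is empty is not shown here.

```java
/**
 * Illustrative sketch only. The class, method, and parameter names are
 * hypothetical and are not taken from the HIVE-15396 patch.
 */
public final class BasicStatsGuard {
    private BasicStatsGuard() {}

    /**
     * Decides whether basic stats may be recorded at table creation time
     * for a table created with an explicit LOCATION.
     *
     * @param isExternal      true if the table is an external table
     * @param locationIsEmpty true if the specified location holds no data
     */
    public static boolean shouldCollectBasicStats(boolean isExternal, boolean locationIsEmpty) {
        // Both conditions from the summary must hold: the table is managed
        // (not external) and the specified location is empty.
        return !isExternal && locationIsEmpty;
    }

    public static void main(String[] args) {
        // Managed table, empty location: stats can be collected.
        System.out.println(shouldCollectBasicStats(false, true));  // true
        // External table: stats are not collected.
        System.out.println(shouldCollectBasicStats(true, true));   // false
        // Managed table, location already has data: not collected.
        System.out.println(shouldCollectBasicStats(false, false)); // false
    }
}
```

Read this way, a freshly created managed table on an empty S3 prefix gets the same zeroed basic stats as a table in the default warehouse location, while a location with pre-existing data is left without stats.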

Thanks for spending the time to look at this!

> Basic Stats are not collected when for managed tables with LOCATION specified
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-15396
>                 URL: https://issues.apache.org/jira/browse/HIVE-15396
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch, HIVE-15396.6.patch, HIVE-15396.7.patch
>
>
> Basic stats are not collected when a managed table is created with a specified {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:10000> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:10000> describe formatted hdfs_1;
> +-------------------------------+----------------------------------------------------+-----------------------------+
> |           col_name            |                     data_type                      |           comment           |
> +-------------------------------+----------------------------------------------------+-----------------------------+
> | # col_name                    | data_type                                          | comment                     |
> |                               | NULL                                               | NULL                        |
> | col                           | int                                                |                             |
> |                               | NULL                                               | NULL                        |
> | # Detailed Table Information  | NULL                                               | NULL                        |
> | Database:                     | default                                            | NULL                        |
> | Owner:                        | anonymous                                          | NULL                        |
> | CreateTime:                   | Wed Mar 22 18:09:19 PDT 2017                       | NULL                        |
> | LastAccessTime:               | UNKNOWN                                            | NULL                        |
> | Retention:                    | 0                                                  | NULL                        |
> | Location:                     | file:/warehouse/hdfs_1                             | NULL                        |
> | Table Type:                   | MANAGED_TABLE                                      | NULL                        |
> | Table Parameters:             | NULL                                               | NULL                        |
> |                               | COLUMN_STATS_ACCURATE                              | {\"BASIC_STATS\":\"true\"}  |
> |                               | numFiles                                           | 0                           |
> |                               | numRows                                            | 0                           |
> |                               | rawDataSize                                        | 0                           |
> |                               | totalSize                                          | 0                           |
> |                               | transient_lastDdlTime                              | 1490231359                  |
> |                               | NULL                                               | NULL                        |
> | # Storage Information         | NULL                                               | NULL                        |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                        |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                        |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                        |
> | Compressed:                   | No                                                 | NULL                        |
> | Num Buckets:                  | -1                                                 | NULL                        |
> | Bucket Columns:               | []                                                 | NULL                        |
> | Sort Columns:                 | []                                                 | NULL                        |
> | Storage Desc Params:          | NULL                                               | NULL                        |
> |                               | serialization.format                               | 1                           |
> +-------------------------------+----------------------------------------------------+-----------------------------+
> 0: jdbc:hive2://localhost:10000> create table s3_1 (col int) location 's3a://[bucket]/test-tables/s3-1';
> 0: jdbc:hive2://localhost:10000> describe formatted s3_1;
> +-------------------------------+----------------------------------------------------+-----------------------+
> |           col_name            |                     data_type                      |        comment        |
> +-------------------------------+----------------------------------------------------+-----------------------+
> | # col_name                    | data_type                                          | comment               |
> |                               | NULL                                               | NULL                  |
> | col                           | int                                                |                       |
> |                               | NULL                                               | NULL                  |
> | # Detailed Table Information  | NULL                                               | NULL                  |
> | Database:                     | default                                            | NULL                  |
> | Owner:                        | anonymous                                          | NULL                  |
> | CreateTime:                   | Wed Mar 22 18:10:01 PDT 2017                       | NULL                  |
> | LastAccessTime:               | UNKNOWN                                            | NULL                  |
> | Retention:                    | 0                                                  | NULL                  |
> | Location:                     | s3a://[bucket]/test-tables/s3-1                    | NULL                  |
> | Table Type:                   | MANAGED_TABLE                                      | NULL                  |
> | Table Parameters:             | NULL                                               | NULL                  |
> |                               | transient_lastDdlTime                              | 1490231401            |
> |                               | NULL                                               | NULL                  |
> | # Storage Information         | NULL                                               | NULL                  |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                  |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                  |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                  |
> | Compressed:                   | No                                                 | NULL                  |
> | Num Buckets:                  | -1                                                 | NULL                  |
> | Bucket Columns:               | []                                                 | NULL                  |
> | Sort Columns:                 | []                                                 | NULL                  |
> | Storage Desc Params:          | NULL                                               | NULL                  |
> |                               | serialization.format                               | 1                     |
> +-------------------------------+----------------------------------------------------+-----------------------+
> {code}
> The {{describe formatted}} output for the S3 table contains no stats. Furthermore, when inserting into the S3 table, the {{numRows}} stat is not collected.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
