hive-dev mailing list archives

From "Sahil Takiar (JIRA)"
Subject [jira] [Created] (HIVE-15396) Basic Stats are not collected when running INSERT INTO commands on s3a
Date Fri, 09 Dec 2016 00:02:59 GMT
Sahil Takiar created HIVE-15396:

             Summary: Basic Stats are not collected when running INSERT INTO commands on s3a
                 Key: HIVE-15396
             Project: Hive
          Issue Type: Bug
          Components: Hive
            Reporter: Sahil Takiar
            Assignee: Sahil Takiar

{{numRows}} is not collected when running {{INSERT ... INTO ...}} commands against tables
backed by S3 (and possibly other blobstores as well).

The {{COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}}} entry is missing from the
{{describe extended}} output.

Repro steps:

hive> drop table s3_table;
Time taken: 1.87 seconds
hive> create table s3_table (col int) location 's3a://[bucket-name]/stats-test/';
Time taken: 3.069 seconds
hive> insert into s3_table values (1), (2), (3);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions.
Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = stakiar_20161208160105_fb3df340-d5fb-4ad6-8776-4f3cae02216d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-12-08 16:01:12,741 Stage-1 map = 0%,  reduce = 0%
2016-12-08 16:01:16,759 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local688636529_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Loading data to table default.s3_table
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
Time taken: 23.0 seconds
hive> select * from s3_table;
Time taken: 0.096 seconds, Fetched: 3 row(s)
hive> describe extended s3_table;
col                 	int

Detailed Table Information	Table(tableName:s3_table, dbName:default, owner:stakiar, createTime:1481241657,
lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int,
comment:null)], location:s3a://cloudera-dev-hive-on-s3/stats-test, inputFormat:org.apache.hadoop.mapred.TextInputFormat,, compressed:false,
numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[],
skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[],
parameters:{transient_lastDdlTime=1481241687, totalSize=6, numFiles=1}, viewOriginalText:null,
viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.037 seconds, Fetched: 3 row(s)
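
The check above can also be scripted. The following is a minimal Python sketch, not part of Hive: it assumes the {{describe extended}} output has been captured as text, and the helper names ({{table_parameters_text}}, {{missing_basic_stats}}) are hypothetical. It looks at the table-level {{parameters}} map (the last one before {{viewOriginalText}}) and reports which of the two keys named in this report are absent.

```python
# Keys this report expects to see in the table-level parameters map.
EXPECTED_KEYS = ("numRows", "COLUMN_STATS_ACCURATE")

# Table-level parameters as printed in the repro output above.
SAMPLE = (
    "parameters:{serialization.format=1}), bucketCols:[], sortCols:[], "
    "parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], "
    "skewedColValues:[], skewedColValueLocationMaps:{}), "
    "storedAsSubDirectories:false), partitionKeys:[], "
    "parameters:{transient_lastDdlTime=1481241687, totalSize=6, numFiles=1}, "
    "viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)"
)


def table_parameters_text(describe_output: str) -> str:
    """Return the raw text of the table-level parameters map.

    Assumption: the table-level map is the last `parameters:{` that
    appears before `viewOriginalText`, as in the output above.
    """
    end = describe_output.find("viewOriginalText")
    if end == -1:
        return ""
    start = describe_output.rfind("parameters:{", 0, end)
    if start == -1:
        return ""
    return describe_output[start + len("parameters:{"):end]


def missing_basic_stats(describe_output: str) -> list:
    """List the expected keys that do not appear in the table parameters."""
    params = table_parameters_text(describe_output)
    # Substring match on `key=` avoids splitting values that contain commas.
    return [key for key in EXPECTED_KEYS if f"{key}=" not in params]


if __name__ == "__main__":
    print(missing_basic_stats(SAMPLE))  # ['numRows', 'COLUMN_STATS_ACCURATE']
```

Matching on {{key=}} rather than splitting the map on commas is deliberate: a value such as {{COLUMN_STATS_ACCURATE}} is itself a JSON object and may contain commas and braces.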

This message was sent by Atlassian JIRA