hive-dev mailing list archives

From "Prasanth J (JIRA)" <>
Subject [jira] [Commented] (HIVE-6958) update union_remove_*, other tests for hadoop-2
Date Wed, 23 Apr 2014 01:11:16 GMT


Prasanth J commented on HIVE-6958:

This failure is related to the behaviour of UNION. INSERT queries with UNION ALL create
sub-directories under the table/partition directory. For example:
insert overwrite table outputTbl1
SELECT *
FROM (
  SELECT key, count(1) as values from inputTbl1 group by key
  UNION ALL
  SELECT key, count(1) as values from inputTbl1 group by key
) a;

For the above query, the warehouse/outputTbl1 directory will have 2 sub-directories, one for
each SELECT query, such as warehouse/outputTbl1/15/ and warehouse/outputTbl1/16/. Here 15 and
16 are operator identifiers.

This special case (directories directly under the table directory) happens only for union
inserts. In all other cases, unpartitioned tables have files directly underneath the table
directory. But the metastore utils for updating the fast stats are not aware of this directory
structure (they expect files underneath the table directory). For an unpartitioned table,
Warehouse.getFileStatusesForUnpartitionedTable() recurses only one level under the table
directory.
For a union insert, recursing only 1 level returns the folder sizes and not the actual file
sizes. Folder sizes differ between OSes. It looks like the original diff was generated on
Mac OS X and the new diff was generated on CentOS. Both diffs are *wrong* because they report
folder sizes as opposed to file sizes.

1) One way to fix this is to change the recurse level to a value greater than 1.
2) Another way would be to fix UNION to create files instead of directories. To resolve filename
conflicts, it can append the operator id to the filename.
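
To make the difference concrete, here is a rough sketch of a one-level listing versus the deeper
recursion from option 1. It is a Python illustration of the behaviour described above, not Hive's
Java code: total_size, make_union_layout, the file name 000000_0 and the byte counts are all
made up for the example, and the assumption is that a directory left unexpanded at the depth
limit is counted with its own on-disk size.

```python
import os
import tempfile


def total_size(table_dir, max_depth):
    """Sum st_size of entries found by descending at most max_depth levels.

    Assumption for this sketch: a directory sitting at the depth limit is
    counted with its own (OS-dependent) size instead of being expanded.
    """
    total = 0
    for entry in os.scandir(table_dir):
        if entry.is_dir() and max_depth > 1:
            # Still allowed to descend: add up what is inside the directory.
            total += total_size(entry.path, max_depth - 1)
        else:
            # Either a plain file, or a directory at the depth limit.
            total += entry.stat().st_size
    return total


def make_union_layout(root):
    """Create root/15/000000_0 and root/16/000000_0 (hypothetical names)."""
    sizes = {"15": 10, "16": 20}  # operator id -> bytes written
    for op_id, n_bytes in sizes.items():
        sub_dir = os.path.join(root, op_id)
        os.mkdir(sub_dir)
        with open(os.path.join(sub_dir, "000000_0"), "wb") as f:
            f.write(b"x" * n_bytes)
    return sum(sizes.values())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        expected = make_union_layout(table_dir)
        # One level: sizes of the directories 15/ and 16/, which vary by OS.
        print("one level :", total_size(table_dir, max_depth=1))
        # Two levels: sizes of the files themselves.
        print("two levels:", total_size(table_dir, max_depth=2), "expected", expected)
```

With two levels the total matches the bytes actually written, while the one-level number depends
on how the filesystem reports directory sizes, which would explain why the Mac OS X and CentOS
outputs disagree.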

[~ashutoshc]/[~jdere] do you guys have any thoughts about this?

> update union_remove_*, other tests for hadoop-2
> -----------------------------------------------
>                 Key: HIVE-6958
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Tests
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-6958.1.patch
> Update q.out files to match totalSize for Linux platform.

This message was sent by Atlassian JIRA
