hive-user mailing list archives

From Elliot West <tea...@gmail.com>
Subject Re: tez + union stmt
Date Sun, 25 Dec 2016 10:45:20 GMT
I believe that Tez will generate subfolders for unioned data. As far as I
know, this is the expected behaviour and there is no alternative.
Presumably this is to prevent multiple tasks from attempting to write the
same file?

We've experienced issues when switching from MapReduce to Tez; downstream jobs
weren't expecting subfolders and had trouble reading previously accessible
datasets.

Apparently there are workarounds within Hive:
http://stackoverflow.com/questions/39511585/hive-create-table-not-insert-data
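
As an illustration, the two session settings below are the ones most commonly
cited for letting Hive read data files that sit in subdirectories of a table
location. Treat this as a sketch under assumptions, not a verified fix: it is
not confirmed that these are what the linked answer recommends, whether they
are needed depends on the Hive version and execution engine of the reading
job, and the database and table names are inferred from the HDFS path quoted
further down in this thread.

```sql
-- Assumption: set in the session (or hive-site.xml) of the job that READS the table.
-- Tell Hive that tables/partitions may contain subdirectories.
SET hive.mapred.supports.subdirectories=true;
-- Ask the input format to descend into subdirectories when listing input files.
SET mapred.input.dir.recursive=true;

-- Table name inferred from .../staging.db/tmp_pv_v4c__loc_4 in the quoted message.
SELECT COUNT(*) FROM staging.tmp_pv_v4c__loc_4;
```

If the count is still zero with both settings in place, the cause is likely
something other than the nested directory layout.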

Merry Christmas,

Elliot.

On Sun, 25 Dec 2016 at 03:11, Rajesh Balamohan <rbalamohan@apache.org>
wrote:

> Are there any exceptions in hive.log? Is the tmp_pv_v4* table part of the
> select query?
>
> Assuming you are creating the table in staging.db, it would have created
> the table location as staging.db/foo (as you have not specified the
> location).
>
> Adding user@hive.apache.org as this is hive related.
>
>
> ~Rajesh.B
>
> On Sun, Dec 25, 2016 at 12:08 AM, Stephen Sprague <spragues@gmail.com>
> wrote:
>
> all,
>
> i'm running tez with the sql pattern:
>
>     * create table foo as select * from (select... UNION select... UNION select...)
>
> in the logs the final step is this:
>
>     * Moving data to directory hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4
>       from hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/.hive-staging_hive_2016-12-24_10-05-40_048_4896412314807355668-899/-ext-10002
>
>
> when querying the table i got zero rows returned which made me curious. so
> i queried the hdfs location and see this:
>
>   $ hdfs dfs -ls hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4
>
>   Found 3 items
>   drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:05 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/1
>   drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/2
>   drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/3
>
> and yes the data files are under these three dirs.
>
> so i ask... i'm not used to seeing sub-directories under the tablename
> unless the table is partitioned. is this legit? might there be some config
> settings i need to set to see this data via sql?
>
> thanks,
> Stephen.