hive-user mailing list archives

From Stephen Sprague <sprag...@gmail.com>
Subject Re: tez + union stmt
Date Sun, 25 Dec 2016 18:58:05 GMT
Thanks Elliot.  Nice Christmas present.  The settings in that
Stack Overflow link look like exactly what I need to set so that MR jobs
can pick up the data that Tez created.
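
For anyone finding this thread later, here is a rough sketch of the kind of
session settings involved. These are assumptions on my part, not something
confirmed in this thread: the property names are standard Hive/Hadoop ones,
the table name is inferred from the HDFS path quoted below, and which of the
properties is actually needed will depend on the Hive and Hadoop versions.

```sql
-- Assumption: these are the session-level properties the linked answer refers to.
-- Let Hive on MR accept table directories that contain subdirectories.
set hive.mapred.supports.subdirectories=true;
-- Have the MR input format descend into subdirectories (older property name).
set mapred.input.dir.recursive=true;
-- Newer Hadoop 2 name for the same behaviour.
set mapreduce.input.fileinputformat.input.dir.recursive=true;

-- Sanity check: this should now count the rows under the 1/, 2/ and 3/ subfolders.
-- Table name inferred from .../staging.db/tmp_pv_v4c__loc_4.
select count(*) from staging.tmp_pv_v4c__loc_4;
```

If the count is still zero with these set, the subfolders are probably not
the only problem.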

Cheers,
Stephen.

On Sun, Dec 25, 2016 at 2:45 AM, Elliot West <teabot@gmail.com> wrote:

> I believe that tez will generate subfolders for unioned data. As far as I
> know, this is the expected behaviour and there is no alternative.
> Presumably this is to prevent multiple tasks from attempting to write the
> same file?
>
> We've experienced issues when switching from mr to tez; downstream jobs
> weren't expecting subfolders and had trouble reading previously accessible
> datasets.
>
> Apparently there are workarounds within Hive:
> http://stackoverflow.com/questions/39511585/hive-create-table-not-insert-data
>
> Merry Christmas,
>
> Elliot.
>
> On Sun, 25 Dec 2016 at 03:11, Rajesh Balamohan <rbalamohan@apache.org>
> wrote:
>
>> Are there any exceptions in hive.log? Is the tmp_pv_v4* table part of the
>> select query?
>>
>> Assuming you are creating the table in staging.db, it would have created
>> the table location as staging.db/foo (as you have not specified the
>> location).
>>
>> Adding user@hive.apache.org as this is hive related.
>>
>>
>> ~Rajesh.B
>>
>> On Sun, Dec 25, 2016 at 12:08 AM, Stephen Sprague <spragues@gmail.com>
>> wrote:
>>
>> all,
>>
>> i'm running tez with the sql pattern:
>>
>>     * create table foo as select * from (select... UNION select... UNION select...)
>>
>> in the logs the final step is this:
>>
>>     * Moving data to directory hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4
>>       from hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/.hive-staging_hive_2016-12-24_10-05-40_048_4896412314807355668-899/-ext-10002
>>
>>
>> when querying the table i got zero rows returned, which made me curious.
>> so i listed the hdfs location and saw this:
>>
>>   $ hdfs dfs -ls hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4
>>
>>   Found 3 items
>>   drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:05 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/1
>>   drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/2
>>   drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/3
>>
>> and yes the data files are under these three dirs.
>>
>> so i ask... i'm not used to seeing sub-directories under the tablename
>> unless the table is partitioned. is this legit? might there be some config
>> settings i need to set to see this data via sql?
>>
>> thanks,
>> Stephen.
