hive-user mailing list archives

From Elliot West <tea...@gmail.com>
Subject Re: tez + union stmt
Date Wed, 11 Jan 2017 07:05:00 GMT
Thanks Rohini,

This is good to know. Could you perhaps raise an issue in the Hive JIRA?

Thanks,

Elliot.

On Tue, 10 Jan 2017 at 22:55, Rohini Palaniswamy <rohini.aditya@gmail.com>
wrote:

> The implementation in hive does look wrong. The concept of VertexGroups
> was added in Tez specifically for the case of union to support writing to
> same directory from different vertices. Sub-directories should not be
> required as a workaround.
>
> Regards,
> Rohini
>
>
> On Sun, Dec 25, 2016 at 10:58 AM, Stephen Sprague <spragues@gmail.com>
> wrote:
>
> Thanks Elliot.  Nice christmas present.   Those settings in that
> stackoverflow link look to me to be exactly what i need to set for MR jobs
> to pick that data up that Tez created.
>
> Cheers,
> Stephen.
>
> On Sun, Dec 25, 2016 at 2:45 AM, Elliot West <teabot@gmail.com> wrote:
>
> I believe that tez will generate subfolders for unioned data. As far as I
> know, this is the expected behaviour and there is no alternative.
> Presumably this is to prevent multiple tasks from attempting to write the
> same file?
>
> We've experienced issues when switching from mr to tez; downstream jobs
> weren't expecting subfolders and had trouble reading previously accessible
> datasets.
>
> Apparently there are workarounds within Hive:
>
> http://stackoverflow.com/questions/39511585/hive-create-table-not-insert-data
>
> Merry Christmas,
>
> Elliot.
>
> On Sun, 25 Dec 2016 at 03:11, Rajesh Balamohan <rbalamohan@apache.org>
> wrote:
>
> Are there any exceptions in hive.log? Is the tmp_pv_v4* table part of the
> select query?
>
> Assuming you are creating the table in staging.db, it would have created
> the table location as staging.db/foo (as you have not specified the
> location).
>
> Adding user@hive.apache.org as this is hive related.
>
>
> ~Rajesh.B
>
> On Sun, Dec 25, 2016 at 12:08 AM, Stephen Sprague <spragues@gmail.com>
> wrote:
>
> all,
>
> i'm running tez with the sql pattern:
>
>     * create table foo as select * from (select... UNION select... UNION select...)
>
> in the logs the final step is this:
>
>     * Moving data to directory hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4
>       from hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/.hive-staging_hive_2016-12-24_10-05-40_048_4896412314807355668-899/-ext-10002
>
>
> when querying the table i got zero rows returned which made me curious. so
> i queried the hdfs location and see this:
>
>   $ hdfs dfs -ls hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4
>
>   Found 3 items
>   drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:05 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/1
>   drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/2
>   drwxrwxrwx   - dwr supergroup          0 2016-12-24 10:06 hdfs://dwrnn1.sv2.trulia.com:8020/user/hive/warehouse/staging.db/tmp_pv_v4c__loc_4/3
>
> and yes the data files are under these three dirs.
>
> so i ask... i'm not used to seeing sub-directories under the tablename
> unless the table is partitioned. is this legit? might there be some config
> settings i need to set to see this data via sql?
>
> thanks,
> Stephen.
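
For readers landing on this thread: the workaround mentioned above is usually applied as session-level settings so that readers descend into the numbered sub-directories under the table location. The thread never quotes the property names, so the two below are an assumption about what the linked Stack Overflow answer suggests, and the `staging.tmp_pv_v4c__loc_4` table name is inferred from the HDFS path in the original report. Treat this as a sketch to check against your own Hive and Hadoop versions, not a confirmed fix.

```sql
-- Assumed settings; the thread only refers to them via the Stack Overflow link.
-- Allow Hive to treat sub-directories under a table location as table data.
SET hive.mapred.supports.subdirectories=true;
-- Ask the MapReduce input format to list input directories recursively.
SET mapred.input.dir.recursive=true;

-- Re-check the table that returned zero rows (name inferred from the HDFS path).
SELECT COUNT(*) FROM staging.tmp_pv_v4c__loc_4;
```

If the count is still zero with both settings in place, the cause probably lies elsewhere, for example in the exceptions in hive.log that Rajesh asked about.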
