hive-dev mailing list archives

From "Sushanth Sowmyan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5011) Dynamic partitioning in HCatalog broken on external tables
Date Tue, 06 Aug 2013 20:25:47 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731226#comment-13731226 ]

Sushanth Sowmyan commented on HIVE-5011:
----------------------------------------

Basic bug synopsis:

Say a table t1 partitioned by key dayofweek:string is present in location "hdfs://blah/foo/t1/".

Ordinarily, if we try to write to it specifying that we're writing a partition dayofweek="sunday",
then the location it'll write to is "hdfs://blah/foo/t1/dayofweek=sunday/".

Now, this is known before the MR jobs start, and will be set as the location, and all is good.
If the table is specified as an external table, and the user wants to specify a custom location
format for the location, such that they want "hdfs://blah/foo/t1/sunday/", then HCat Storer
currently allows them to specify that, and that will be honoured too.
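
The static-partition behaviour described above can be sketched roughly as follows. This is an illustration only, not HCatalog's actual code: the class and method names (StaticPartitionLocation, hiveStylePath, resolve) are hypothetical, and the custom location is assumed to arrive as a plain string.

```java
import java.util.Map;

// Hypothetical helper, not part of HCatalog: shows how a static partition's
// location is fully determined before any MR job starts.
final class StaticPartitionLocation {

    // Hive-style default: <tableRoot>/<key>=<value>/ for each partition key, in order.
    static String hiveStylePath(String tableRoot, Map<String, String> partitionSpec) {
        StringBuilder sb = new StringBuilder(tableRoot);
        if (!tableRoot.endsWith("/")) {
            sb.append('/');
        }
        for (Map.Entry<String, String> e : partitionSpec.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('/');
        }
        return sb.toString();
    }

    // For an external table with a user-supplied location, honour that location;
    // otherwise fall back to the Hive-style path under the table root.
    static String resolve(String tableRoot, Map<String, String> partitionSpec,
                          boolean externalTable, String customLocation) {
        if (externalTable && customLocation != null) {
            return customLocation;
        }
        return hiveStylePath(tableRoot, partitionSpec);
    }
}
```

With the spec dayofweek=sunday and no custom location, resolve yields "hdfs://blah/foo/t1/dayofweek=sunday/"; with an external table and a custom location of "hdfs://blah/foo/t1/sunday/", that string is returned unchanged.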

That was the intent of HCATALOG-500, and the way it works for static partitioning.

With HCATALOG-500 in place, however, dynamic partitioning on external tables winds up behaving
differently.

All the partitions being written to wind up having their location set to "hdfs://blah/foo/t1/dayofweek=__DEFAULT_HIVE_PARTITION__"
if no override is provided, or to "hdfs://blah/foo/t1/whatever" if that location was provided
as an override.

As a result, the first drone to write a partition claims this location, and all other drones
are unable to open it for writing: they stall, get retried, and the job fails. It would be
possible, in theory, for the job not to fail if it had only one reducer and all of its data
fell into a single partition, but that's a highly constrained mode of writing which makes
the dynamic partitioning feature itself meaningless.
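
A small sketch of why this collides, under stated assumptions: the names below (DynamicPartitionCollision, upFrontLocation, writersByLocation) are hypothetical and do not come from HCatalog, and the placeholder string is copied from the location quoted above. Because the location is fixed before the job knows any partition value, every value maps to the same directory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not HCatalog code.
final class DynamicPartitionCollision {

    // Placeholder as quoted in this comment; assumed here for illustration.
    static final String PLACEHOLDER = "__DEFAULT_HIVE_PARTITION__";

    // The location is chosen up front, so it cannot depend on the actual
    // partition value: it is either the override or the placeholder path.
    static String upFrontLocation(String tableRoot, String key, String override) {
        if (override != null) {
            return override;
        }
        return tableRoot + key + "=" + PLACEHOLDER;
    }

    // Groups the partition values being written by the location they target.
    static Map<String, List<String>> writersByLocation(
            String tableRoot, String key, String override, List<String> partitionValues) {
        Map<String, List<String>> byLocation = new LinkedHashMap<>();
        for (String value : partitionValues) {
            String location = upFrontLocation(tableRoot, key, override);
            byLocation.computeIfAbsent(location, k -> new ArrayList<>()).add(value);
        }
        return byLocation;
    }
}
```

Calling writersByLocation for the values sunday, monday and tuesday returns a map with a single key, which is the situation in which only the first writer succeeds.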
                
> Dynamic partitioning in HCatalog broken on external tables
> ----------------------------------------------------------
>
>                 Key: HIVE-5011
>                 URL: https://issues.apache.org/jira/browse/HIVE-5011
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>
> Dynamic partitioning with HCatalog has been broken as a result of HCATALOG-500 trying
> to support user-set paths for external tables.
> The goal there was to be able to support other custom destinations apart from the normal
> "hive-style" partitions. However, it is not currently possible for users to set paths for
> dynamic ptn writes, since we don't support any way for users to specify "patterns" (like, say
> "${rootdir}/$v1.$v2/") into which writes happen, only "locations", and the values for dyn.
> partitions are not known ahead of time. Also, specifying a custom path messes with the way
> dynamic ptn. code tries to determine what was written to where from the output committer,
> which means that even if we supported patterned-writes instead of location-writes, we would still
> have to do some more deep diving into the output committer code to support it.
> Thus, my current proposal is that we honour writes to user-specified paths for external
> tables *ONLY* for static partition writes - i.e., if we can determine that the write is a
> dyn. ptn. write, we will ignore the user specification. (Note that this does not mean we ignore
> the table's external location - we honour that - we just don't honour any HCatStorer/etc provided
> additional location - we stick to what metadata tells us the root location is.)
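
The proposed rule can be summarised in a few lines. This is only a sketch of the policy as described in the issue, not the eventual patch; ProposedLocationPolicy and its methods are hypothetical names.

```java
// Hypothetical sketch of the proposal, not the actual HCatalog change.
final class ProposedLocationPolicy {

    // A user-specified location is honoured only for static partition writes
    // to external tables.
    static boolean honourCustomLocation(boolean externalTable,
                                        boolean dynamicPartitionWrite,
                                        String customLocation) {
        return externalTable && !dynamicPartitionWrite && customLocation != null;
    }

    // Base location for the write: the custom location when it is honoured,
    // otherwise the root location recorded in the table's metadata (which,
    // for an external table, is still its external location).
    static String effectiveBase(String tableLocationFromMetadata,
                                boolean externalTable,
                                boolean dynamicPartitionWrite,
                                String customLocation) {
        if (honourCustomLocation(externalTable, dynamicPartitionWrite, customLocation)) {
            return customLocation;
        }
        return tableLocationFromMetadata;
    }
}
```

Under this rule a dynamic partition write to an external table falls back to the table's own location, so each partition can again get its own Hive-style directory beneath it.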

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
