hive-issues mailing list archives

From "Mithun Radhakrishnan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-11475) Bad rename of directory during commit, when using HCat dynamic-partitioning.
Date Wed, 05 Aug 2015 22:37:05 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated HIVE-11475:
----------------------------------------
    Attachment: HIVE-11475.1.patch

The attached patch fixes the regex.

(~Perf-wise, it might be cheaper just to use {{Math.random() + 0.01}}. The existing {{Pattern}}
would handle the possible overflow. It would certainly beat the cost of more regex-matching,
but that would obscure the code.~)
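The aside above can be checked with a small sketch. It assumes only the digit pattern quoted in the issue description ({{\d\.?\d+}}); the class and helper names are hypothetical and not part of the patch. Since {{Math.random()}} returns a value in [0, 1), adding 0.01 keeps the result at or above 0.01, which {{String.valueOf(double)}} prints in plain decimal notation.

```java
import java.util.regex.Pattern;

class OffsetIdHashDemo {
    // Digit pattern quoted in the issue description, without the directory prefix.
    static final Pattern EXISTING = Pattern.compile("\\d\\.?\\d+");

    // Hypothetical helper: true when the existing pattern consumes the whole
    // string form of the value, leaving no suffix behind.
    static boolean fullyMatched(double value) {
        return EXISTING.matcher(String.valueOf(value)).matches();
    }

    public static void main(String[] args) {
        // Lower bound of Math.random() + 0.01: printed in decimal notation.
        System.out.println(fullyMatched(0.01));
        // A value just past 1.0 (the "overflow" case) is still consumed.
        System.out.println(fullyMatched(1.005));
        // A small value without the offset is printed with an exponent,
        // so the pattern cannot consume it fully.
        System.out.println(fullyMatched(2.5E-4));
    }
}
```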

> Bad rename of directory during commit, when using HCat dynamic-partitioning.
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-11475
>                 URL: https://issues.apache.org/jira/browse/HIVE-11475
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 1.2.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>            Priority: Critical
>         Attachments: HIVE-11475.1.patch
>
>
> Here's one that [~knoguchi] found and root-caused. This one's a doozy. 
> Under seemingly random conditions, the temporary output (under {{_SCRATCH1.234*}}) for HCat's dynamic partitioner isn't promoted correctly to the final table directory.
> The namenode logs indicated a botched directory-rename:
> {noformat}
> 2015-08-02 03:24:29,090 INFO FSNamesystem.audit: allowed=true ugi=myth (auth:TOKEN) via wrkflow@GRID.MYTH.NET (auth:TOKEN) ip=/10.192.100.117 cmd=rename src=/projects/hive/myth.db/myth_table_15m/_SCRATCH2.8772158158263395E-4/tc=1/utc_time=201508020145/part-r-00000 dst=/projects/hive/myth.db/myth_table_15mE-4/tc=1/utc_time=201508020145/part-r-00000 perm=myth:madcaps:rw-r-r- proto=rpc
> {noformat}
> Note that the table-directory name {{"myth_table_15m"}} has {{"E-4"}} appended to it in the destination path. This'll break anything that uses HDFS-based polling.
> [~knoguchi] points out the following code:
> {code:title=HCatOutputFormat.java}
>   if ((idHash = conf.get(HCatConstants.HCAT_OUTPUT_ID_HASH)) == null) {
>     idHash = String.valueOf(Math.random());
>   }
> {code}
> {code:title=FileOutputCommitterContainer.java}
>   String finalLocn = jobLocation.replaceAll(Path.SEPARATOR + SCRATCH_DIR_NAME + "\\d\\.?\\d+","");
> {code}
> The problem is that when {{Math.random()}} produces a number < 10 ^-3^, {{String.valueOf(double)}} uses exponential notation. The regex doesn't capture or handle this notation, so the exponent suffix (e.g. {{"E-4"}}) is left behind in the path.
> The fix belies the debugging-effort.
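The failure can be reproduced outside Hive with a minimal sketch. It assumes that {{SCRATCH_DIR_NAME}} is {{"_SCRATCH"}} and that {{Path.SEPARATOR}} is {{"/"}}, both inferred from the namenode log line rather than taken from the source. The widened pattern is a hypothetical illustration of one way to consume the exponent; it is not necessarily what HIVE-11475.1.patch does.

```java
class ScratchDirRegexDemo {
    // Assumed values, inferred from the audit log line in the description.
    static final String SEPARATOR = "/";
    static final String SCRATCH_DIR_NAME = "_SCRATCH";

    // Pattern as quoted from FileOutputCommitterContainer.java.
    static final String ORIGINAL = SEPARATOR + SCRATCH_DIR_NAME + "\\d\\.?\\d+";

    // Hypothetical widened pattern: also consumes an optional exponent
    // such as "E-4" after the digits.
    static final String WIDENED = ORIGINAL + "(?:E[+-]?\\d+)?";

    // Removes the scratch-directory path component from a job location.
    static String strip(String jobLocation, String regex) {
        return jobLocation.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Job location taken from the src path in the audit log.
        String jobLocation =
            "/projects/hive/myth.db/myth_table_15m/_SCRATCH2.8772158158263395E-4";

        // The original pattern stops before the exponent, so "E-4" is left
        // glued to the table directory name.
        System.out.println(strip(jobLocation, ORIGINAL));

        // The widened pattern removes the whole scratch component.
        System.out.println(strip(jobLocation, WIDENED));

        // Values of exactly 0.001 still print in decimal notation;
        // anything smaller switches to exponential notation.
        System.out.println(String.valueOf(0.001));
        System.out.println(String.valueOf(2.5E-4));
    }
}
```

With the original pattern the first call yields the same mangled directory name ({{myth_table_15mE-4}}) seen in the {{dst}} field of the audit log.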



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
