hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1463) hive output file names are unnecessarily large
Date Fri, 16 Jul 2010 21:08:50 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889328#action_12889328 ]

Joydeep Sen Sarma commented on HIVE-1463:

thanks for the review.

1) I checked this out. hadoop from 0.17 onwards always uses <prefix>_<jtid>_[mr]_<taskid>_<attemptid>.
in 0.17 the prefix was 'task'. in 0.18 and later the prefix was changed to 'attempt'. jtid = 'local' for
local mode. otherwise there's no difference between local and regular jobs.

   i think 0.15 was different (that is where hive was initially started), which is why there were comments
to the effect that jobs have _map_ in local mode.

   one thing i can do is add tests under shim to make sure of this. if i am unable to add a
test, i will at least confirm the naming under 0.17.

2) good point!  dropping the leading prefix is not necessary (since repeated strings are factored
out by hdfs now - it uses String.intern()). i can take that part out.

will upload modified diff.
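
For illustration, the naming scheme described in 1) can be sketched as a small parser. This is a minimal Python sketch under stated assumptions, not code from the patch: the names `parse_attempt_name` and `short_name` are made up here, and everything between the prefix and the [mr] marker is treated as the <jtid> field so that the example name from the issue parses. `short_name` keeps only the trailing <taskid>_<attemptid>, which is one possible shortening and not necessarily what the final patch does.

```python
import re

# Pattern from the comment: <prefix>_<jtid>_[mr]_<taskid>_<attemptid>
# Assumption: <jtid> is whatever sits between the prefix and the [mr] marker.
ATTEMPT_RE = re.compile(
    r"^(?P<prefix>task|attempt)_"
    r"(?P<jtid>.+)_"
    r"(?P<kind>[mr])_"
    r"(?P<taskid>\d+)_"
    r"(?P<attemptid>\d+)$"
)


def parse_attempt_name(name):
    """Split a task attempt name into its named fields (hypothetical helper)."""
    match = ATTEMPT_RE.match(name)
    if match is None:
        raise ValueError("unrecognized attempt name: %r" % name)
    return match.groupdict()


def short_name(name):
    """Return only <taskid>_<attemptid>, dropping prefix, jtid and [mr]."""
    parts = parse_attempt_name(name)
    return parts["taskid"] + "_" + parts["attemptid"]


if __name__ == "__main__":
    # Example name taken from the issue description.
    print(short_name("attempt_201006221843_431854_r_000000_0"))  # 000000_0
```

A name that does not follow the pattern raises ValueError rather than being silently shortened, so a change in the naming between hadoop versions would show up as a failure instead of a wrong file name.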

> hive output file names are unnecessarily large
> ----------------------------------------------
>                 Key: HIVE-1463
>                 URL: https://issues.apache.org/jira/browse/HIVE-1463
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Joydeep Sen Sarma
>         Attachments: hive-1463.1.patch
> Hive's output files are named like this:
> attempt_201006221843_431854_r_000000_0
> out of all of this goop - only one character '0' would have sufficed. we should fix this.
> This would help environments with namenode memory constraints.

