hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-629) concat files needed for map-reduce jobs also
Date Wed, 15 Jul 2009 06:21:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731297#action_12731297 ]

Zheng Shao commented on HIVE-629:

Three more questions:

1. How do we determine the number of reducers of the merge job? Is that based on "hive.exec.reducers.bytes.per.reducer"?
2. How do we create the additional map-reduce job? Do we copy the cluster key or distribution
key from the last job? If so, what happens if those keys are not available after the reducer?
3. For the default value, do we want to enable both (map-only and map-reduce) but set the
threshold to 64MB, or something smaller like 16MB? That way most users won't see any change,
but people who produce extremely small files (the people who want this feature) will see
their files concatenated.
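If the merge job's reducer count were derived from "hive.exec.reducers.bytes.per.reducer" (question 1), the estimate might look like the minimal Python sketch below. It assumes the merge job reads every file written by the previous job; the byte value and the reducer cap are illustrative assumptions, not settings taken from the patch.

```python
# Example value for hive.exec.reducers.bytes.per.reducer (an assumption,
# not necessarily Hive's shipped default).
BYTES_PER_REDUCER = 1_000_000_000
# Hypothetical upper bound on reducers for the merge job.
MAX_REDUCERS = 999


def estimate_merge_reducers(output_file_sizes,
                            bytes_per_reducer=BYTES_PER_REDUCER,
                            max_reducers=MAX_REDUCERS):
    """Estimate the reducer count for a merge job from the previous job's output."""
    total_bytes = sum(output_file_sizes)
    # Integer ceiling division: one reducer per bytes_per_reducer of input.
    reducers = (total_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Always run at least one reducer, and never exceed the cap.
    return min(max(1, reducers), max_reducers)


# 1.1 GB of small files would be merged by 2 reducers (prints 2).
print(estimate_merge_reducers([500_000_000, 600_000_000]))
```

Because the estimate depends only on total output size, many tiny files collapse into a handful of reducer outputs, which is the point of the merge step.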

> concat files needed for map-reduce jobs also
> --------------------------------------------
>                 Key: HIVE-629
>                 URL: https://issues.apache.org/jira/browse/HIVE-629
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.4.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.629.1.patch
> Currently, Hive concatenates files only if the job under consideration is a map-only job.
> I got some requests from some users who want this behavior for map-reduce jobs
> also; it may not be a good idea to turn it on by default.
> But we should provide an option so that the concatenation can happen even
> for map-reduce jobs.
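A rough sketch of the requested option, combined with the threshold idea from question 3: separate switches for map-only and map-reduce jobs, with a merge triggered only when the average output file is below a size threshold. All field names, the 16MB default, and the "at least two files" rule are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class MergeSettings:
    # Hypothetical switches; names and defaults are illustrative only.
    merge_map_only_output: bool = True      # existing behavior for map-only jobs
    merge_map_reduce_output: bool = True    # the proposed option for map-reduce jobs
    avg_size_threshold: int = 16 * MB       # merge only if files average below this


def should_merge(is_map_only, file_sizes, settings):
    """Decide whether to schedule a concatenation job for a job's output files."""
    enabled = (settings.merge_map_only_output if is_map_only
               else settings.merge_map_reduce_output)
    # Nothing to concatenate if the feature is off or there is at most one file.
    if not enabled or len(file_sizes) < 2:
        return False
    average = sum(file_sizes) / len(file_sizes)
    return average < settings.avg_size_threshold


defaults = MergeSettings()
print(should_merge(False, [1 * MB] * 200, defaults))    # many tiny files: True
print(should_merge(False, [256 * MB] * 10, defaults))   # large files: False
```

With a small threshold, jobs that already produce reasonably sized files are left alone, so enabling both switches by default would mostly affect users who generate very small files.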

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
