hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-907) Sqoop should use more intelligent splits
Date Mon, 14 Sep 2009 20:10:58 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755169#action_12755169 ]

Todd Lipcon commented on MAPREDUCE-907:

Some small comments:

# Why does --all-tables ignore --split-by (in the ImportOptions usage printout)? It seems like
you should be able to at least determine the PK of each table, though maybe that is future work.
# In AutoProgressMapper, why not just mark keepGoing as a volatile boolean and avoid the more
complicated synchronized-based barriers?
# It seems like the sleepInterval is a bit extraneous - why not just sleep for max(reportInterval/3,
1000) or something? In fact, given that it does no actual work between sleeps, you might as
well just always sleep for reportInterval - context.progress is essentially a boolean twiddle.
# Given that you expect mappers to extend AutoProgressMapper, you may want to mark run() as
final. Alternatively, you could make AutoProgressMapper wrap another Mapper implementation
and take in the class name by a conf key - that way you avoid fragility with mappers that
want to use their own run() method.
#  job.getConfiguration().set("mapred.jar", ormJarFile) doesn't seem that great - there's
already a Job.setJarByClass -- may as well add Job.setJar in mapreduce.Job
# Same goes for a couple other mapred confs - mapred.output.value.class, mapred.map.tasks
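
To illustrate the second and third points, here is a minimal sketch of a progress thread that uses a volatile flag and sleeps for the full report interval. It is not taken from the patch: ProgressReporter is a hypothetical stand-in for the task context's progress() call, and the class names, the shutdown() method and the interval values are assumptions made only for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ProgressThreadSketch {

    /** Hypothetical stand-in for context.progress(); not a Hadoop type. */
    interface ProgressReporter {
        void progress();
    }

    /** Background thread that reports progress until asked to stop. */
    static class ProgressThread extends Thread {
        // volatile: a write from the mapper thread becomes visible to this
        // thread without any synchronized blocks.
        private volatile boolean keepGoing = true;
        private final ProgressReporter reporter;
        private final long reportIntervalMs;

        ProgressThread(ProgressReporter reporter, long reportIntervalMs) {
            this.reporter = reporter;
            this.reportIntervalMs = reportIntervalMs;
            setDaemon(true); // never keep the JVM alive just for progress reports
        }

        @Override
        public void run() {
            while (keepGoing) {
                reporter.progress();
                try {
                    // No work happens between reports, so sleep for the
                    // whole report interval.
                    Thread.sleep(reportIntervalMs);
                } catch (InterruptedException e) {
                    // shutdown() interrupts the sleep; the loop condition
                    // then decides whether to exit.
                }
            }
        }

        /** Called by the owning thread once the real work has finished. */
        void shutdown() {
            keepGoing = false;
            interrupt(); // wake the thread if it is sleeping
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        ProgressThread t = new ProgressThread(calls::incrementAndGet, 50);
        t.start();
        Thread.sleep(200); // stands in for the mapper's long-running work
        t.shutdown();
        t.join(1000);
        System.out.println("progress calls: " + calls.get());
    }
}
```

The interrupt in shutdown() is what lets the thread sleep for the whole interval without delaying task completion by up to one interval.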

Aside from these minor things, +1.

> Sqoop should use more intelligent splits
> ----------------------------------------
>                 Key: MAPREDUCE-907
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-907
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-907.patch
> Sqoop should use the new split generation / InputFormat in MAPREDUCE-885

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
