hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5759) IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
Date Sun, 24 May 2009 08:04:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712524#action_12712524 ]

dhruba borthakur commented on HADOOP-5759:
------------------------------------------

This patch keeps the original aim of combining blocks from different hosts in the same rack
into a single split. What the patch attempts to fix is deciding where such a combined
split should reside.


> Since only hosts that actually contain the valid blocks are returned in getMoreSplits with this patch,

I agree with Jothi to a certain extent. The number of splits remains the same as before, but the
possibility of scheduling them on the rack where they reside is slightly reduced because we
look only at those hosts where this block belongs. It is possible to enhance this patch to
create a new data structure called rackToNodes. It can be populated by iterating through
all the blocks at the very beginning.
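
As a rough illustration of the rackToNodes idea, here is a minimal sketch in plain Java. It is
not code from the patch: BlockInfo, buildRackToNodes and locationsForRack are hypothetical
names made up for this example, and the sketch assumes each block knows the hosts holding it
and the rack of each of those hosts. One pass over all blocks fills the map, and a combined
split for a rack can then list every known host on that rack instead of the rack name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RackToNodesSketch {

    /** Hypothetical stand-in for a block's locations: hosts[i] lives on racks[i]. */
    static final class BlockInfo {
        final String[] hosts;
        final String[] racks;

        BlockInfo(String[] hosts, String[] racks) {
            if (hosts.length != racks.length) {
                throw new IllegalArgumentException("hosts and racks must have the same length");
            }
            this.hosts = hosts;
            this.racks = racks;
        }
    }

    /** One pass over all blocks, done up front: rack name -> hosts seen on that rack. */
    static Map<String, Set<String>> buildRackToNodes(List<BlockInfo> blocks) {
        Map<String, Set<String>> rackToNodes = new HashMap<>();
        for (BlockInfo block : blocks) {
            for (int i = 0; i < block.hosts.length; i++) {
                rackToNodes
                    .computeIfAbsent(block.racks[i], rack -> new TreeSet<>())
                    .add(block.hosts[i]);
            }
        }
        return rackToNodes;
    }

    /** Locations for a split combined at rack level: host names, never the rack name. */
    static String[] locationsForRack(Map<String, Set<String>> rackToNodes, String rack) {
        Set<String> nodes = rackToNodes.get(rack);
        return nodes == null ? new String[0] : nodes.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<BlockInfo> blocks = Arrays.asList(
            new BlockInfo(new String[] {"host1", "host2"}, new String[] {"/rack1", "/rack1"}),
            new BlockInfo(new String[] {"host2", "host3"}, new String[] {"/rack1", "/rack2"}));
        Map<String, Set<String>> rackToNodes = buildRackToNodes(blocks);
        System.out.println(Arrays.toString(locationsForRack(rackToNodes, "/rack1")));
    }
}
```

With a map like this, a split built from blocks on one rack could advertise all hosts known
on that rack, which may recover some of the scheduling flexibility discussed above. Whether
the real patch should do it this way is a separate question.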

> IllegalArgumentException when CombineFileInputFormat is used as job InputFormat
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5759
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5759
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-5759.txt
>
>
> As per my understanding, CombineFileInputFormat is creating splits with rackname as split location.
> When I use CombineFileInputFormat as the InputFormat for a job, job initialization fails with the following exception:
> 2009-04-28 14:10:40,162 ERROR mapred.EagerTaskInitializationListener (EagerTaskInitializationListener.java:run(83)) - Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>   at org.apache.hadoop.net.NodeBase.set(NodeBase.java:76)
>   at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>   at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2342)
>   at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2336)
>   at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:344)
>   at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:441)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>   at java.lang.Thread.run(Thread.java:619)
> When I changed CombineFileInputFormat to pass just rackname (without '/'), JT wrongly resolves the node as /default-rack/<rack-name>.
> Solution is to pass hostnames holding the block (on the rack), instead of rackname.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

