hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-413) Remove limitation on the number of tasks
Date Tue, 23 Aug 2011 06:33:31 GMT

    [ https://issues.apache.org/jira/browse/HAMA-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089294#comment-13089294 ]

Edward J. Yoon commented on HAMA-413:
-------------------------------------

Below are the results on 16 physical nodes.

{code}
JobClient LOG:
11/08/23 15:27:57 DEBUG bsp.BSPJobClient: BSPJobClient.submitJobDir: hdfs://hnode15:9000/tmp/hadoop-root/bsp/system/submit_22he6c
11/08/23 15:27:58 INFO bsp.BSPJobClient: Running job: job_201108231527_0001
11/08/23 15:28:01 INFO bsp.BSPJobClient: Current supersteps number: 0
11/08/23 15:28:22 INFO bsp.BSPJobClient: The total number of supersteps: 0
java.io.FileNotFoundException: File does not exist: /tmp/pi-example/output
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:457)
        at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:676)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.hama.examples.PiEstimator.printOutput(PiEstimator.java:109)
        at org.apache.hama.examples.PiEstimator.main(PiEstimator.java:151)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hama.examples.ExampleDriver.main(ExampleDriver.java:37)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hama.util.RunJar.main(RunJar.java:145)

----
LOG of node16 groomserver:

2011-08-23 15:28:02,743 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000025_0
11/08/23 15:28:02 WARN bsp.GroomServer: Error running child
2011-08-23 15:28:02,744 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000025_0
java.lang.NullPointerException
2011-08-23 15:28:02,744 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000025_0
        at org.apache.hama.bsp.BSPPeer.send(BSPPeer.java:167)
2011-08-23 15:28:02,744 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000025_0
        at org.apache.hama.examples.PiEstimator$MyEstimator.bsp(PiEstimator.java:64)
2011-08-23 15:28:02,744 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000025_0
        at org.apache.hama.bsp.BSPTask.run(BSPTask.java:60)
2011-08-23 15:28:02,744 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000025_0
        at org.apache.hama.bsp.GroomServer$Child.main(GroomServer.java:875)
2011-08-23 15:28:02,744 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000025_0
11/08/23 15:28:02 INFO ipc.Server: Stopping server on 61001
2011-08-23 15:28:02,744 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000025_0
11/08/23 15:28:02 INFO ipc.Server: IPC Server handler 0 on 61001: exiting
2011-08-23 15:28:02,744 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000025_0
11/08/23 15:28:02 INFO ipc.Server: Stopping IPC Server listener on 61001
2011-08-23 15:28:02,744 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000025_0
11/08/23 15:28:02 INFO ipc.Server: Stopping IPC Server Responder
2011-08-23 15:28:02,764 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000039_0
11/08/23 15:28:02 INFO ipc.Server: IPC Server Responder: starting
2011-08-23 15:28:02,764 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000039_0
11/08/23 15:28:02 INFO ipc.Server: IPC Server listener on 61002: starting
...

2011-08-23 15:28:02,951 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000039_0
11/08/23 15:28:02 INFO ipc.Server: Stopping server on 61002
2011-08-23 15:28:02,951 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000039_0
11/08/23 15:28:02 INFO ipc.Server: IPC Server handler 0 on 61002: exiting
2011-08-23 15:28:02,951 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000039_0
11/08/23 15:28:02 INFO ipc.Server: Stopping IPC Server listener on 61002
2011-08-23 15:28:02,951 INFO org.apache.hama.bsp.TaskRunner: attempt_201108231527_0001_000039_0
11/08/23 15:28:02 INFO ipc.Server: Stopping IPC Server Responder
2011-08-23 15:28:03,306 INFO org.apache.hama.bsp.GroomServer: Lost connection to BSP Master [hnode1/10.33.1.101:40000].  Retrying...
java.util.ConcurrentModificationException
        at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:373)
        at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:392)
        at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:391)
        at org.apache.hama.bsp.GroomServer.offerService(GroomServer.java:394)
        at org.apache.hama.bsp.GroomServer.run(GroomServer.java:634)
        at java.lang.Thread.run(Thread.java:662)
{code}
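
The ConcurrentModificationException at the end of the groom log is the exception that LinkedHashMap's fail-fast iterators throw when the map is structurally modified while it is being iterated. The following is a minimal sketch, independent of Hama's code, that reproduces the exception and then avoids it by iterating over a snapshot of the keys. The class name, the map contents, and the assumption that a task map is being modified during the loop are all hypothetical; the trace alone does not show which code modifies the map in GroomServer.offerService.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CmeSketch {
    // Hypothetical stand-in for a map of running tasks; not Hama's actual field.
    static Map<String, String> newTasks() {
        Map<String, String> tasks = new LinkedHashMap<>();
        tasks.put("attempt_0", "RUNNING");
        tasks.put("attempt_1", "RUNNING");
        tasks.put("attempt_2", "RUNNING");
        return tasks;
    }

    // Removes an entry while iterating the live map. The iterator detects the
    // structural change on its next step and fails fast.
    static boolean unsafeIterationThrows() {
        Map<String, String> tasks = newTasks();
        try {
            for (Map.Entry<String, String> e : tasks.entrySet()) {
                if (e.getKey().equals("attempt_0")) {
                    tasks.remove("attempt_0"); // modification mid-iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    // Iterates over a copy of the keys, so removing from the map is safe.
    // With several threads, taking the snapshot would itself need
    // synchronization (or a concurrent map); this sketch is single-threaded.
    static int safeIterationRemaining() {
        Map<String, String> tasks = newTasks();
        List<String> snapshot = new ArrayList<>(tasks.keySet());
        for (String id : snapshot) {
            if (id.equals("attempt_0")) {
                tasks.remove(id);
            }
        }
        return tasks.size();
    }

    public static void main(String[] args) {
        System.out.println("unsafe iteration throws: " + unsafeIterationThrows());
        System.out.println("entries left after safe removal: " + safeIterationRemaining());
    }
}
```

If the map in the groom is touched by more than one thread, a snapshot alone may not be enough, and guarding both the iteration and the updates with the same lock would be the more conservative choice.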



> Remove limitation on the number of tasks
> ----------------------------------------
>
>                 Key: HAMA-413
>                 URL: https://issues.apache.org/jira/browse/HAMA-413
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.4.0
>
>         Attachments: HAMA-413_v01.patch
>
>
> With the HAMA-410 patch, the BSPPeer object will be constructed in the child process. Now we can simply remove the limitation on the number of tasks.
> Here's the TODO list:
> 1. The number of tasks per groom should be configurable, e.g., via 'bsp.local.tasks.maximum'.
> 2. The 'totalTaskCapacity' should be calculated in BSPMaster.getClusterStatus().
> 3. When scheduling tasks, consider how to allocate them.
> 4. Each BSPPeer should know all peers created in the Hama cluster for its job. The list can be built from the actions of the GroomServer.
> 5. In the examples, 'cluster.getGroomServers()' can be changed to 'cluster.getMaxTasks()'.
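
Items 1 and 2 of the TODO list can be illustrated with a small sketch: each groom reads its task limit from configuration, and the cluster-wide capacity is the sum of those limits. This is only an illustration under stated assumptions. It uses java.util.Properties as a stand-in for Hama's configuration, and the class name, method names, and the default of one task per groom are hypothetical; only the key 'bsp.local.tasks.maximum' comes from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class TaskCapacitySketch {
    // Key name taken from the TODO list.
    static final String MAX_TASKS_KEY = "bsp.local.tasks.maximum";
    // Assumption for this sketch: one task per groom when the key is unset.
    static final int ASSUMED_DEFAULT = 1;

    // Reads the per-groom task limit, falling back to the assumed default.
    static int maxTasksFor(Properties groomConf) {
        String value = groomConf.getProperty(MAX_TASKS_KEY, String.valueOf(ASSUMED_DEFAULT));
        return Integer.parseInt(value);
    }

    // Total capacity is the sum of the limits of all known grooms.
    static int totalTaskCapacity(Map<String, Properties> groomConfs) {
        int total = 0;
        for (Properties conf : groomConfs.values()) {
            total += maxTasksFor(conf);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Properties> grooms = new LinkedHashMap<>();
        Properties groomA = new Properties();
        groomA.setProperty(MAX_TASKS_KEY, "3");
        grooms.put("groomA", groomA);
        grooms.put("groomB", new Properties()); // uses the assumed default
        System.out.println("total task capacity: " + totalTaskCapacity(grooms));
    }
}
```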

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
