hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-750) Determine the path of partition files
Date Fri, 26 Apr 2013 20:32:18 GMT

    [ https://issues.apache.org/jira/browse/HAMA-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643221#comment-13643221 ]

Edward J. Yoon commented on HAMA-750:
-------------------------------------

{code}
OK. MRQL works fine now with Hama 0.7.0 in distributed mode.
I haven't tested it on a real cluster yet.
I am attaching the output from pagerank.
By the way, Hama 0.7.0 runs 2 jobs for each BSPjob, although the first is fast.
Is this done to distribute the data among peers?
Leonidas

13/04/26 10:13:50 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
*** Using 8 BSP tasks (out of a max 8). Each task will handle about 2525538 bytes of input data.
13/04/26 10:13:50 INFO bsp.FileInputFormat: Total input paths to process : 1
13/04/26 10:13:50 INFO bsp.FileInputFormat: Total input paths to process : 1
13/04/26 10:13:50 INFO bsp.BSPJobClient: Running job: job_201304260948_0020
13/04/26 10:13:53 INFO bsp.BSPJobClient: Current supersteps number: 0
13/04/26 10:14:02 INFO bsp.BSPJobClient: Current supersteps number: 2
13/04/26 10:14:05 INFO bsp.BSPJobClient: The total number of supersteps: 2
13/04/26 10:14:05 INFO bsp.BSPJobClient: Counters: 6
13/04/26 10:14:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
13/04/26 10:14:05 INFO bsp.BSPJobClient:     SUPERSTEPS=2
13/04/26 10:14:05 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
13/04/26 10:14:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/04/26 10:14:05 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=2
13/04/26 10:14:05 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=178
13/04/26 10:14:05 INFO bsp.BSPJobClient:     IO_BYTES_READ=20204222
13/04/26 10:14:05 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=918362
13/04/26 10:14:05 INFO bsp.FileInputFormat: Total input paths to process : 8
13/04/26 10:14:06 INFO bsp.BSPJobClient: Running job: job_201304260948_0019
13/04/26 10:14:09 INFO bsp.BSPJobClient: Current supersteps number: 0
13/04/26 10:14:18 INFO bsp.BSPJobClient: Current supersteps number: 2
13/04/26 10:14:30 INFO bsp.BSPJobClient: Current supersteps number: 3
13/04/26 10:14:33 INFO bsp.BSPJobClient: Current supersteps number: 4
13/04/26 10:14:36 INFO bsp.BSPJobClient: Current supersteps number: 5
13/04/26 10:14:42 INFO bsp.BSPJobClient: Current supersteps number: 6
13/04/26 10:14:45 INFO bsp.BSPJobClient: Current supersteps number: 8
13/04/26 10:14:54 INFO bsp.BSPJobClient: Current supersteps number: 11
13/04/26 10:15:03 INFO bsp.BSPJobClient: Current supersteps number: 14
13/04/26 10:15:12 INFO bsp.BSPJobClient: Current supersteps number: 18
13/04/26 10:15:15 INFO bsp.BSPJobClient: Current supersteps number: 19
13/04/26 10:15:15 INFO bsp.BSPJobClient: The total number of supersteps: 19
13/04/26 10:15:15 INFO bsp.BSPJobClient: Counters: 9
13/04/26 10:15:15 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
13/04/26 10:15:15 INFO bsp.BSPJobClient:     SUPERSTEPS=19
13/04/26 10:15:15 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=8
13/04/26 10:15:15 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/04/26 10:15:15 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=152
13/04/26 10:15:15 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=132721
13/04/26 10:15:15 INFO bsp.BSPJobClient:     IO_BYTES_READ=22986388
13/04/26 10:15:15 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_SENT=5694804
13/04/26 10:15:15 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=918362
13/04/26 10:15:15 INFO bsp.BSPJobClient:     COMPRESSED_MESSAGES=8
13/04/26 10:15:15 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_RECEIVED=5694804
{code}

Works well now. I'll commit today.
                
> Determine the path of partition files
> -------------------------------------
>
>                 Key: HAMA-750
>                 URL: https://issues.apache.org/jira/browse/HAMA-750
>             Project: Hama
>          Issue Type: Bug
>          Components: bsp core
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.7.0
>
>         Attachments: HAMA-750.patch, HAMA-750_v02.patch
>
>
> The parent directory of the input file is used to determine the base directory for partition files. This is a problem when the input consists of multiple files.
> {code}
>   protected BSPJob partition(BSPJob job, int maxTasks) throws IOException {
>     String inputPath = job.getConfiguration().get(Constants.JOB_INPUT_DIR);
>     Path inputDir = new Path(inputPath);
>     if (fs.isFile(inputDir)) {
>       inputDir = inputDir.getParent();
>     }
>     Path partitionDir = new Path(inputDir + "/partitions");
>     if (fs.exists(partitionDir)) {
>       fs.delete(partitionDir, true);
>     }
> {code}
> Instead, we can simply create the partitions in a temp directory, for example /tmp/hama-partitions/{$JOB_NAME}/
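
As a rough illustration of the temp-directory idea in the description, here is a minimal sketch in plain Java. It is not the committed patch and uses no Hama or Hadoop API. The class name PartitionDirSketch, the method partitionDirFor, and the rule for sanitizing the job name are assumptions made for this example; only the /tmp/hama-partitions base path comes from the issue text.

```java
public class PartitionDirSketch {

    // Base directory taken from the example path in the issue description.
    static final String PARTITION_BASE = "/tmp/hama-partitions";

    // Hypothetical helper: derives the partition directory from the job name
    // alone, so it no longer depends on where the input file(s) live.
    static String partitionDirFor(String jobName) {
        // Assumption: replace characters that are awkward in a path segment
        // (spaces, slashes, and so on) with underscores.
        String safeName = jobName.replaceAll("[^A-Za-z0-9._-]", "_");
        return PARTITION_BASE + "/" + safeName;
    }

    public static void main(String[] args) {
        System.out.println(partitionDirFor("pagerank"));
        System.out.println(partitionDirFor("my job/1"));
    }
}
```

Because the directory is keyed on the job name rather than on the parent of the input path, the result is the same whether the job reads one file or several. Two jobs with the same name would still share a directory, so a real implementation would probably need a more unique key.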

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
