hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Problem with input HDFS paths separated by a comma
Date Sat, 28 Jul 2012 15:51:04 GMT
So I assume that your cluster only has 3 task slots defined, and thus it
will not schedule your job, because it has been split into four tasks.

You can see an error log in the bsp master log file, e.g.:

> 2012-07-28 17:45:34,708 ERROR org.apache.hama.bsp.SimpleTaskScheduler:
> Scheduling of job test.jar could not be done successfully. Killing it!


Unlike in MapReduce, BSP must run all tasks in parallel.
You can simply increase the task capacity by setting "bsp.tasks.maximum" in
your conf/hama-site.xml to a higher number, say 8, and restarting your cluster.
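
For reference, the corresponding entry in conf/hama-site.xml could look
like this (the property name is from the message above; the value 8 is
just an example, pick whatever fits your cluster):

```xml
<!-- conf/hama-site.xml: raise the BSP task capacity -->
<property>
  <name>bsp.tasks.maximum</name>
  <value>8</value>
</property>
```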

If you can't, well, then you have to dig a bit deeper into the text input
format and tell it not to split the data into 4 chunks (e.g. by setting
the "bsp.min.split.size" property).
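
A sketch of what that could look like, again in conf/hama-site.xml (the
128 MB value is only illustrative; check the exact semantics of this
property in your Hama version before relying on it):

```xml
<!-- conf/hama-site.xml: discourage splits smaller than ~128 MB,
     so the two input files produce fewer than 4 splits -->
<property>
  <name>bsp.min.split.size</name>
  <value>134217728</value>
</property>
```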

2012/7/28 Leonidas Fegaras <fegaras@cse.uta.edu>

>  No, there are no log entries for this job. It seems that the job failed
> before any logs were written.
> Another thing: there was a typo in the code:
> It should be job.setOutputValueClass(Text.class); in the code.
> It doesn't make a difference though.
> Best regards
> Leonidas Fegaras
>
>
>
> On 07/28/2012 10:22 AM, Thomas Jungblut wrote:
>
> Hey,
>
>  do you have some task logs (they are under
> HAMA_DIR/logs/tasklogs/_job_id/...)? Are there any exceptions?
>
> 2012/7/28 Leonidas Fegaras <fegaras@cse.uta.edu>
>
>> Dear fellow Hama users,
>> It seems that FileInputFormat.setInputPaths doesn't work correctly for
>> multiple HDFS paths (although it works fine in local mode using local
>> files). I am using Hama 0.5.0 on Hadoop 1.0.3.
>> I am attaching a simple code that concatenates text from files (just to
>> show the error).
>> It works fine in local mode for multiple files, it works fine in
>> pseudo-distributed mode for just one file, but it doesn't work in
>> pseudo-distributed mode on multiple HDFS files: it displays the following
>> with no entry in the log for this job:
>>
>> &&&
>> hdfs://localhost:9000/user/fegaras/orders.tbl,hdfs://localhost:9000/user/fegaras/customer.tbl
>> 12/07/28 09:42:00 INFO bsp.FileInputFormat: Total input paths to process
>> : 2
>> 12/07/28 09:42:00 INFO bsp.FileInputFormat: Total # of splits: 4
>> 12/07/28 09:42:00 INFO bsp.BSPJobClient: Running job:
>> job_201207280900_0004
>> 12/07/28 09:42:03 INFO bsp.BSPJobClient: Current supersteps number: 0
>> 12/07/28 09:42:03 INFO bsp.BSPJobClient: Job failed.
>>
>> Both paths are correct and can be accessed separately.
>> Is this a Hama error or am I doing something wrong?
>> Thanks for your help,
>> Best regards
>> Leonidas Fegaras
>> U. of Texas at Arlington
>>
>>
>
>
