hama-user mailing list archives

From Roman Shapovalov <shapova...@graphics.cs.msu.su>
Subject Re: readNext semantic and failure after cleanup
Date Tue, 01 Oct 2013 17:06:45 GMT
It seems the attachment got lost in transit. Here is a copy:

https://dl.dropboxusercontent.com/u/42489708/MasterSlaveBSP.py

Roman


On Tue, Oct 1, 2013 at 6:13 PM, Roman Shapovalov
<shapovalov@graphics.cs.msu.su> wrote:
> Hi Martin,
>
>> it seems you have forgotten the attachment.
>
> I can see one in the message I sent. Attaching again, try this.
>
>
>> But currently the Hama Streaming API [2] does not support partitioning.
>
> So, does the text protocol itself not support it, or is it missing only
> from the Python wrapper?
>
> So the default partitioning is arbitrary, regardless of which tasks read
> input and which do not? Then the easiest workaround seems to be to have
> the master task resend those records to the slaves, provided they are not
> very big.
>
> Thanks,
> Roman
>
> On Tue, Oct 1, 2013 at 9:30 AM, Martin Illecker <millecker@apache.org> wrote:
>> Hi Roman,
>>
>> it seems you have forgotten the attachment. (your code)
>>
>> ad 1)
>> I would solve this by using a custom partitioner.
>> A custom partitioner defines which records are distributed to which tasks.
>>
>> Here is a C++ partitioner example [1].
>> E.g., for keys 3, 6, 9 the partitioner should return 1,
>> and for keys 2, 5, 8 it should return 2.
>>
>> But currently the Hama Streaming API [2] does not support partitioning.
>> Only Hama Pipes C++ supports it.
>>
>> ad 2)
>> Please submit your code, I will have a look at this exception.
>> Or please submit the tasklog.
>>
>> Martin
>>
>> [1]
>> https://github.com/apache/hama/blob/trunk/c%2B%2B/src/main/native/examples/impl/matrixmultiplication.cc#L131-138
>> [2]
>> https://github.com/millecker/HamaStreaming/blob/1009bb1a6472d11f5dd3af9dc07fe64547dd0290/BinaryProtocol.py#L37-38
>>
>> 2013/9/30 Roman Shapovalov <shapovalov@graphics.cs.msu.su>
>>
>>> Hello all,
>>>
>>> I am developing a toy master-slave application for the Python
>>> streaming interface. There are two issues.
>>>
>>> 1. What are the semantics of the readNext command?
>>>
>>> If I run 3 tasks, one of which is a master that does not read input,
>>> the slaves take turns reading records, but each of them reads only
>>> every third record, e.g. slave #1 reads records 3, 6, 9, while slave #2
>>> reads 2, 5, 8. So 1/3 of the records are skipped, as if the master task
>>> had read them.
>>>
>>> So, what are the exact semantics? Is there a best practice to make
>>> sure every record is read by some task (but not the master)?
>>>
>>>
>>> 2. After the code is executed (and the output is written), the job
>>> fails. All the task logs contain the following text:
>>>
>>> 13/09/30 16:32:09 ERROR protocol.UplinkReader:
>>> java.lang.NullPointerException
>>>     at
>>> org.apache.hama.pipes.protocol.UplinkReader.run(UplinkReader.java:127)
>>>
>>> The exception is raised even if I don't use pipes at all. Since it
>>> shows up after cleanup, it is not critical for the program, but it may
>>> indicate some misuse by me or bugs in the Hama code.
>>>
>>> Please look at that issue. My code is attached.
>>>
>>> Roman
>>>
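
To illustrate the custom-partitioner idea discussed in this thread, here is a
minimal sketch in plain Python. It is not part of Hama or of the HamaStreaming
wrapper (which, per the thread, does not support partitioning); the function
name `partition`, its parameters, and the choice of task 0 as the master are
all assumptions made for illustration. It only shows how integer keys could be
spread over the non-master tasks so that no record lands on a task that never
reads input.

```python
def partition(key, num_tasks, master=0):
    """Return the index of the task that should receive the record `key`.

    Hypothetical helper, not a Hama API: integer keys are distributed
    round-robin over every task except the master.
    """
    if num_tasks < 2:
        raise ValueError("need at least one slave task besides the master")
    # All task indices except the master (assumed here to be task 0).
    slaves = [t for t in range(num_tasks) if t != master]
    return slaves[key % len(slaves)]


if __name__ == "__main__":
    # Show which task each of the records 1..9 would be assigned to.
    assignment = {}
    for key in range(1, 10):
        assignment.setdefault(partition(key, num_tasks=3), []).append(key)
    for task in sorted(assignment):
        print(task, assignment[task])
```

With 3 tasks this splits the records between tasks 1 and 2 only; the exact
key-to-task mapping differs from the one quoted above and can be changed
freely as long as the master index is never returned.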
