hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: [DISCUSS] (HAMA-521) Improve message buffering to save memory
Date Mon, 16 Apr 2012 05:32:54 GMT
> then in the sync phase reading all of them back into memory and sending
> them over RPC.

Sure, but we are not reading all of them back at once; we read them step by step.
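
A rough sketch of what "step by step" means here. It is not the code from the patch: SpillQueueSketch, addAll, and drain are hypothetical names, messages are plain strings, and a local temp file stands in for the disk queue. The point is only that the sync phase holds at most one batch in memory at a time.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not Hama's disk queue: messages are spilled to a local
// file and later streamed back in small batches instead of all at once.
final class SpillQueueSketch {
  private final Path file = newTempFile();
  private int count; // number of messages written so far

  private static Path newTempFile() {
    try {
      Path p = Files.createTempFile("spill", ".bin");
      p.toFile().deleteOnExit();
      return p;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Compute phase: append messages to disk, keeping nothing in memory.
  void addAll(List<String> messages) {
    try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
        Files.newOutputStream(file, StandardOpenOption.APPEND)))) {
      for (String m : messages) {
        out.writeUTF(m);
        count++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Sync phase: read messages back one batch at a time and hand each batch
  // to the sender (an RPC call in a real system). Returns the largest number
  // of messages that were held in memory at once.
  int drain(int batchSize, Consumer<List<String>> sender) {
    int peak = 0;
    try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(Files.newInputStream(file)))) {
      List<String> batch = new ArrayList<>(batchSize);
      for (int i = 0; i < count; i++) {
        batch.add(in.readUTF());
        if (batch.size() == batchSize || i == count - 1) {
          peak = Math.max(peak, batch.size());
          sender.accept(new ArrayList<>(batch));
          batch.clear();
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return peak;
  }
}
```

With a batch size of 2 and five spilled messages, the sender is called three times and never sees more than two messages at once.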

> Can we do this instead?

I proposed this filesystem transfer over a year ago. Today you can just
implement a new message service that does that; open a JIRA if you like.

Today we could save some operations if we used the data written to HDFS
for checkpointing and buffering as well as for the RPC transfer. But
distributed reads are expensive, and you must go via the namenode to read a
block. I guess this is why it is too heavy in distributed environments.
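
To make the idea concrete, here is a minimal sketch of the path handoff, under loud assumptions: PathHandoffSketch, writeForPeer, and readAtSync are hypothetical names, messages are single-line strings, and local java.nio files stand in for HDFS. None of this is an existing Hama message service.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: the sender hands over a path instead of the messages.
// Local files stand in for HDFS; this is not Hama's API.
final class PathHandoffSketch {

  // Sender side: write all messages for one peer in one superstep and
  // return the path, which is the only thing that would travel over RPC.
  // Assumes one message per line, so messages must not contain newlines.
  static Path writeForPeer(Path dir, String peerName, int superstep,
      List<String> messages) {
    // "host:port" peer names contain ':', which is not portable in file names.
    String safeName = peerName.replace(':', '_');
    Path p = dir.resolve(safeName + "-" + superstep + ".msgs");
    try {
      Files.write(p, messages);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return p;
  }

  // Receiver side, inside sync(): read the messages from the received path.
  // On a real DFS this is the remote read whose cost is discussed above.
  static List<String> readAtSync(Path p) {
    try {
      return Files.readAllLines(p);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The sketch saves the sender from rereading its own spill file, but it moves the read to the receiver, which on a real cluster is usually a remote read.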

On 15 April 2012 22:59, Apurv Verma <dapurv5@gmail.com> wrote:

> Hello,
>  I looked at the patch which Thomas submitted and have a few questions.
> What are we currently doing?
> To save main memory space, an implementation of a disk based queue has been
> provided which writes all the messages to the disk.
> +  protected final HashMap<String, InetSocketAddress> peerSocketCache = new HashMap<String, InetSocketAddress>();
> +  protected final HashMap<InetSocketAddress, MessageQueue<M>> outgoingQueues = new HashMap<InetSocketAddress, MessageQueue<M>>();
> +  protected MessageQueue<M> localQueue;
> Now what we are doing here is writing all the messages to the disk,
> then in the sync phase reading all of them back into memory and sending
> them over RPC.
> Can we do this instead?
> We write all the messages to HDFS, as we are doing right now, and then store
> the PATH where we wrote them. In the sync() phase, instead of
> rereading the messages from the disk and sending them to the appropriate
> host over RPC, we just send the concerned peer the PATH where its
> messages are stored; the concerned peer can read them inside the sync
> phase. After all, this is what I understand of a DFS (Distributed File
> System).
> But now the whole concept of message passing in *BSP* is bypassed? I am sure
> this must have been thought of.
> Am I thinking something radically wrong? Please correct me on why this
> wouldn't be a nicer way for BSP to pass messages.
> --
> thanks and regards,
> Apurv Verma

Thomas Jungblut
Berlin <thomas.jungblut@gmail.com>
