hama-dev mailing list archives

From Suraj Menon <menonsur...@gmail.com>
Subject Re: [DISCUSS] (HAMA-521) Improve message buffering to save memory
Date Mon, 16 Apr 2012 17:40:05 GMT
Just to make things clear:
The purpose here was to spill records to disk in case the messages exceed
main memory bounds.
HAMA-521 is a buffer implementation that holds messages when they grow
beyond main memory limits.
This buffer is the source both for sending messages to peers and for
reading messages when checkpointing them (writing them to HDFS).
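
To illustrate the idea, here is a minimal sketch of a buffer that keeps
messages in memory up to a limit and spills the rest to local disk, then
hands them back one at a time. This is not the HAMA-521 patch: the class
name SpillingMessageBuffer, the memoryLimit parameter, and the
one-line-per-message text format are assumptions made for illustration,
whereas the patch itself works with MessageQueue<M>.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration only; not the class from the HAMA-521 patch.
final class SpillingMessageBuffer implements AutoCloseable {
  private final int memoryLimit;                  // max messages held in memory
  private final Queue<String> inMemory = new ArrayDeque<>();
  private final Path spillFile;                   // local temp file for overflow
  private BufferedWriter spillWriter;             // opened on first spill
  private BufferedReader spillReader;             // opened on first read from disk
  private int spilledCount = 0;                   // messages on disk, not yet read

  SpillingMessageBuffer(int memoryLimit) throws IOException {
    this.memoryLimit = memoryLimit;
    this.spillFile = Files.createTempFile("msg-spill", ".txt");
  }

  // Adds a message; once memory is full, every later message goes to disk
  // so that FIFO order is preserved.
  void add(String message) throws IOException {
    if (spillReader != null) {
      throw new IllegalStateException("buffer is already being drained from disk");
    }
    if (message.indexOf('\n') >= 0 || message.indexOf('\r') >= 0) {
      throw new IllegalArgumentException("line breaks are not supported in this sketch");
    }
    if (spilledCount == 0 && inMemory.size() < memoryLimit) {
      inMemory.add(message);
      return;
    }
    if (spillWriter == null) {
      spillWriter = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
    }
    spillWriter.write(message);
    spillWriter.newLine();
    spilledCount++;
  }

  // Returns the next message, or null when the buffer is empty. Spilled
  // messages are read back one line at a time, never all at once.
  String poll() throws IOException {
    String head = inMemory.poll();
    if (head != null) {
      return head;
    }
    if (spilledCount == 0) {
      return null;
    }
    if (spillReader == null) {
      spillWriter.close();                        // flush before reading back
      spillWriter = null;
      spillReader = Files.newBufferedReader(spillFile, StandardCharsets.UTF_8);
    }
    spilledCount--;
    return spillReader.readLine();
  }

  int size() {
    return inMemory.size() + spilledCount;
  }

  @Override
  public void close() throws IOException {
    if (spillWriter != null) spillWriter.close();
    if (spillReader != null) spillReader.close();
    Files.deleteIfExists(spillFile);
  }

  public static void main(String[] args) throws IOException {
    try (SpillingMessageBuffer buffer = new SpillingMessageBuffer(2)) {
      for (int i = 0; i < 5; i++) {
        buffer.add("message-" + i);               // the last three spill to disk
      }
      String next;
      while ((next = buffer.poll()) != null) {
        System.out.println(next);                 // a sender or checkpointer would consume here
      }
    }
  }
}
```

The same poll() loop could feed either the RPC sender or the checkpoint
writer, which is the sense in which one buffer is the source for both.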

On Mon, Apr 16, 2012 at 1:32 AM, Thomas Jungblut <
thomas.jungblut@googlemail.com> wrote:

> >
> > then in the sync phase reading all of them back into memory and sending
> > them over RPC.
>
>
> Sure, but we are not reading all of them back at once; we read them step
> by step.
>
> Can we do this instead?
>
>
> I proposed this filesystem transfer over a year ago. Today you can just
> implement a new message service that does that -> open a JIRA if you
> like.
>
> Today we could save some operations if we could use the data written to
> HDFS for checkpointing and buffering as well as for the RPC transfer. But
> distributed reads are expensive, and you must go via the namenode to read
> a block. I guess this is why it is too heavy in distributed environments.
>
> On 15 April 2012 22:59, Apurv Verma <dapurv5@gmail.com> wrote:
>
> > Hello,
> >  I looked at the patch which Thomas submitted and have a few questions.
> >
> > What are we currently doing?
> > To save main memory space, an implementation of a disk-based queue has
> > been provided, which writes all the messages to disk.
> >
> > +  protected final HashMap<String, InetSocketAddress> peerSocketCache = new HashMap<String, InetSocketAddress>();
> > +  protected final HashMap<InetSocketAddress, MessageQueue<M>> outgoingQueues = new HashMap<InetSocketAddress, MessageQueue<M>>();
> > +  protected MessageQueue<M> localQueue;
> >
> > What we are doing here is writing all the messages to disk,
> > then in the sync phase reading all of them back into memory and sending
> > them over RPC.
> >
> > Can we do this instead?
> > We write all the messages to HDFS, as we are doing right now, and then
> > store the PATH where we wrote them. In the sync() phase, instead of
> > rereading the messages from disk and sending them to the appropriate
> > host over RPC, we just send the concerned peer the PATH where its
> > messages are stored; the concerned peer can then read them inside the
> > sync phase. After all, this is what I understand a DFS (Distributed
> > File System) to be for.
> >
> > But doesn't this bypass the whole concept of message passing in *BSP*?
> > I am sure this must have been thought of already.
> > Am I thinking something radically wrong? Please correct me and explain
> > why this wouldn't be a nicer way for BSP to pass messages.
> >
> > --
> > thanks and regards,
> >
> > Apurv Verma
> >
>
>
>
> --
> Thomas Jungblut
> Berlin <thomas.jungblut@gmail.com>
>
