hama-dev mailing list archives

From Apurv Verma <dapu...@gmail.com>
Subject Re: [jira] [Commented] (HAMA-559) Add a caching message queue
Date Thu, 20 Sep 2012 09:59:04 GMT
Yeah, Ehcache would be random access. But the good thing about it is that it
makes the disk spilling process transparent and seamless: you just
specify the size you want to keep in memory and the rest goes to disk. With
Apache DirectMemory coming up behind Ehcache, these messages could even be
flushed off-heap first and then to disk, which would be really fast, but
that is for when it arrives.
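
To make the "keep a fixed amount in memory, rest goes to disk" behaviour
concrete, here is a rough plain-JDK sketch. It is not Ehcache and not Hama
code: the class name SpillingQueue, the String message type and the
count-based limit are all made up for illustration, and it assumes
sequential reads only, with all writes finished before the spilled part is
read.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;

// Hypothetical sketch: the first maxInMemory messages stay on the heap,
// everything after that is appended to a temp file and read back in order.
final class SpillingQueue {
  private final int maxInMemory;
  private final ArrayDeque<String> memory = new ArrayDeque<>();
  private final Path spillFile;
  private DataOutputStream out; // opened lazily on the first spill
  private DataInputStream in;   // opened lazily on the first disk read
  private long spilled;         // messages on disk that are not yet read

  SpillingQueue(int maxInMemory) {
    this.maxInMemory = maxInMemory;
    try {
      this.spillFile = Files.createTempFile("spill", ".bin");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  void add(String msg) {
    if (in != null) {
      throw new IllegalStateException("disk part is already being read");
    }
    // Only use memory while nothing is on disk, otherwise FIFO order breaks.
    if (spilled == 0 && memory.size() < maxInMemory) {
      memory.add(msg);
      return;
    }
    try {
      if (out == null) {
        out = new DataOutputStream(
            new BufferedOutputStream(Files.newOutputStream(spillFile)));
      }
      out.writeUTF(msg); // writeUTF caps a single message at 64 KB encoded
      spilled++;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns the next message in insertion order, or null when empty.
  String poll() {
    String msg = memory.poll();
    if (msg != null) {
      return msg;
    }
    if (spilled == 0) {
      return null;
    }
    try {
      if (in == null) {
        if (out != null) {
          out.close(); // flush pending writes before switching to reading
          out = null;
        }
        in = new DataInputStream(
            new BufferedInputStream(Files.newInputStream(spillFile)));
      }
      spilled--;
      return in.readUTF();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  void close() {
    try {
      if (out != null) out.close();
      if (in != null) in.close();
      Files.deleteIfExists(spillFile);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A real version would limit by bytes rather than by message count and would
use the Writable serialization instead of writeUTF, but the control flow
would probably look similar.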

For now, if we are not using Ehcache, I think our current queue
implementation needs to be cleaned up slightly. I also have another idea:
building a SegmentedQueue on top of the normal DiskQueue to speed it up.
In a segmented queue, instead of storing messages in just one queue, you
store them in multiple segments. Then, at read time, you spawn multiple
threads to read from the segments and do the message send phase.
This should give us significant parallelization benefits. I will try to do
the AsyncSend and SegmentedQueue this weekend and the next. It's pretty
active in my head ;)
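
A minimal sketch of the segmented idea, just to show the shape. Everything
here is an assumption on my side: the segments are plain in-memory queues
standing in for DiskQueue instances, messages are distributed round-robin,
and the "send" step is whatever Consumer the caller passes in. Note that
ordering across segments is not preserved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: writes are spread over several segments,
// and one reader thread per segment drains them in parallel.
final class SegmentedQueue<M> {
  private final List<Queue<M>> segments = new ArrayList<>();
  private int next; // index of the segment that receives the next message

  SegmentedQueue(int numSegments) {
    for (int i = 0; i < numSegments; i++) {
      // In-memory stand-in; the proposal would put a DiskQueue here.
      segments.add(new ConcurrentLinkedQueue<>());
    }
  }

  // Round-robin distribution keeps the segments roughly the same size.
  synchronized void add(M msg) {
    segments.get(next).add(msg);
    next = (next + 1) % segments.size();
  }

  // Drains every segment on its own thread and hands each message to sender.
  // Blocks until all segments are empty; sender must be thread-safe.
  void drainParallel(Consumer<M> sender) {
    ExecutorService pool = Executors.newFixedThreadPool(segments.size());
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (Queue<M> segment : segments) {
        pending.add(pool.submit(() -> {
          M msg;
          while ((msg = segment.poll()) != null) {
            sender.accept(msg);
          }
        }));
      }
      for (Future<?> f : pending) {
        f.get(); // wait for this reader and surface any failure
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Whether this pays off depends on the send phase being the bottleneck and on
the segments living on storage that can actually serve parallel reads.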

Apurv Verma

On Thu, Sep 20, 2012 at 1:59 PM, Thomas Jungblut (JIRA) <jira@apache.org> wrote:

>     [
> https://issues.apache.org/jira/browse/HAMA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459448#comment-13459448]
> Thomas Jungblut commented on HAMA-559:
> --------------------------------------
> The question is what we really need. A queue normally just gets read
> sequentially in our case, so it would indeed be advantageous to only spill
> to disk if it exceeds a certain limit (a 64m chunk or some other
> configured value).
> Ehcache is about caching something from a slow device, which does not give
> us benefits, because messages from user code reside in memory first.
> As for random access vs. sequential access, I believe we only have the
> latter. Feel free to correct me with the use cases of Ehcache, you're
> working on it ;)
> > Add a caching message queue
> > ---------------------------
> >
> >                 Key: HAMA-559
> >                 URL: https://issues.apache.org/jira/browse/HAMA-559
> >             Project: Hama
> >          Issue Type: New Feature
> >          Components: bsp core
> >    Affects Versions: 0.5.0
> >            Reporter: Thomas Jungblut
> >            Priority: Minor
> >             Fix For: 0.6.0
> >
> >
> > After HAMA-521 is done, we can add a caching queue which just holds the
> messages in RAM that fit into the heap space. The rest can be flushed to
> disk.
> > We may call this a HybridQueue or something like that.
> > The benefits should be that we don't have to flush to disk so often and
> get faster. However, we may have more GC, so it may not always be faster
> overall.
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
