hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: runtimePartitioning in GraphJobRunner
Date Mon, 10 Dec 2012 11:21:55 GMT
Yes, because changing the block size to 32 MB will use just 300 MB of memory,
so you can add more machines to fit the number of resulting tasks.

> If each node has small memory, there's no way to process in memory

Yes, so spilling to disk is the easiest solution to save memory, not
changing the partitioning.
If you want to split again across the block boundaries to distribute the
data across the cluster, then do it, but this is plainly wrong.
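
As a rough illustration of the block-size point above, the sketch below counts how many block-sized splits (and so tasks) an input produces. It assumes one task per HDFS block; the class name, method name, and the 1 GB input size are hypothetical and not taken from Hama.

```java
public class BlockSizeTasks {
    // Number of block-sized splits for an input, rounding up for a partial last block.
    // Assumption: one task is started per block-sized split.
    static long taskCount(long inputBytes, long blockBytes) {
        if (inputBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("input must be >= 0 and block size > 0");
        }
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long input = 1024L * mb; // hypothetical 1 GB input file

        System.out.println("256 MB blocks: " + taskCount(input, 256L * mb) + " tasks");
        System.out.println("32 MB blocks:  " + taskCount(input, 32L * mb) + " tasks");
    }
}
```

Under that assumption, a smaller block size gives more tasks that each hold less data, which is why adding machines can absorb the extra tasks.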

2012/12/10 Edward J. Yoon <edwardyoon@apache.org>

> >> A Hama cluster is scalable. It means that the computing capacity
> >> should be increased by adding slaves. Right?
> >
> >
> > I'm sorry, but I don't see how this relates to the vertex input reader.
>
> It is not related to the input reader. It is related to partitioning and
> load balancing. As I reported to you before, to process vertices within
> a 256MB block, each TaskRunner required 25~30GB of memory.
>
> If each node has small memory, there's no way to process in memory
> without changing the block size of HDFS.
>
> Do you think this is scalable?
>
> On Mon, Dec 10, 2012 at 7:59 PM, Thomas Jungblut
> <thomas.jungblut@gmail.com> wrote:
> > Oh okay, so if you want to remove that, have a lot of fun. This reader is
> > needed, so people can create vertices from their own file format.
> > Going back to a sequencefile input will not only break backward
> > compatibility but also bring back the same issues we had before.
> >
> >> A Hama cluster is scalable. It means that the computing capacity
> >> should be increased by adding slaves. Right?
> >
> >
> > I'm sorry, but I don't see how this relates to the vertex input reader.
> >
> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
> >
> >> A Hama cluster is scalable. It means that the computing capacity
> >> should be increased by adding slaves. Right?
> >>
> >> As I mentioned before, disk-queue and storing vertices on local disk
> >> are not urgent.
> >>
> >> In short, yeah, I want to remove VertexInputReader and runtime
> >> partitioning in the Graph package.
> >>
> >> See also,
> >>
> >> https://issues.apache.org/jira/browse/HAMA-531?focusedCommentId=13527756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13527756
> >>
> >> On Mon, Dec 10, 2012 at 7:31 PM, Thomas Jungblut
> >> <thomas.jungblut@gmail.com> wrote:
> >> > uhm, I have no idea what you want to achieve, do you want to get back to
> >> > client-side partitioning?
> >> >
> >> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
> >> >
> >> >> If there's no opinion, I'll remove VertexInputReader in
> >> >> GraphJobRunner, because it makes the code complex. Let's reconsider
> >> >> the VertexInputReader after fixing the HAMA-531 and HAMA-632
> >> >> issues.
> >> >>
> >> >> On Fri, Dec 7, 2012 at 9:35 AM, Edward J. Yoon <edwardyoon@apache.org>
> >> >> wrote:
> >> >> > Or, I'd like to get rid of VertexInputReader.
> >> >> >
> >> >> > On Fri, Dec 7, 2012 at 9:30 AM, Edward J. Yoon <edwardyoon@apache.org>
> >> >> > wrote:
> >> >> >> In fact, there's no choice but to use runtimePartitioning (because of
> >> >> >> VertexInputReader). Right? If so, I would like to delete all
> >> >> >> "if (runtimePartitioning) {" conditions.
> >> >> >>
> >> >> >> --
> >> >> >> Best Regards, Edward J. Yoon
> >> >> >> @eddieyoon
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Best Regards, Edward J. Yoon
> >> >> > @eddieyoon
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Best Regards, Edward J. Yoon
> >> >> @eddieyoon
> >> >>
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
