hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: runtimePartitioning in GraphJobRunner
Date Mon, 10 Dec 2012 12:51:32 GMT
That's nothing personal; it's just about how we solve the problems we face.
We just need some trade-off between API compatibility and scalability
improvements.

2012/12/10 Edward J. Yoon <edwardyoon@apache.org>

> I don't dislike your intuitive input reader. Once the cleanup is done, we
> can think about it again.
>
> On Mon, Dec 10, 2012 at 9:37 PM, Thomas Jungblut
> <thomas.jungblut@gmail.com> wrote:
> > No problem, I forgot what I did there anyway.
> >
> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
> >
> >> > Just wanted to remind you why we introduced runtime partitioning.
> >>
> >> Sorry that I could not review your patch for HAMA-531 and many things
> >> in the Hama 0.5 release. I was busy.
> >>
> >> On Mon, Dec 10, 2012 at 8:47 PM, Thomas Jungblut
> >> <thomas.jungblut@gmail.com> wrote:
> >> > Just wanted to remind you why we introduced runtime partitioning.
> >> >
> >> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
> >> >
> >> >> HDFS is common, shared infrastructure. It can't be tuned only for
> >> >> Hama BSP computing.
> >> >>
> >> >> > Yes, so spilling on disk is the easiest solution to save memory. Not
> >> >> > changing the partitioning.
> >> >> > If you want to split again through the block boundaries to distribute
> >> >> > the data through the cluster, then do it, but this is plainly wrong.
> >> >>
> >> >> Vertex load balancing basically uses the hash partitioner. You can't
> >> >> avoid data transfers.
> >> >>
> >> >> Again...,
> >> >>
> >> >> VertexInputReader and runtime partitioning make the code complex, as I
> >> >> mentioned above.
> >> >>
> >> >> > This reader is needed, so people can create vertices from their own
> >> >> > file format.
> >> >>
> >> >> I don't think so. Instead of VertexInputReader, we can provide
> >> >> <K extends WritableComparable, V extends ArrayWritable>.
> >> >>
> >> >> Let's assume that there's a web table in Google's BigTable (HBase).
> >> >> Users can create their own WebTableInputFormatter to read records as
> >> >> <Text url, TextArrayWritable anchors>. Am I wrong?
> >> >>
> >> >> On Mon, Dec 10, 2012 at 8:21 PM, Thomas Jungblut
> >> >> <thomas.jungblut@gmail.com> wrote:
> >> >> > Yes, because changing the blocksize to 32m will just use 300mb of memory,
> >> >> > so you can add more machines to fit the number of resulting tasks.
> >> >> >
> >> >> > If each node has little memory, there's no way to process in memory
> >> >> >
> >> >> >
> >> >> > Yes, so spilling on disk is the easiest solution to save memory. Not
> >> >> > changing the partitioning.
> >> >> > If you want to split again through the block boundaries to distribute
> >> >> > the data through the cluster, then do it, but this is plainly wrong.
> >> >> >
> >> >> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
> >> >> >
> >> >> >> >> A Hama cluster is scalable. It means that the computing capacity
> >> >> >> >> should be increased by adding slaves. Right?
> >> >> >> >
> >> >> >> >
> >> >> >> > I'm sorry, but I don't see how this relates to the vertex input
> >> >> >> > reader.
> >> >> >>
> >> >> >> Not related to the input reader. It's related to partitioning and load
> >> >> >> balancing. As I reported to you before, to process the vertices within
> >> >> >> a 256MB block, each TaskRunner required 25~30GB of memory.
> >> >> >>
> >> >> >> If each node has little memory, there's no way to process in memory
> >> >> >> without changing the block size of HDFS.
> >> >> >>
> >> >> >> Do you think this is scalable?
> >> >> >>
> >> >> >> On Mon, Dec 10, 2012 at 7:59 PM, Thomas Jungblut
> >> >> >> <thomas.jungblut@gmail.com> wrote:
> >> >> >> > Oh okay, so if you want to remove that, have a lot of fun. This reader is
> >> >> >> > needed, so people can create vertices from their own file format.
> >> >> >> > Going back to a SequenceFile input will not only break backward
> >> >> >> > compatibility but also bring back the same issues we had before.
> >> >> >> >
> >> >> >> >> A Hama cluster is scalable. It means that the computing capacity
> >> >> >> >> should be increased by adding slaves. Right?
> >> >> >> >
> >> >> >> >
> >> >> >> > I'm sorry, but I don't see how this relates to the vertex input
> >> >> >> > reader.
> >> >> >> >
> >> >> >> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
> >> >> >> >
> >> >> >> >> A Hama cluster is scalable. It means that the computing capacity
> >> >> >> >> should be increased by adding slaves. Right?
> >> >> >> >>
> >> >> >> >> As I mentioned before, the disk queue and storing vertices on local
> >> >> >> >> disk are not urgent.
> >> >> >> >>
> >> >> >> >> In short, yeah, I want to remove VertexInputReader and runtime
> >> >> >> >> partitioning in the Graph package.
> >> >> >> >>
> >> >> >> >> See also,
> >> >> >> >> https://issues.apache.org/jira/browse/HAMA-531?focusedCommentId=13527756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13527756
> >> >> >> >>
> >> >> >> >> On Mon, Dec 10, 2012 at 7:31 PM, Thomas Jungblut
> >> >> >> >> <thomas.jungblut@gmail.com> wrote:
> >> >> >> >> > Uhm, I have no idea what you want to achieve. Do you want to get
> >> >> >> >> > back to client-side partitioning?
> >> >> >> >> >
> >> >> >> >> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
> >> >> >> >> >
> >> >> >> >> >> If there are no other opinions, I'll remove VertexInputReader in
> >> >> >> >> >> GraphJobRunner, because it makes the code complex. Let's reconsider
> >> >> >> >> >> the VertexInputReader after fixing the HAMA-531 and HAMA-632
> >> >> >> >> >> issues.
> >> >> >> >> >>
> >> >> >> >> >> On Fri, Dec 7, 2012 at 9:35 AM, Edward J. Yoon <edwardyoon@apache.org>
> >> >> >> >> >> wrote:
> >> >> >> >> >> > Or, I'd like to get rid of VertexInputReader.
> >> >> >> >> >> >
> >> >> >> >> >> > On Fri, Dec 7, 2012 at 9:30 AM, Edward J. Yoon <edwardyoon@apache.org>
> >> >> >> >> >> > wrote:
> >> >> >> >> >> >> In fact, there's no choice but to use runtimePartitioning (because of
> >> >> >> >> >> >> VertexInputReader). Right? If so, I would like to delete all
> >> >> >> >> >> >> "if (runtimePartitioning) {" conditions.
> >> >> >> >> >> >>
> >> >> >> >> >> >> --
> >> >> >> >> >> >> Best Regards, Edward J. Yoon
> >> >> >> >> >> >> @eddieyoon
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >>
> >>
> >>
> >>
>
>
>
>
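Edward's point that hash-based vertex placement makes data transfers unavoidable can be illustrated with a small sketch. It assumes string vertex IDs and ownership decided by `hashCode()` modulo the number of tasks; the class `HashPartitionSketch` and its methods are hypothetical and are not Hama's `HashPartitioner` or any Hama API. If the vertex IDs stored in a task's input block are unrelated to their hashes, the task typically owns only about 1/numTasks of them and has to ship the rest to their owners before computation starts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Minimal sketch of hash-partitioning vertices across BSP tasks.
 * Hypothetical class for illustration only; it is not Hama's HashPartitioner.
 */
public class HashPartitionSketch {

  /** Returns the index of the task that owns this vertex ID, in [0, numTasks). */
  static int partition(String vertexId, int numTasks) {
    // Clear the sign bit so a negative hashCode() still gives a valid index.
    return (vertexId.hashCode() & Integer.MAX_VALUE) % numTasks;
  }

  /** Counts how many of the given vertices each task would own. */
  static int[] taskLoads(List<String> vertexIds, int numTasks) {
    int[] loads = new int[numTasks];
    for (String id : vertexIds) {
      loads[partition(id, numTasks)]++;
    }
    return loads;
  }

  /**
   * Counts vertices in a split that the reading task does not own and would
   * therefore have to send to another task.
   */
  static int verticesToSend(List<String> splitVertexIds, int readerTask, int numTasks) {
    int remote = 0;
    for (String id : splitVertexIds) {
      if (partition(id, numTasks) != readerTask) {
        remote++;
      }
    }
    return remote;
  }

  public static void main(String[] args) {
    int numTasks = 4;
    // Hypothetical input split: the vertices that task 0 happens to read from its block.
    List<String> split = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      split.add("vertex-" + i);
    }
    System.out.println("loads per task: " + Arrays.toString(taskLoads(split, numTasks)));
    System.out.println("vertices task 0 must send: " + verticesToSend(split, 0, numTasks));
  }
}
```

Masking the sign bit rather than calling `Math.abs(hashCode())` avoids the corner case where `hashCode()` returns `Integer.MIN_VALUE`, whose absolute value is still negative.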
