hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: runtimePartitioning in GraphJobRunner
Date Mon, 10 Dec 2012 13:11:01 GMT
Anyway, then I'm removing them tomorrow.

On Mon, Dec 10, 2012 at 10:09 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> You know what? If the graph is not already stored somewhere, it has
> to be extracted from unstructured data. The parseVertex API is only
> good for simple test/debug programs, because it works on
> human-readable text.
>
> In my case, generating test data is very annoying.
>
> On Mon, Dec 10, 2012 at 9:51 PM, Thomas Jungblut
> <thomas.jungblut@gmail.com> wrote:
>> That's nothing personal, just about how we solve the problems we face.
>> We just need some trade-off between API compatibility and scalability
>> improvements.
>>
>> 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>>
>>> I don't dislike your intuitive input reader. Once the cleanup is
>>> done, we can think about it again.
>>>
>>> On Mon, Dec 10, 2012 at 9:37 PM, Thomas Jungblut
>>> <thomas.jungblut@gmail.com> wrote:
>>> > no problem, forgot what I've done there anyways.
>>> >
>>> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>>> >
>>> >> > Just wanted to remind you why we introduced runtime partitioning.
>>> >>
>>> >> Sorry that I could not review your patch for HAMA-531 and many things
>>> >> in the Hama 0.5 release. I was busy.
>>> >>
>>> >> On Mon, Dec 10, 2012 at 8:47 PM, Thomas Jungblut
>>> >> <thomas.jungblut@gmail.com> wrote:
>>> >> > Just wanted to remind you why we introduced runtime partitioning.
>>> >> >
>>> >> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>>> >> >
>>> >> >> HDFS is common infrastructure. It can't be tuned for Hama BSP
>>> >> >> computing alone.
>>> >> >>
>>> >> >> > Yes, so spilling to disk is the easiest solution to save memory. Not
>>> >> >> > changing the partitioning.
>>> >> >> > If you want to split again through the block boundaries to distribute
>>> >> >> > the data through the cluster, then do it, but this is plainly wrong.
>>> >> >>
>>> >> >> Vertex load balancing basically uses a hash partitioner. You can't
>>> >> >> avoid data transfers.
>>> >> >>
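[Editor's note: the hash-partitioning point above can be illustrated with a minimal, self-contained sketch. This is not Hama's actual partitioner code; the class and method names are hypothetical. It shows why hash-based vertex assignment implies data transfer: every peer deterministically computes the same owner for a vertex ID, but almost every vertex changes owner when the peer count changes.]

```java
// A minimal sketch (not Hama's actual code) of hash-based vertex
// partitioning. "HashVertexPartitioner" and "ownerOf" are illustrative
// names, not Hama API.
public class HashVertexPartitioner {

    // Assign a vertex to one of numPeers task peers by its ID hash.
    static int ownerOf(String vertexId, int numPeers) {
        // Mask the sign bit instead of Math.abs, which overflows
        // for Integer.MIN_VALUE.
        return (vertexId.hashCode() & Integer.MAX_VALUE) % numPeers;
    }

    public static void main(String[] args) {
        int moved = 0, total = 1000;
        // Grow the cluster from 4 to 5 peers: most vertices change
        // owner, which is why rebalancing always implies transfers.
        for (int i = 0; i < total; i++) {
            String id = "vertex-" + i;
            if (ownerOf(id, 4) != ownerOf(id, 5)) moved++;
        }
        System.out.println(moved + " of " + total + " vertices moved");
    }
}
```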
>>> >> >> Again...,
>>> >> >>
>>> >> >> VertexInputReader and runtime partitioning make the code complex, as I
>>> >> >> mentioned above.
>>> >> >>
>>> >> >> > This reader is needed, so people can create vertices from their own
>>> >> >> > fileformat.
>>> >> >>
>>> >> >> I don't think so. Instead of VertexInputReader, we can provide <K
>>> >> >> extends WritableComparable, V extends ArrayWritable>.
>>> >> >>
>>> >> >> Let's assume that there's a web table in Google's BigTable (HBase).
>>> >> >> Users can create their own WebTableInputFormatter to read records as
>>> >> >> <Text url, TextArrayWritable anchors>. Am I wrong?
>>> >> >>
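[Editor's note: the <K, V extends ArrayWritable> idea above can be sketched in plain Java, using ordinary strings in place of Hadoop's Text/ArrayWritable. The "WebTableRecordParser" name and the tab/comma record layout are assumptions for illustration, not Hama or HBase API; the point is that one input record maps to one vertex key plus an array of edge targets.]

```java
import java.util.Arrays;

// Hypothetical stand-in for a <Text url, TextArrayWritable anchors>
// record: one vertex plus its outgoing links.
public class WebTableRecordParser {

    static final class Record {
        final String url;        // the K (e.g. Text url)
        final String[] anchors;  // the V (e.g. TextArrayWritable anchors)
        Record(String url, String[] anchors) {
            this.url = url;
            this.anchors = anchors;
        }
    }

    // Parse one "url<TAB>anchor1,anchor2,..." line into a record.
    static Record parse(String line) {
        String[] parts = line.split("\t", 2);
        String[] anchors = parts.length > 1 && !parts[1].isEmpty()
                ? parts[1].split(",")
                : new String[0];
        return new Record(parts[0], anchors);
    }

    public static void main(String[] args) {
        Record r = parse("http://a.com\thttp://b.com,http://c.com");
        System.out.println(r.url + " -> " + Arrays.toString(r.anchors));
    }
}
```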
>>> >> >> On Mon, Dec 10, 2012 at 8:21 PM, Thomas Jungblut
>>> >> >> <thomas.jungblut@gmail.com> wrote:
>>> >> >> > Yes, because changing the block size to 32m will just use 300mb of
>>> >> >> > memory, so you can add more machines to fit the number of resulting
>>> >> >> > tasks.
>>> >> >> >
>>> >> >> > If each node has small memory, there's no way to process in memory
>>> >> >> >
>>> >> >> >
>>> >> >> > Yes, so spilling to disk is the easiest solution to save memory. Not
>>> >> >> > changing the partitioning.
>>> >> >> > If you want to split again through the block boundaries to distribute
>>> >> >> > the data through the cluster, then do it, but this is plainly wrong.
>>> >> >> >
>>> >> >> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>>> >> >> >
>>> >> >> >> > A Hama cluster is scalable. It means that the computing capacity
>>> >> >> >> >> should be increased by adding slaves. Right?
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > I'm sorry, but I don't see how this relates to the vertex input
>>> >> >> >> > reader.
>>> >> >> >>
>>> >> >> >> It's not related to the input reader. It's related to partitioning
>>> >> >> >> and load balancing. As I reported to you before, to process the
>>> >> >> >> vertices within a 256MB block, each TaskRunner required 25~30GB of
>>> >> >> >> memory.
>>> >> >> >>
>>> >> >> >> If each node has small memory, there's no way to process in memory
>>> >> >> >> without changing the block size of HDFS.
>>> >> >> >>
>>> >> >> >> Do you think this is scalable?
>>> >> >> >>
>>> >> >> >> On Mon, Dec 10, 2012 at 7:59 PM, Thomas Jungblut
>>> >> >> >> <thomas.jungblut@gmail.com> wrote:
>>> >> >> >> > Oh okay, so if you want to remove that, have a lot of fun. This
>>> >> >> >> > reader is needed, so people can create vertices from their own
>>> >> >> >> > fileformat. Going back to a sequencefile input will not only break
>>> >> >> >> > backward compatibility but also cause the same issues we had before.
>>> >> >> >> >
>>> >> >> >> > A Hama cluster is scalable. It means that the computing capacity
>>> >> >> >> >> should be increased by adding slaves. Right?
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > I'm sorry, but I don't see how this relates to the vertex input
>>> >> >> >> > reader.
>>> >> >> >> >
>>> >> >> >> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>>> >> >> >> >
>>> >> >> >> >> A Hama cluster is scalable. It means that the computing capacity
>>> >> >> >> >> should be increased by adding slaves. Right?
>>> >> >> >> >>
>>> >> >> >> >> As I mentioned before, the disk-queue and storing vertices on
>>> >> >> >> >> local disk are not urgent.
>>> >> >> >> >>
>>> >> >> >> >> In short, yeah, I want to remove VertexInputReader and runtime
>>> >> >> >> >> partitioning in the Graph package.
>>> >> >> >> >>
>>> >> >> >> >> See also,
>>> >> >> >> >>
>>> >> >> >> >> https://issues.apache.org/jira/browse/HAMA-531?focusedCommentId=13527756&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13527756
>>> >> >> >> >>
>>> >> >> >> >> On Mon, Dec 10, 2012 at 7:31 PM, Thomas Jungblut
>>> >> >> >> >> <thomas.jungblut@gmail.com> wrote:
>>> >> >> >> >> > uhm, I have no idea what you want to achieve. Do you want to
>>> >> >> >> >> > get back to client-side partitioning?
>>> >> >> >> >> >
>>> >> >> >> >> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>>> >> >> >> >> >
>>> >> >> >> >> >> If there's no opinion, I'll remove VertexInputReader in
>>> >> >> >> >> >> GraphJobRunner, because it makes the code complex. Let's
>>> >> >> >> >> >> consider the VertexInputReader again after fixing the
>>> >> >> >> >> >> HAMA-531 and HAMA-632 issues.
>>> >> >> >> >> >>
>>> >> >> >> >> >> On Fri, Dec 7, 2012 at 9:35 AM, Edward J. Yoon
>>> >> >> >> >> >> <edwardyoon@apache.org> wrote:
>>> >> >> >> >> >> > Or, I'd like to get rid of VertexInputReader.
>>> >> >> >> >> >> >
>>> >> >> >> >> >> > On Fri, Dec 7, 2012 at 9:30 AM, Edward J. Yoon
>>> >> >> >> >> >> > <edwardyoon@apache.org> wrote:
>>> >> >> >> >> >> >> In fact, there's no choice but to use runtimePartitioning
>>> >> >> >> >> >> >> (because of VertexInputReader). Right? If so, I would like
>>> >> >> >> >> >> >> to delete all "if (runtimePartitioning) {" conditions.
>>> >> >> >> >> >> >>
>>> >> >> >> >> >> >> --
>>> >> >> >> >> >> >> Best Regards, Edward J. Yoon
>>> >> >> >> >> >> >> @eddieyoon
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >
>>> >> >> >> >> >> >
>>> >> >> >> >> >> > --
>>> >> >> >> >> >> > Best Regards, Edward J. Yoon
>>> >> >> >> >> >> > @eddieyoon
>>> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >> >> --
>>> >> >> >> >> >> Best Regards, Edward J. Yoon
>>> >> >> >> >> >> @eddieyoon
>>> >> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> --
>>> >> >> >> >> Best Regards, Edward J. Yoon
>>> >> >> >> >> @eddieyoon
>>> >> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> --
>>> >> >> >> Best Regards, Edward J. Yoon
>>> >> >> >> @eddieyoon
>>> >> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Best Regards, Edward J. Yoon
>>> >> >> @eddieyoon
>>> >> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards, Edward J. Yoon
>>> >> @eddieyoon
>>> >>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon



-- 
Best Regards, Edward J. Yoon
@eddieyoon
