From: "Edward J. Yoon"
To: "dev@hama.apache.org"
Date: Sat, 29 Sep 2012 07:38:51 +0900
Subject: Re: [jira] [Commented] (HAMA-642) Make GraphRunner disk based
Message-Id: <7ED13562-4023-4AC1-9F10-463016314D66@apache.org>
In-Reply-To: <2055951671.140757.1348862828441.JavaMail.jiratomcat@arcas>

> - Does this fail always or just sometimes?

Always.

> - When it finishes, is the result wrong? Just curious, how do you compare 20gb of text files? ;D

Never finishes.

> - In case it is really the combiner, does pagerank work without problems?

Never finishes if the input is large.

Sent from my iPad

On Sep 29, 2012, at 5:07 AM, "Thomas Jungblut (JIRA)" wrote:

>
> [ https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465866#comment-13465866 ]
>
> Thomas Jungblut commented on HAMA-642:
> --------------------------------------
>
> A race is not good. We have to investigate a bit deeper, I guess. I don't think that there is a concurrency problem inside of jdbm, but I will have a look; maybe there is some resource that is static. However, each task has its own mutually exclusive "database", so I don't see a problem there.
>
> My first guess was the use of the combiner. So here are my questions:
> - Does this fail always or just sometimes?
> - When it finishes, is the result wrong? Just curious, how do you compare 20gb of text files? ;D
> - In case it is really the combiner, does pagerank work without problems?
>
> I will build a smaller cluster in the near future to test these things more efficiently.
>
>> Make GraphRunner disk based
>> ---------------------------
>>
>> Key: HAMA-642
>> URL: https://issues.apache.org/jira/browse/HAMA-642
>> Project: Hama
>> Issue Type: Improvement
>> Components: graph
>> Affects Versions: 0.5.0
>> Reporter: Thomas Jungblut
>> Assignee: Edward J. Yoon
>> Attachments: HAMA-642_unix_1.patch, HAMA-642_unix_2.patch, HAMA-scale_1.patch, HAMA-scale_2.patch, HAMA-scale_3.patch, HAMA-scale_4.patch
>>
>>
>> To improve scalability we can improve the graph runner to be disk based.
>> Which basically means:
>> - We have just a single Vertex instance that gets refilled.
>> - We directly write vertices to disk after partitioning.
>> - In every superstep we iterate over the vertices on disk, fill the vertex instance and call the user's compute functions.
>> Problems:
>> - State other than the vertex value can't be stored easily.
>> - How do we deal with random access after messages have arrived?
>> So I think we should make the graph runner more hybrid, like using the queues we have implemented in the messaging. So the graphrunner can be configured to run completely on disk, in cached mode or in in-memory mode.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
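
For reference, the disk-based iteration described in the quoted issue (write vertices to disk after partitioning, then refill a single Vertex instance in every superstep) could look roughly like the sketch below. This is not the code from the attached patches and not Hama's API: the class `DiskVertexSketch`, its `Vertex` record layout, and the methods `writePartition`, `superstep` and `readValues` are all made up for illustration, and it uses plain `java.io` streams rather than jdbm. It also leaves out message delivery, so it does not address the random-access problem raised in the issue.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Consumer;

public class DiskVertexSketch {

    /** Hypothetical vertex record (id, value, outgoing edges); not Hama's Vertex class. */
    public static final class Vertex {
        public long id;
        public double value;
        public long[] edges = new long[0];

        void write(DataOutputStream out) throws IOException {
            out.writeLong(id);
            out.writeDouble(value);
            out.writeInt(edges.length);
            for (long e : edges) {
                out.writeLong(e);
            }
        }

        /** Refills this instance from the stream instead of allocating a new vertex. */
        void readFields(DataInputStream in) throws IOException {
            id = in.readLong();
            value = in.readDouble();
            int n = in.readInt();
            if (edges.length != n) {
                edges = new long[n];
            }
            for (int i = 0; i < n; i++) {
                edges[i] = in.readLong();
            }
        }
    }

    /** Writes the vertex count followed by all vertices, as would happen after partitioning. */
    public static void writePartition(Path file, long[] ids, double[] values, long[][] edges)
            throws IOException {
        Vertex v = new Vertex();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(ids.length);
            for (int i = 0; i < ids.length; i++) {
                v.id = ids[i];
                v.value = values[i];
                v.edges = edges[i];
                v.write(out);
            }
        }
    }

    /** One superstep: stream vertices from disk, call compute on the single instance, write back. */
    public static void superstep(Path file, Consumer<Vertex> compute) throws IOException {
        Path next = file.resolveSibling(file.getFileName() + ".next");
        Vertex v = new Vertex(); // the only Vertex object alive during the superstep
        try (DataInputStream in = new DataInputStream(
                     new BufferedInputStream(Files.newInputStream(file)));
             DataOutputStream out = new DataOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(next)))) {
            int count = in.readInt();
            out.writeInt(count);
            for (int i = 0; i < count; i++) {
                v.readFields(in);
                compute.accept(v);
                v.write(out);
            }
        }
        // Swap the rewritten file in place of the old one for the next superstep.
        Files.move(next, file, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Reads back only the vertex values, in file order. */
    public static double[] readValues(Path file) throws IOException {
        Vertex v = new Vertex();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                v.readFields(in);
                values[i] = v.value;
            }
            return values;
        }
    }
}
```

Because only the fields written by `write` survive a superstep, any other per-vertex state is lost between supersteps, which is the first problem listed in the issue.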