From: Sebastian Schelter
Reply-To: ssc@apache.org
To: Claudio Martella, "dev@giraph.apache.org"
CC: Armando Miraglia
Date: Sat, 15 Feb 2014 20:42:33 +0100
Subject: Re: GIRAPH-825 and GIRAPH-840
Message-ID: <52FFC329.7070704@apache.org>

I copied the caching from o.a.g.io.formats.IntIntNullTextInputFormat and it
worked well during my tests (it did not happen that all vertices had the
same id). I'm happy to remove this and rerun the tests.

It's strange that out-of-core works with PageRank on a generated graph, but
not with Hyperball on the twitter graph. The generated graph has a uniform
degree distribution, while the twitter graph's degree distribution is
heavily skewed. Can that have an influence on the behavior of ooc?

Best,
Sebastian

On 02/15/2014 08:32 PM, Claudio Martella wrote:
> Sebastian, I had a look at your vertexinputformat. I think there might be a
> bug. Why are you caching/reusing the id? This way every vertex parsed by
> the vertexreader will share the same ID object, and hence have the same ID.
> I think this is broken. You should instantiate a new ID object in the
> preprocessLine.
> Can you try like that?
>
>
> On Thu, Feb 13, 2014 at 9:50 PM, Sebastian Schelter wrote:
>
>> Hi Armando,
>>
>> I uploaded my test code to github at:
>>
>> https://github.com/sscdotopen/giraph/tree/hyperball64-ooc
>>
>> I'm working on an algorithm to estimate the neighborhood function of the
>> graph (similar to [1]). I'm running this on the transposed adjacency matrix
>> of a snapshot of the twitter follower graph [2]. For this graph out-of-core
>> is not necessary, but I would like to run my algorithm on another larger
>> graph that doesn't fit into the aggregated main memory of the cluster
>> anymore.
>>
>> I think for testing purposes, you can run it on any large graph in
>> adjacency form.
>>
>> Our cluster consists of 25 machines with 32GB ram, 8 cores and 4 disks per
>> machine. I use the following options to run the algorithm:
>>
>> hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
>> org.apache.giraph.GiraphRunner
>> org.apache.giraph.examples.hyperball.HyperBall
>> --vertexInputFormat org.apache.giraph.examples.hyperball.HyperBallTextInputFormat
>> --vertexInputPath hdfs:///ssc/twitter-negative/
>> --vertexOutputFormat org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>> --outputPath hdfs:///ssc/tmp-123/
>> --combiner org.apache.giraph.comm.messages.HyperLogLogCombiner
>> --outEdges org.apache.giraph.edge.LongNullArrayEdges
>> --workers 24
>> --customArguments
>> giraph.oneToAllMsgSending=true,
>> giraph.isStaticGraph=true,
>> giraph.numComputeThreads=15,
>> giraph.numInputThreads=15,
>> giraph.numOutputThreads=15,
>> giraph.maxNumberOfSupersteps=30,
>> giraph.useOutOfCoreGraph=true,
>> giraph.maxPartitionsInMemory=20
>>
>> Best,
>> Sebastian
>>
>> [1] http://arxiv.org/abs/1308.2144
>> [2] http://konect.uni-koblenz.de/networks/twitter_mpi
>>
>>
>> On 02/12/2014 04:21 PM, Armando Miraglia wrote:
>>
>>> Hi Sebastian,
>>>
>>> On Wed, Feb 12, 2014 at 02:59:20PM +0100, Sebastian Schelter wrote:
>>>
>>>> No. Should I have done that?
>>>
>>> could you please provide me with the test you have done together with
>>> the variables that you have set for the computation? This would
>>> help me a lot.
>>>
>>> Cheers,
>>> Armando
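
The id caching question discussed in this thread can be illustrated with a
minimal, self-contained sketch. It does not use the Giraph or Hadoop APIs:
MutableId is a hypothetical stand-in for a mutable Writable id, and the two
parse methods are hypothetical stand-ins for a reader that either reuses one
id object or allocates a new one per line. The sketch assumes the consumer
keeps the reference it is handed. If the consumer instead copies or
serializes the id before the next line is parsed, reuse is harmless, which
may be why the cached version behaved correctly in the tests mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class IdReuseSketch {

    /** Hypothetical stand-in for a mutable id type; not a Giraph class. */
    static final class MutableId {
        private long value;

        MutableId(long value) {
            this.value = value;
        }

        void set(long value) {
            this.value = value;
        }

        long get() {
            return value;
        }
    }

    /** Reuses a single id object for every input line (the cached pattern). */
    static List<MutableId> parseWithCachedId(long[] rawIds) {
        MutableId cached = new MutableId(0L);
        List<MutableId> retained = new ArrayList<>();
        for (long raw : rawIds) {
            cached.set(raw);
            // Assumption: the consumer stores the reference, not a copy.
            retained.add(cached);
        }
        return retained;
    }

    /** Allocates a new id object for every input line. */
    static List<MutableId> parseWithFreshId(long[] rawIds) {
        List<MutableId> retained = new ArrayList<>();
        for (long raw : rawIds) {
            retained.add(new MutableId(raw));
        }
        return retained;
    }

    /** Extracts the current numeric value of each retained id. */
    static List<Long> values(List<MutableId> ids) {
        List<Long> out = new ArrayList<>();
        for (MutableId id : ids) {
            out.add(id.get());
        }
        return out;
    }

    public static void main(String[] args) {
        long[] rawIds = {1L, 2L, 3L};
        // All retained references point at the same object: [3, 3, 3]
        System.out.println("cached: " + values(parseWithCachedId(rawIds)));
        // Each reference points at its own object: [1, 2, 3]
        System.out.println("fresh:  " + values(parseWithFreshId(rawIds)));
    }
}
```

Under the stated assumption, the cached variant ends up with every retained
id equal to the last one parsed, while the fresh variant keeps distinct ids.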