Mailing-List: contact dev-help@giraph.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@giraph.apache.org
Delivered-To: mailing list dev@giraph.apache.org
Content-Type: multipart/alternative; boundary="===============9130633653009166991=="
MIME-Version: 1.0
Subject: Re: Review Request: GIRAPH-515: GIRAPH-515: More efficient and flexible edge-based input
From: "Alessandro Presta"
To: "Maja Kabiljo", "giraph", "Nitay Joffe", "Alessandro Presta"
Date: Sat, 16 Feb 2013 00:29:06 -0000
Message-ID: <20130216002906.10279.39206@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org
Auto-Submitted: auto-generated
Sender: "Alessandro Presta"
X-ReviewGroup: giraph
X-ReviewRequest-URL: https://reviews.apache.org/r/9449/
References: <20130215181708.9635.77149@reviews.apache.org>
In-Reply-To: <20130215181708.9635.77149@reviews.apache.org>

--===============9130633653009166991==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9449/
-----------------------------------------------------------

(Updated Feb. 16, 2013, 12:29 a.m.)


Review request for giraph.


Changes
-------

Addressed last comments.
Thanks Nitay and Maja for all the tips; they resulted in a much cleaner codebase.
Committing this.


Summary (updated)
-----------------

GIRAPH-515: GIRAPH-515: More efficient and flexible edge-based input


Description
-------

This patch adds the following classes:
- SendWorkerEdgesRequest: a request used to send edges during the input superstep, similar to the corresponding one for messages
- SendEdgeCache: similar to SendMessageCache
- ByteArrayVertexIdEdges: serialized representation for lists of edges (for different source vertices), similar to the corresponding one for messages
- EdgeStore: a server-side structure that stores transient edges from incoming requests, and later moves them to the owning vertices.
- ByteArrayEdges: an edge list (for the same source vertex) stored as a byte array. The standard way of iterating is by reusing Edge objects, but an alternative iterator that instantiates new objects is provided. Depending on the vertex implementation, we use one or the other.

This is a refactor of the byte-array code in RepresentativeVertex, which now contains an instance of ByteArrayEdges.
When calling setEdges(), RepresentativeVertex is smart enough to realize that the passed Iterable is actually an instance of ByteArrayEdges, and simply takes ownership of it (without iterating).
If using something like EdgeListVertex (which keeps references to the passed edges), we will use the alternative iterable (this is of course less memory-efficient).
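
To make the two iteration modes and the setEdges() shortcut concrete, here is a minimal, self-contained sketch. It is not the code from this patch: the class names ByteArrayEdgesSketch and ByteArrayVertexSketch, the copyingIterable() method, and the fixed (long target id, double value) edge layout are all assumptions made for illustration, and the real classes work with Giraph's Writable types instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch; NOT the ByteArrayEdges class from the patch. */
final class ByteArrayEdgesSketch implements Iterable<ByteArrayEdgesSketch.Edge> {
  /** Mutable edge with an assumed layout: long target id, double value. */
  static final class Edge {
    long targetId;
    double value;
  }

  private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
  private int edgeCount;

  /** Appends one edge to the serialized form (16 bytes per edge). */
  void add(long targetId, double value) {
    try {
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeLong(targetId);
      out.writeDouble(value);
      out.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    edgeCount++;
  }

  int size() {
    return edgeCount;
  }

  /** Standard iteration: every call to next() refills the same Edge object. */
  @Override
  public Iterator<Edge> iterator() {
    return new EdgeIterator(true);
  }

  /** Alternative iteration: next() allocates a new Edge, so references can be kept. */
  Iterable<Edge> copyingIterable() {
    return () -> new EdgeIterator(false);
  }

  private final class EdgeIterator implements Iterator<Edge> {
    // toByteArray() copies the buffer, which keeps the sketch simple.
    private final DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    private final Edge shared = new Edge();
    private final boolean reuse;
    private int remaining = edgeCount;

    EdgeIterator(boolean reuse) {
      this.reuse = reuse;
    }

    @Override
    public boolean hasNext() {
      return remaining > 0;
    }

    @Override
    public Edge next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      Edge edge = reuse ? shared : new Edge();
      try {
        edge.targetId = in.readLong();
        edge.value = in.readDouble();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      remaining--;
      return edge;
    }
  }
}

/** Hypothetical vertex that keeps its out-edges in serialized form. */
final class ByteArrayVertexSketch {
  private ByteArrayEdgesSketch edges = new ByteArrayEdgesSketch();

  /** Takes ownership of an already-serialized list; otherwise copies edge by edge. */
  void setEdges(Iterable<ByteArrayEdgesSketch.Edge> newEdges) {
    if (newEdges instanceof ByteArrayEdgesSketch) {
      edges = (ByteArrayEdgesSketch) newEdges; // no iteration, no re-serialization
      return;
    }
    ByteArrayEdgesSketch copy = new ByteArrayEdgesSketch();
    for (ByteArrayEdgesSketch.Edge edge : newEdges) {
      copy.add(edge.targetId, edge.value);
    }
    edges = copy;
  }

  ByteArrayEdgesSketch getEdges() {
    return edges;
  }
}
```

In this sketch, a caller that stores the objects returned by the default iterator ends up with many references to one Edge holding the last values read, which is why a vertex that keeps references needs copyingIterable() and pays for one allocation per edge.
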
I've also renamed RepresentativeVertex to ByteArrayVertex because the old name was misleading (it doesn't need to be used with ByteArrayPartition; it's perfectly fine to have multiple Vertex objects, each storing its edges in a byte array).

Future work:
EdgeStore could become an interface in the future, allowing for different implementations (e.g. out-of-core) and handling permanent edge storage in place of Vertex. That way, we would have only one Vertex class and pluggable storage implementations (which makes it easier to switch without changing user code).


This addresses bug GIRAPH-515.
    https://issues.apache.org/jira/browse/GIRAPH-515


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/benchmark/ByteArrayVertexPageRankBenchmark.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/benchmark/MultiGraphByteArrayVertexPageRankBenchmark.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 19b08bdb19df21b1dc56dad2cebb499222f9b19e
  giraph-core/src/main/java/org/apache/giraph/comm/SendCache.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/SendEdgeCache.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/SendMessageCache.java 3cbf0eb4775fa3ff0b0351f247df87783bf05995
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java 3655d79d8f249338da30ae2bb38b9cfd6b7b1f56
  giraph-core/src/main/java/org/apache/giraph/comm/WorkerClientRequestProcessor.java 0c043e29ae3160bbfc389c435427cf57010a91e1
  giraph-core/src/main/java/org/apache/giraph/comm/WorkerServer.java e60db5529b7fef0b16441ef88df7053d6856ffc5
  giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java 65caa5d2777b90fa8e14bee7c8d69316d512c651
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java d4e919ed1aa1f977a2e487531f57b3a2fc0fad47
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1b7cc5410aa4d7e1b9ae4580dd5ed484e09ff7ed
  giraph-core/src/main/java/org/apache/giraph/comm/requests/RequestType.java aac00289f915f61e61334cdcd92c93c1ef3b5419
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerDataRequest.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerEdgesRequest.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerMessagesRequest.java 641c795521006c460138d6b3b6d9ceb3c3e7eccf
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 44d09c9462231874a9fed337215ac9fd650bb6d0
  giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 3e158afdc480656b3937508f5d86ec294bfa3b99
  giraph-core/src/main/java/org/apache/giraph/graph/EdgeStore.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java 12989180a4aabed19c3aefa52ef38ad6d7aa6851
  giraph-core/src/main/java/org/apache/giraph/partition/DiskBackedPartitionStore.java 725de39c4dfd2249a40203f62d93e9d0b246240b
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayEdges.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdData.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdEdges.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java dea4229f10224edb30f59626d5987ea840e8a271
  giraph-core/src/main/java/org/apache/giraph/utils/VertexIdIterator.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java fefe9a09b0570b8f6626243a2e51f386e18f2fe0
  giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertex.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertexBase.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/vertex/EdgeListVertex.java 9ae692fc00432e28f0b87f11ed5981e600c95019
  giraph-core/src/main/java/org/apache/giraph/vertex/MultiGraphByteArrayVertex.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java fa3ab49f11d61352a5f6f69699375abd2bf1e527
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java bdf9f5705811340748172a70dc952493d5ececfc
  giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 2845c90cbfd38f2f35e70e3b79767e1386d54a7e
  giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java d779fe46377eaa8fa2debf0836f975a30ec6e21f
  giraph-core/src/test/java/org/apache/giraph/utils/MockUtils.java 82dc2839d83f80ebcf52bad252886d50310eacc5
  giraph-core/src/test/java/org/apache/giraph/vertex/TestMultiGraphVertex.java a5a3545de7dc9e30ab0f30926122049fdbe1173b
  giraph-core/src/test/java/org/apache/giraph/vertex/TestMutableVertex.java ca4ba1a336f68b584c4fdbaf74be60dbe41644d5

Diff: https://reviews.apache.org/r/9449/diff/


Testing
-------

mvn verify

Tested on both benchmarks and real-world applications.
This typically brings resource requirements down a lot: an application using a few hundred billion edges, which previously only ran with 300 workers, now runs with 100 workers, with a lot of memory to spare, and is even faster than before (from around 600s to 400s).


Thanks,

Alessandro Presta


--===============9130633653009166991==--