Return-Path: X-Original-To: apmail-incubator-crunch-dev-archive@minotaur.apache.org Delivered-To: apmail-incubator-crunch-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B1E41D64C for ; Tue, 11 Dec 2012 22:10:37 +0000 (UTC) Received: (qmail 96857 invoked by uid 500); 11 Dec 2012 22:10:37 -0000 Delivered-To: apmail-incubator-crunch-dev-archive@incubator.apache.org Received: (qmail 96824 invoked by uid 500); 11 Dec 2012 22:10:37 -0000 Mailing-List: contact crunch-dev-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: crunch-dev@incubator.apache.org Delivered-To: mailing list crunch-dev@incubator.apache.org Received: (qmail 96806 invoked by uid 99); 11 Dec 2012 22:10:37 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 11 Dec 2012 22:10:37 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id F156B1C5546; Tue, 11 Dec 2012 22:10:32 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============1106232016473478967==" MIME-Version: 1.0 Subject: Re: Review Request: CRUNCH-128: Enable pipeline stages to depend on files being created on the filesystem. From: "Gabriel Reid" To: "Gabriel Reid" Cc: "crunch" , "Josh Wills" Date: Tue, 11 Dec 2012 22:10:32 -0000 Message-ID: <20121211221032.7729.16780@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org Auto-Submitted: auto-generated Sender: "Gabriel Reid" X-ReviewGroup: crunch X-ReviewRequest-URL: https://reviews.apache.org/r/8463/ X-Sender: "Gabriel Reid" References: <20121211072605.7759.8563@reviews.apache.org> In-Reply-To: <20121211072605.7759.8563@reviews.apache.org> Reply-To: "Gabriel Reid" --===============1106232016473478967== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8463/#review14317 ----------------------------------------------------------- In general it seems pretty good to me. It would be nice if there were some = more tests on this functionality specifically (outside of being dependent o= n the MapsideJoinIT), but I'm also having a hard time thinking of some good= test cases. One idea of an additional test that *is* related to MapsideJoi= ns could be to ensure that multiple mapside joins run in parallel do their = setup in the same job (instead of each being run in separate jobs like in t= he past). I did run into one issue by accident that put the planner in an infinite lo= op, but I'll put the details of that on JIRA. crunch/src/it/java/org/apache/crunch/lib/join/MapsideJoinIT.java This test should probably be called testMapsideJoin_MemPipeline instead= now. Previously the point of it was to point out that anything that wasn't= a MRPipeline wouldn't work, whereas now the point is to show that it speci= fically does work with a MemPipeline. crunch/src/main/java/org/apache/crunch/impl/mr/collect/PCollectionImpl.java Is there a reason that a non-immutable Set isn't just used from in the = start? When you see this statement (instantiating a new set if the existing= one is empty), it looks like a mistake at first. - Gabriel Reid On Dec. 11, 2012, 7:26 a.m., Josh Wills wrote: > = > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/8463/ > ----------------------------------------------------------- > = > (Updated Dec. 11, 2012, 7:26 a.m.) > = > = > Review request for crunch and Gabriel Reid. > = > = > Description > ------- > = > This involves updating the PCollectionImpl class to be able to track any = SourceTarget instances that it needs to exist before any Target that depend= s on this PCollectionImpl can be created, and optimizing the MSCRPlanner to= check for this information and build the jobs to incorporate these depende= ncies. > = > This isn't the prettiest implementation of this idea, but I think it'll t= urn out to be a useful thing to have. > = > = > This addresses bug CRUNCH-128. > https://issues.apache.org/jira/browse/CRUNCH-128 > = > = > Diffs > ----- > = > crunch/src/it/java/org/apache/crunch/lib/join/MapsideJoinIT.java 297680= e = > crunch/src/main/java/org/apache/crunch/Pipeline.java bcf8727 = > crunch/src/main/java/org/apache/crunch/impl/mem/MemPipeline.java 77c41c= e = > crunch/src/main/java/org/apache/crunch/impl/mr/MRPipeline.java 60950f3 = > crunch/src/main/java/org/apache/crunch/impl/mr/collect/PCollectionImpl.= java f0d8187 = > crunch/src/main/java/org/apache/crunch/impl/mr/plan/MSCRPlanner.java 7f= e2809 = > crunch/src/main/java/org/apache/crunch/io/ReadableSourceTarget.java 95c= 90aa = > crunch/src/main/java/org/apache/crunch/lib/join/MapsideJoin.java 0ca1ab= 3 = > crunch/src/main/java/org/apache/crunch/materialize/MaterializableIterab= le.java 3830616 = > = > Diff: https://reviews.apache.org/r/8463/diff/ > = > = > Testing > ------- > = > Updated the mapside join IT to use the new code and fixed the in-memory i= mpl to work properly. > = > = > Thanks, > = > Josh Wills > = > --===============1106232016473478967==--