From: Surbhi Mungre
Date: Wed, 2 Sep 2015 16:17:54 -0500
Subject: JoinStrategy with Spark pipeline
To: user@crunch.apache.org

I was trying to determine the effect of changing the JoinStrategy on a Spark pipeline. My pipeline works fine with DefaultJoinStrategy, but I could not get it working with MapSideJoinStrategy or BloomFilterJoinStrategy. With MapSideJoinStrategy I get an exception [1] on the driver itself, and with BloomFilterJoinStrategy I get exceptions [2] in one of the stages. I have not made any configuration changes, but I did run tests with datasets of different sizes to ensure that my PCollection is small enough to fit in memory. I am running Spark in yarn-client mode with Crunch 0.11.0-cdh5.4.2.

[1] https://gist.github.com/anonymous/15d6c691b743ad392d42
[2] https://gist.github.com/anonymous/b02a82401a30a69f1cff

Thanks,
Surbhi