From: 陈竞
To: user@crunch.apache.org
Date: Wed, 11 May 2016 09:42:27 +0800
Subject: Re: confused about the MapsideJoinStrategy, why use LoadLeftSideMapsideJoinStrategy, what if left table is too large to store in memory?

MapsideJoinStrategy.create() uses LoadLeftSideMapsideJoinStrategy. I'm just confused about why LoadLeftSideMapsideJoinStrategy is better than the default strategy.

According to the code comments, LoadLeftSideMapsideJoinStrategy performs better than the default strategy, but I don't know why.

2016-05-10 11:30 GMT+08:00 David Ortiz <dpo5003@gmail.com>:

Try MapsideJoinStrategy.create()


On Mon, May 9, 2016, 9:29 PM 陈竞 <cj.magina@gmail.com> wrote:
Hi, I'm very confused when using MapsideJoinStrategy. The original constructor was deprecated, and LoadLeftSideMapsideJoinStrategy is recommended instead. Its main improvement is that it loads the left-side table, which is larger than the right-side one, into memory. However, when I want to use a map-side join, the left-side table is usually too large to store in memory.

For example, I have two tables, A and B. We need A left join B, and size(A) >> size(B). Naturally we want to use a map-side join with A as the left side and B as the right side, and then load B into memory for processing. That is very simple. However, if we use LoadLeftSideMapsideJoinStrategy, we have to use A as the right side and B as the left side, which brings no improvement while adding a reversing DoFn.



--
Chen Jing (陈竞), Institute of Computing Technology, Chinese Academy of Sciences, High Performance Computer Center
Jing Chen HPCC.ICT.AC China
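
To make the scenario in the quoted message concrete (A left join B with size(A) >> size(B)), here is a minimal plain-Java sketch of the two formulations. It is not Crunch code: the class MapsideJoinSketch, its methods, and the row layout (String arrays of key and value) are hypothetical and exist only for illustration. It assumes that B fits in memory and that A is streamed row by row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of a map-side (hash) join. Not Crunch code. */
public class MapsideJoinSketch {

    /** Index the small table by key so it can be probed while the big table is streamed. */
    static Map<String, List<String>> load(List<String[]> small) {
        Map<String, List<String>> index = new HashMap<>();
        for (String[] row : small) {
            index.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return index;
    }

    /** A LEFT JOIN B with B (the right side) in memory; emits [key, aValue, bValue]. */
    public static List<List<String>> leftJoinLoadRight(List<String[]> a, List<String[]> b) {
        Map<String, List<String>> inMemoryB = load(b);
        List<List<String>> out = new ArrayList<>();
        for (String[] ra : a) {
            List<String> matches = inMemoryB.get(ra[0]);
            if (matches == null) {
                out.add(Arrays.asList(ra[0], ra[1], null)); // unmatched A row is kept
            } else {
                for (String vb : matches) {
                    out.add(Arrays.asList(ra[0], ra[1], vb));
                }
            }
        }
        return out;
    }

    /** B RIGHT JOIN A with B (now the left side) in memory; emits [key, bValue, aValue]. */
    public static List<List<String>> rightJoinLoadLeft(List<String[]> b, List<String[]> a) {
        Map<String, List<String>> inMemoryB = load(b);
        List<List<String>> out = new ArrayList<>();
        for (String[] ra : a) {
            List<String> matches = inMemoryB.get(ra[0]);
            if (matches == null) {
                out.add(Arrays.asList(ra[0], null, ra[1])); // unmatched A row is kept
            } else {
                for (String vb : matches) {
                    out.add(Arrays.asList(ra[0], vb, ra[1]));
                }
            }
        }
        return out;
    }

    /** The extra "reverse" step: turn [key, bValue, aValue] back into [key, aValue, bValue]. */
    public static List<List<String>> swap(List<List<String>> rows) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> t : rows) {
            out.add(Arrays.asList(t.get(0), t.get(2), t.get(1)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> a = Arrays.asList(
                new String[] {"k1", "a1"}, new String[] {"k2", "a2"}, new String[] {"k1", "a3"});
        List<String[]> b = Arrays.asList(
                new String[] {"k1", "b1"}, new String[] {"k3", "b3"});
        List<List<String>> direct = leftJoinLoadRight(a, b);
        List<List<String>> viaSwap = swap(rightJoinLoadLeft(b, a));
        System.out.println(direct);
        System.out.println(direct.equals(viaSwap));
    }
}
```

In this sketch both variants hold only B in memory and stream A. The only differences are the argument position B occupies and the extra swap pass, which corresponds to the reversing DoFn mentioned above. Whether the Crunch strategies behave this way internally is not established in this thread.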


