Return-Path: X-Original-To: apmail-hadoop-common-user-archive@www.apache.org Delivered-To: apmail-hadoop-common-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 300B510017 for ; Wed, 16 Apr 2014 09:49:51 +0000 (UTC) Received: (qmail 2481 invoked by uid 500); 16 Apr 2014 09:49:39 -0000 Delivered-To: apmail-hadoop-common-user-archive@hadoop.apache.org Received: (qmail 1998 invoked by uid 500); 16 Apr 2014 09:49:38 -0000 Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hadoop.apache.org Delivered-To: mailing list user@hadoop.apache.org Received: (qmail 1918 invoked by uid 99); 16 Apr 2014 09:49:37 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 16 Apr 2014 09:49:37 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of leiwangouc@gmail.com designates 209.85.192.182 as permitted sender) Received: from [209.85.192.182] (HELO mail-pd0-f182.google.com) (209.85.192.182) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 16 Apr 2014 09:49:31 +0000 Received: by mail-pd0-f182.google.com with SMTP id y10so10578029pdj.13 for ; Wed, 16 Apr 2014 02:49:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:references:mime-version:message-id :content-type; bh=7ksfJ8Vvl4WBb1nZJsv6WzCIGi3f0EGzuNEfclXuVIc=; b=DcwIOVQN1rKM3gFX10kBVUQ2Zz+xPiWrP0zWs27t0+YcQWoWOWVDV6jFhBYlbZKLif rcWoJcJGSc5PtOo06Ds7RpunwpfjB0XC8bimsr4XfHOT8aktbv+AUeKWvDT+VoQsgfog FpBoe5ggNgV1GLBRCUNj32Pq0KMg8Fn2Avhetv4lo6uvzYphDUxX6sXkwLdvRMuMSiqP aagFsD7n7rUhddyJnBmFrqeWsoUuM0njdow9aZKmrjmN3Yd0T9iLruxnpunsrjP3xw3Y o6BWITV06EWugUMU7FcEWeFmRwI0foe1eXxjuiSl5jTs0ZZKWkEk8EnB2CmMLi40eS9U 6Adg== X-Received: by 10.68.93.3 with SMTP id cq3mr7321409pbb.145.1397641747856; Wed, 16 Apr 2014 02:49:07 -0700 (PDT) Received: from CHINA-20140403C ([106.37.185.138]) by mx.google.com with ESMTPSA id q7sm45841349pbc.20.2014.04.16.02.49.03 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 Apr 2014 02:49:07 -0700 (PDT) Date: Wed, 16 Apr 2014 17:49:03 +0800 From: "leiwangouc@gmail.com" To: user , th , german.fl , user Subject: Re: Re: java.lang.OutOfMemoryError related with number of reducer? References: <2014041520410525592328@gmail.com>, <1397569988.11988.33.camel@localhost.lan>, <201404152228456317944@gmail.com>, <02c301cf58bf$3cb95350$b62bf9f0$@samsung.com>, <201404152334548325946@gmail.com>, <2014041611581215299625@gmail.com> X-Priority: 3 X-GUID: 288F5C57-CE15-4069-9893-17FEB5F153C8 X-Has-Attach: no X-Mailer: Foxmail 7, 2, 5, 136[en] Mime-Version: 1.0 Message-ID: <2014041617490096006758@gmail.com> Content-Type: multipart/alternative; boundary="----=_001_NextPart631813124738_=----" X-Virus-Checked: Checked by ClamAV on apache.org This is a multi-part message in MIME format. ------=_001_NextPart631813124738_=---- Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: base64 DQpGb3VuZCB0aGUgcm9vdCByZWFzb24uDQpJdCBpcyBiZWNhdXNlIG9mIHRoZSBuZXN0ZWQgZGlz dGluY3Qgb3BlcmF0aW9uIHJlbGllcyBvbiB0aGUgUkFNIHRvIGNhbGN1bGF0ZSB1bmlxdWUgdmFs dWVzLg0KQXMgZGVzY3JpYmVkIGhlcmU6IGh0dHA6Ly9zdGFja292ZXJmbG93LmNvbS9xdWVzdGlv bnMvMTA3MzI0NTYvaG93LXRvLW9wdGltaXplLWEtZ3JvdXAtYnktc3RhdGVtZW50LWluLXBpZy1s YXRpbiANCg0KVGhhbmtzLA0KTGVpDQoNCg0KbGVpd2FuZ291Y0BnbWFpbC5jb20NCiANCkZyb206 IGxlaXdhbmdvdWNAZ21haWwuY29tDQpEYXRlOiAyMDE0LTA0LTE2IDExOjU4DQpUbzogdXNlcjsg dGg7IGdlcm1hbi5mbDsgdXNlcg0KU3ViamVjdDogUmU6IFJlOiBqYXZhLmxhbmcuT3V0T2ZNZW1v cnlFcnJvciByZWxhdGVkIHdpdGggbnVtYmVyIG9mIHJlZHVjZXI/DQpIaSBHZXJtYW4gJiBUaG9t YXMsDQoNCiAgICBTZWVtcyBpIGZvdW5kIHRoZSBkYXRhIHRoYXQgY2F1c2VzIHRoZSBlcnJvciwg YnV0IGkgc3RpbGwgZG9uJ3Qga25vdyB0aGUgZXhhY3RseSByZWFzb24uDQoNCiAgICBJIGp1c3Qg ZG8gYSBncm91cCB3aXRoIHBpZyBsYXRpbjogDQogICAgDQpkb21haW5fZGV2aWNlX2dyb3VwID0g R1JPVVAgZGF0YV9maWx0ZXIgQlkgKGN1c3RpZCwgZG9tYWluLCBsZXZlbCwgZGV2aWNlKTsgDQpk b21haW5fZGV2aWNlID0gRk9SRUFDSCBkb21haW5fZGV2aWNlX2dyb3VwIHsgDQpkaXN0aW5jdF9p cCA9IERJU1RJTkNUIGRhdGFfZmlsdGVyLmlwOyANCiAgICAgICAgZGlzdGluY3RfdXNlcmlkID0g RElTVElOQ1QgZGF0YV9maWx0ZXIudXNlcmlkOyANCiAgICAgICAgR0VORVJBVEUgZ3JvdXAuY3Vz dGlkLCBncm91cC5kb21haW4sIGdyb3VwLmxldmVsLCBncm91cC5kZXZpY2UsIENPVU5UX1NUQVIo ZGF0YV9maWx0ZXIpLCBDT1VOVF9TVEFSKGRpc3RpbmN0X2lwKSwgQ09VTlRfU1RBUihkaXN0aW5j dF91c2VyaWQpOyANCn0gDQpTVE9SRSBkb21haW5fZGV2aWNlIElOVE8gJyRvdXRwdXRkaXIvJGJh dGNoZGF0ZS9kYXRhL2RvbWFpbl9kZXZpY2UnIFVTSU5HIFBpZ1N0b3JhZ2UoJ1x0Jyk7DQoNClRo ZSBncm91cCBrZXkgKGN1c3RpZCwgZG9tYWluLCBsZXZlbCwgZGV2aWNlKSAgaXMgc2lnbmlmaWNh bnRseSBza2V3ZWQsICBhYm91dCA0MiUgKDU4LDYyMSw1MzMgLyAxMzgsNDU1LDM1NSkgb2YgdGhl IHJlY29yZHMgYXJlIHRoZSBzYW1lIGtleSwgYW5kIG9ubHkgdGhlIHJlZHVjZXIgd2hpY2ggaGFu ZGxlIHRoaXMga2V5IGZhaWxlZC4NCkJ1dCBmcm9tIGh0dHBzOi8vd3d3Lmlua2xpbmcuY29tL3Jl YWQvaGFkb29wLWRlZmluaXRpdmUtZ3VpZGUtdG9tLXdoaXRlLTNyZC9jaGFwdGVyLTYvc2h1ZmZs ZS1hbmQtc29ydCAsICBJIHN0aWxsIGhhdmUgbm8gaWRlYSB3aHkgaXQgY2F1c2UgYW4gT09NLiAg SXQgZG9lc24ndCB0ZWxsIGhvdyBza2V3ZWQga2V5IHdpbGwgYmUgaGFuZGxlZCwgbmVpdGhlciBo b3cgZGlmZmVyZW50IGtleXMgaW4gc2FtZSByZWR1Y2VyIHdpbGwgYmUgbWVyZ2VkLiANCg0KDQps ZWl3YW5nb3VjQGdtYWlsLmNvbQ0KIA0KRnJvbTogbGVpd2FuZ291Y0BnbWFpbC5jb20NCkRhdGU6 IDIwMTQtMDQtMTUgMjM6MzUNClRvOiB1c2VyOyB0aDsgZ2VybWFuLmZsDQpTdWJqZWN0OiBSZTog UkU6IG1lbW9yeWphdmEubGFuZy5PdXRPZk1lbW9yeUVycm9yIHJlbGF0ZWQgd2l0aCBudW1iZXIg b2YgcmVkdWNlcj8NClRoYW5rcywgbGV0IG1lIHRha2UgYSBjYXJlZnVsIGxvb2sgYXQgaXQuIA0K DQoNCg0KbGVpd2FuZ291Y0BnbWFpbC5jb20NCiANCkZyb206IEdlcm1hbiBGbG9yZXotTGFycmFo b25kbw0KRGF0ZTogMjAxNC0wNC0xNSAyMzoyNw0KVG86IHVzZXI7ICd0aCcNClN1YmplY3Q6IFJF OiBSZTogbWVtb3J5amF2YS5sYW5nLk91dE9mTWVtb3J5RXJyb3IgcmVsYXRlZCB3aXRoIG51bWJl ciBvZiByZWR1Y2VyPw0KTGVpDQpBIGdvb2QgZXhwbGFuYXRpb24gb2YgdGhpcyBjYW4gYmUgZm91 bmQgb24gdGhlIEhhZG9vcCBUaGUgRGVmaW5pdGl2ZSBHdWlkZSBieSBUb20gV2hpdGUuIA0KSGVy ZSBpcyBhbiBleGNlcnB0IHRoYXQgZXhwbGFpbnMgYSBiaXQgdGhlIGJlaGF2aW9yIGF0IHRoZSBy ZWR1Y2Ugc2lkZSBhbmQgc29tZSBwb3NzaWJsZSB0d2Vha3MgdG8gY29udHJvbCBpdC4gDQogDQpo dHRwczovL3d3dy5pbmtsaW5nLmNvbS9yZWFkL2hhZG9vcC1kZWZpbml0aXZlLWd1aWRlLXRvbS13 aGl0ZS0zcmQvY2hhcHRlci02L3NodWZmbGUtYW5kLXNvcnQNCiANCiANCiANCkZyb206IGxlaXdh bmdvdWNAZ21haWwuY29tIFttYWlsdG86bGVpd2FuZ291Y0BnbWFpbC5jb21dIA0KU2VudDogVHVl c2RheSwgQXByaWwgMTUsIDIwMTQgOToyOSBBTQ0KVG86IHVzZXI7IHRoDQpTdWJqZWN0OiBSZTog UmU6IG1lbW9yeWphdmEubGFuZy5PdXRPZk1lbW9yeUVycm9yIHJlbGF0ZWQgd2l0aCBudW1iZXIg b2YgcmVkdWNlcj8NCiANClRoYW5rcyBUaG9tYXMuIA0KIA0KQW5vaHRlciBxdWVzdGlvbi4gIEkg aGF2ZSBubyBpZGVhIHdoYXQgaXMgIkZhaWxlZCB0byBtZXJnZSBpbiBtZW1vcnkiLiAgRG9lcyB0 aGUgJ21lcmdlJyBpcyB0aGUgc2h1ZmZsZSBwaGFzZSBpbiByZWR1Y2VyIHNpZGU/ICBXaHkgaXQg aXMgaW4gbWVtb3J5Pw0KRXhjZXB0IHRoZSB0d28gbWV0aG9kcyhpbmNyZWFzZSByZWR1Y2VyIG51 bWJlciBhbmQgaW5jcmVhc2UgaGVhcCBzaXplKSwgIGlzIHRoZXJlIGFueSBvdGhlciBhbHRlcm5h dGl2ZXMgdG8gZml4IHRoaXMgaXNzdWU/IA0KIA0KVGhhbmtzIGEgbG90Lg0KIA0KIA0KDQoNCmxl aXdhbmdvdWNAZ21haWwuY29tDQogDQpGcm9tOiBUaG9tYXMgQmVudHNlbg0KRGF0ZTogMjAxNC0w NC0xNSAyMTo1Mw0KVG86IHVzZXINClN1YmplY3Q6IFJlOiBtZW1vcnlqYXZhLmxhbmcuT3V0T2ZN ZW1vcnlFcnJvciByZWxhdGVkIHdpdGggbnVtYmVyIG9mIHJlZHVjZXI/DQpXaGVuIHlvdSBpbmNy ZWFzZSB0aGUgbnVtYmVyIG9mIHJlZHVjZXJzIHRoZXkgZWFjaCBoYXZlIGxlc3MgdG8gd29yaw0K d2l0aCBwcm92aWRlZCB0aGUgZGF0YSBpcyBkaXN0cmlidXRlZCBldmVubHkgYmV0d2VlbiB0aGVt IC0gaW4gdGhpcyBjYXNlDQphYm91dCBvbmUgdGhpcmQgb2YgdGhlIG9yaWdpbmFsIHdvcmsuDQpJ dCBpcyBlZXNzZW50aWFsbHkgdGhlIHNhbWUgdGhpbmcgYXMgaW5jcmVhc2luZyB0aGUgaGVhcCBz aXplIC0gaXQncw0KanVzdCBkaXN0cmlidXRlZCBiZXR3ZWVuIG1vcmUgcmVkdWNlcnMuDQogDQov dGgNCiANCiANCiANCk9uIFR1ZSwgMjAxNC0wNC0xNSBhdCAyMDo0MSArMDgwMCwgbGVpd2FuZ291 Y0BnbWFpbC5jb20gd3JvdGU6DQo+IEkgY2FuIGZpeCB0aGlzIGJ5IGNoYW5naW5nIGhlYXAgc2l6 ZS4NCj4gQnV0IHdoYXQgY29uZnVzZSBtZSBpcyB0aGF0IHdoZW4gaSBjaGFuZ2UgdGhlIHJlZHVj ZXIgbnVtYmVyIGZyb20gMjQNCj4gdG8gODQsIHRoZXJlJ3Mgbm8gdGhpcyBlcnJvci4NCj4gDQo+ IA0KPiBBbnkgaW5zaWdodCBvbiB0aGlzPw0KPiANCj4gDQo+IFRoYW5rcw0KPiBMZWkNCj4gRmFp bGVkIHRvIG1lcmdlIGluIG1lbW9yeWphdmEubGFuZy5PdXRPZk1lbW9yeUVycm9yOiBKYXZhIGhl YXAgc3BhY2UNCj4gYXQgamF2YS51dGlsLkFycmF5cy5jb3B5T2YoQXJyYXlzLmphdmE6Mjc4NikN Cj4gYXQgamF2YS5pby5CeXRlQXJyYXlPdXRwdXRTdHJlYW0ud3JpdGUoQnl0ZUFycmF5T3V0cHV0 U3RyZWFtLmphdmE6OTQpDQo+IGF0IGphdmEuaW8uRGF0YU91dHB1dFN0cmVhbS53cml0ZShEYXRh T3V0cHV0U3RyZWFtLmphdmE6OTApDQo+IGF0IGphdmEuaW8uRGF0YU91dHB1dFN0cmVhbS53cml0 ZVVURihEYXRhT3V0cHV0U3RyZWFtLmphdmE6Mzg0KQ0KPiBhdCBqYXZhLmlvLkRhdGFPdXRwdXRT dHJlYW0ud3JpdGVVVEYoRGF0YU91dHB1dFN0cmVhbS5qYXZhOjMwNikNCj4gYXQgb3JnLmFwYWNo ZS5waWcuZGF0YS51dGlscy5TZWRlc0hlbHBlci53cml0ZUNoYXJhcnJheShTZWRlc0hlbHBlci5q YXZhOjY2KQ0KPiBhdCBvcmcuYXBhY2hlLnBpZy5kYXRhLkJpbkludGVyU2VkZXMud3JpdGVEYXR1 bShCaW5JbnRlclNlZGVzLmphdmE6NTQzKQ0KPiBhdCBvcmcuYXBhY2hlLnBpZy5kYXRhLkJpbklu dGVyU2VkZXMud3JpdGVEYXR1bShCaW5JbnRlclNlZGVzLmphdmE6NDM1KQ0KPiBhdCBvcmcuYXBh Y2hlLnBpZy5kYXRhLnV0aWxzLlNlZGVzSGVscGVyLndyaXRlR2VuZXJpY1R1cGxlKFNlZGVzSGVs cGVyLmphdmE6MTM1KQ0KPiBhdCBvcmcuYXBhY2hlLnBpZy5kYXRhLkJpbkludGVyU2VkZXMud3Jp dGVUdXBsZShCaW5JbnRlclNlZGVzLmphdmE6NjEzKQ0KPiBhdCBvcmcuYXBhY2hlLnBpZy5kYXRh LkJpbkludGVyU2VkZXMud3JpdGVCYWcoQmluSW50ZXJTZWRlcy5qYXZhOjYwNCkNCj4gYXQgb3Jn LmFwYWNoZS5waWcuZGF0YS5CaW5JbnRlclNlZGVzLndyaXRlRGF0dW0oQmluSW50ZXJTZWRlcy5q YXZhOjQ0NykNCj4gYXQgb3JnLmFwYWNoZS5waWcuZGF0YS5CaW5JbnRlclNlZGVzLndyaXRlRGF0 dW0oQmluSW50ZXJTZWRlcy5qYXZhOjQzNSkNCj4gYXQgb3JnLmFwYWNoZS5waWcuZGF0YS51dGls cy5TZWRlc0hlbHBlci53cml0ZUdlbmVyaWNUdXBsZShTZWRlc0hlbHBlci5qYXZhOjEzNSkNCj4g YXQgb3JnLmFwYWNoZS5waWcuZGF0YS5CaW5JbnRlclNlZGVzLndyaXRlVHVwbGUoQmluSW50ZXJT ZWRlcy5qYXZhOjYxMykNCj4gYXQgb3JnLmFwYWNoZS5waWcuZGF0YS5CaW5JbnRlclNlZGVzLndy aXRlRGF0dW0oQmluSW50ZXJTZWRlcy5qYXZhOjQ0MykNCj4gYXQgb3JnLmFwYWNoZS5waWcuZGF0 YS5CaW5JbnRlclNlZGVzLndyaXRlRGF0dW0oQmluSW50ZXJTZWRlcy5qYXZhOjQzNSkNCj4gYXQg b3JnLmFwYWNoZS5waWcuZGF0YS51dGlscy5TZWRlc0hlbHBlci53cml0ZUdlbmVyaWNUdXBsZShT ZWRlc0hlbHBlci5qYXZhOjEzNSkNCj4gYXQgb3JnLmFwYWNoZS5waWcuZGF0YS5CaW5JbnRlclNl ZGVzLndyaXRlVHVwbGUoQmluSW50ZXJTZWRlcy5qYXZhOjYxMykNCj4gYXQgb3JnLmFwYWNoZS5w aWcuZGF0YS5CaW5JbnRlclNlZGVzLndyaXRlRGF0dW0oQmluSW50ZXJTZWRlcy5qYXZhOjQ0MykN Cj4gYXQgb3JnLmFwYWNoZS5waWcuZGF0YS5CaW5TZWRlc1R1cGxlLndyaXRlKEJpblNlZGVzVHVw bGUuamF2YTo0MSkNCj4gYXQgb3JnLmFwYWNoZS5waWcuaW1wbC5pby5QaWdOdWxsYWJsZVdyaXRh YmxlLndyaXRlKFBpZ051bGxhYmxlV3JpdGFibGUuamF2YToxMjMpDQo+IGF0IG9yZy5hcGFjaGUu aGFkb29wLmlvLnNlcmlhbGl6ZXIuV3JpdGFibGVTZXJpYWxpemF0aW9uJFdyaXRhYmxlU2VyaWFs aXplci5zZXJpYWxpemUoV3JpdGFibGVTZXJpYWxpemF0aW9uLmphdmE6MTAwKQ0KPiBhdCBvcmcu YXBhY2hlLmhhZG9vcC5pby5zZXJpYWxpemVyLldyaXRhYmxlU2VyaWFsaXphdGlvbiRXcml0YWJs ZVNlcmlhbGl6ZXIuc2VyaWFsaXplKFdyaXRhYmxlU2VyaWFsaXphdGlvbi5qYXZhOjg0KQ0KPiBh dCBvcmcuYXBhY2hlLmhhZG9vcC5tYXByZWQuSUZpbGUkV3JpdGVyLmFwcGVuZChJRmlsZS5qYXZh OjE4OCkNCj4gYXQgb3JnLmFwYWNoZS5oYWRvb3AubWFwcmVkLlRhc2skQ29tYmluZU91dHB1dENv bGxlY3Rvci5jb2xsZWN0KFRhc2suamF2YToxMTQ1KQ0KPiBhdCBvcmcuYXBhY2hlLmhhZG9vcC5t YXByZWQuVGFzayROZXdDb21iaW5lclJ1bm5lciRPdXRwdXRDb252ZXJ0ZXIud3JpdGUoVGFzay5q YXZhOjE0NTYpDQo+IGF0IG9yZy5hcGFjaGUuaGFkb29wLm1hcHJlZHVjZS50YXNrLlRhc2tJbnB1 dE91dHB1dENvbnRleHRJbXBsLndyaXRlKFRhc2tJbnB1dE91dHB1dENvbnRleHRJbXBsLmphdmE6 ODUpDQo+IGF0IG9yZy5hcGFjaGUuaGFkb29wLm1hcHJlZHVjZS5saWIucmVkdWNlLldyYXBwZWRS ZWR1Y2VyJENvbnRleHQud3JpdGUoV3JhcHBlZFJlZHVjZXIuamF2YTo5OSkNCj4gYXQgb3JnLmFw YWNoZS5waWcuYmFja2VuZC5oYWRvb3AuZXhlY3V0aW9uZW5naW5lLm1hcFJlZHVjZUxheWVyLlBp Z0NvbWJpbmVyJENvbWJpbmUucHJvY2Vzc09uZVBhY2thZ2VPdXRwdXQoUGlnQ29tYmluZXIuamF2 YToyMDEpDQo+IGF0IG9yZy5hcGFjaGUucGlnLmJhY2tlbmQuaGFkb29wLmV4ZWN1dGlvbmVuZ2lu ZS5tYXBSZWR1Y2VMYXllci5QaWdDb21iaW5lciRDb21iaW5lLnJlZHVjZShQaWdDb21iaW5lci5q YXZhOjE2MykNCj4gYXQgb3JnLmFwYWNoZS5waWcuYmFja2VuZC5oYWRvb3AuZXhlY3V0aW9uZW5n aW5lLm1hcFJlZHVjZUxheWVyLlBpZ0NvbWJpbmVyJENvbWJpbmUucmVkdWNlKFBpZ0NvbWJpbmVy LmphdmE6NTEpDQo+IA0KPiBfX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fDQo+IGxlaXdhbmdvdWNAZ21haWwuY29tDQog DQogDQo= ------=_001_NextPart631813124738_=---- Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable =0A

Found= the root reason.
It is because of the nested distinct operation= relies on the RAM to calculate unique values.
As described here: http://stackoverflow.com= /questions/10732456/how-to-optimize-a-group-by-statement-in-pig-latin<= span arial,=3D"" '";=3D"" font-size:=3D"" 14px;=3D"" color:=3D"" rgb(0,=3D= "" 0,=3D"" 0);=3D"" background-color:=3D"" rgb(255,=3D"" 255,=3D"" 255);= =3D"" font-weight:=3D"" normal;=3D"" font-style:=3D"" normal;text-decorati= on:=3D"" none;'=3D"" style=3D"font-family: ''; font-size: 10.5pt; line-hei= ght: 1.5; background-color: window;"> 
=0A

Thanks,Lei

=0A
leiwangouc@gmail.com
<= /div>=0A
 
=
Date: 2014-04-16 11:58
To: user; th; german.fl; user
Subject:&nb= sp;Re: Re: java.lang.OutOfMemoryError related with number of reducer?
=0A
= Hi German & Thomas,

    Seem= s i found the data that causes the error, but i still don't know the exact= ly reason.

    I just do a group with p= ig latin: 
    
domain_device_group =3D GROUP data_filte= r BY (custid, domain, level, device); =0A
domain_device =3D FOREACH dom= ain_device_group {=0A
distinct_ip =3D DISTINCT data_filter.ip;=0A        distinct_userid =3D DISTINCT data_filter.use= rid;=0A
        GENERATE group.custid, group.domain= , group.level, group.device, COUNT_STAR(data_filter), COUNT_STAR(distinct_= ip), COUNT_STAR(distinct_userid);=0A
}=0A
STORE domain_device INTO '= $outputdir/$batchdate/data/domain_device' USING PigStorage('\t');
=0A

The group key (custid, domain, level, devi= ce)  is significantly skewed,  about 42% () of the records are the same key, a= nd only the reducer which handle this key failed.
But fro= m https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/ch= apter-6/shuffle-and-sort ,  I still have no idea why it caus= e an OOM.  It doesn't tell how skewed key will be handled, neither ho= w different keys in same reducer will be merged. 
=0A
leiwangouc@gmail.com
=0A
 
Dat= e: 2014-04-15 23:35
S= ubject: Re: RE: memoryjava.lang.OutOfMemoryError related with num= ber of reducer?
=0A=0A
Thanks,= let me take a careful look at it. 
=0A


=0A
leiwangouc@gmail.com
=0A
&n= bsp;
Date: 2014= -04-15 23:27
To: user; <= a href=3D"mailto:th@bentzn.com" style=3D"color: blue; text-decoration: und= erline;">'th'
Subject: RE: Re: memoryjava.lang.O= utOfMemoryError related with number of reducer?