Date: Tue, 16 Aug 2011 19:48:27 +0000 (UTC)
From: "Scott Carey (JIRA)"
To: dev@avro.apache.org
Subject: [jira] [Updated] (AVRO-782) issue of cache coherence or reuse for avro map reduce

[ https://issues.apache.org/jira/browse/AVRO-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-782:
-----------------------------
    Fix Version/s: 1.6.0

> issue of cache coherence or reuse for avro map reduce
> -----------------------------------------------------
>
> Key: AVRO-782
> URL: https://issues.apache.org/jira/browse/AVRO-782
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.5.0
> Environment: Mac with VMWare running Linux training-vm 2.6.28-19-server #61-Ubuntu
> Reporter: ey-chih chow
> Fix For: 1.6.0
>
> Attachments: AVRO-782.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Our map reduce jobs use the Avro map/reduce API. For one of the jobs, we got the following reducer trace, in which the key being worked on and the rowKey disagree on several lines:
> ====================================================================================================
> attempt_20110310145147365_0002_r_000000_0/syslog:2011-03-10 14:52:31,226 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,010 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,016 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000100000000000000000000000000001 whose rowKey is 0000000200000000000000000000000000002
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,017 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000200000000000000000000000000002 whose rowKey is 0000000300000000000000000000000000003
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,021 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000300000000000000000000000000003 whose rowKey is 0000000400000000000000000000000000004
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,023 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000400000000000000000000000000004 whose rowKey is 0000000500000000000000000000000000005
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,024 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000500000000000000000000000000005 whose rowKey is 0000000500000000000000000000000000005
> ====================================================================================================
> If we add the following two lines to the reducer code:
> ====================================================================================================
> boolean workAround = getConf().getBoolean(NgActivityGatheringJob.NG_AVRO_BUG_WORKAROUND, true);
> Utf8 dupKey = (workAround) ? new Utf8(key.toString()) : key; // use dupKey instead of key passed to reducer
> ====================================================================================================
> we get the following trace, which we consider the correct behavior:
> ====================================================================================================
> 2011-03-10 15:04:33,431 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,374 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,381 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000100000000000000000000000000001 whose rowKey is 0000000100000000000000000000000000001
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,383 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000200000000000000000000000000002 whose rowKey is 0000000200000000000000000000000000002
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,389 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000300000000000000000000000000003 whose rowKey is 0000000300000000000000000000000000003
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,391 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000400000000000000000000000000004 whose rowKey is 0000000400000000000000000000000000004
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,393 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000500000000000000000000000000005 whose rowKey is 0000000500000000000000000000000000005
> ====================================================================================================
> According to Scott Carey, this might relate to object reuse. We have created a unit test case that reproduces the problem. The test case will be attached as a patch. Note that we ran this test case in our Ngmoco dev environment, so it may need some adjustments to run in other environments.
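
For illustration only: if the cause is indeed object reuse, as suggested in the report, the effect of the workaround can be sketched without Avro or Hadoop. The sketch below is an assumption, not the reporter's reducer or the attached patch. MutableKey is a hypothetical stand-in for a mutable key such as Utf8, and the loop imitates a framework that overwrites one key instance for each record. Taking key.toString() produces an immutable snapshot, which is what new Utf8(key.toString()) does in the two workaround lines.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyReuseDemo {
    /** Hypothetical stand-in for a mutable, reusable key (not Avro's Utf8). */
    static final class MutableKey {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    /** Keeps the reference it was handed, so later overwrites show through. */
    static List<String> collectWithoutCopy(String[] inputs) {
        MutableKey reused = new MutableKey();
        List<MutableKey> held = new ArrayList<>();
        for (String in : inputs) {
            reused.set(in);   // the "framework" overwrites the same instance
            held.add(reused); // the caller stores the shared reference
        }
        List<String> out = new ArrayList<>();
        for (MutableKey k : held) {
            out.add(k.toString());
        }
        return out;
    }

    /** Takes a snapshot of the key before it can be overwritten. */
    static List<String> collectWithCopy(String[] inputs) {
        MutableKey reused = new MutableKey();
        List<String> held = new ArrayList<>();
        for (String in : inputs) {
            reused.set(in);
            held.add(reused.toString()); // defensive copy of the current value
        }
        return held;
    }

    public static void main(String[] args) {
        String[] keys = {"k0", "k1", "k2"};
        System.out.println(collectWithoutCopy(keys)); // prints [k2, k2, k2]
        System.out.println(collectWithCopy(keys));    // prints [k0, k1, k2]
    }
}
```

Without the copy, every stored entry reflects the last value written into the shared instance. The trace in the report shows a milder form of the same symptom, with the key running one record ahead of the rowKey.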