avro-dev mailing list archives

From "ey-chih chow (JIRA)" <j...@apache.org>
Subject [jira] Updated: (AVRO-782) issue of cache coherence or reuse for avro map reduce
Date Thu, 17 Mar 2011 00:36:29 GMT

     [ https://issues.apache.org/jira/browse/AVRO-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ey-chih chow updated AVRO-782:
------------------------------

    Status: Patch Available  (was: Open)

> issue of cache coherence or reuse for avro map reduce
> -----------------------------------------------------
>
>                 Key: AVRO-782
>                 URL: https://issues.apache.org/jira/browse/AVRO-782
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>         Environment: Mac with VMWare running Linux training-vm 2.6.28-19-server #61-Ubuntu
>            Reporter: ey-chih chow
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Our MapReduce jobs use the Avro map/reduce API.  For one of the jobs, we got the following trace from the reducer:
> ====================================================================================================
> attempt_20110310145147365_0002_r_000000_0/syslog:2011-03-10 14:52:31,226 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,010 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,016 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000100000000000000000000000000001 whose rowKey is 0000000200000000000000000000000000002
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,017 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000200000000000000000000000000002 whose rowKey is 0000000300000000000000000000000000003
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,021 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000300000000000000000000000000003 whose rowKey is 0000000400000000000000000000000000004
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,023 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000400000000000000000000000000004 whose rowKey is 0000000500000000000000000000000000005
> attempt_20110310145315542_0002_r_000000_0/syslog:2011-03-10 14:53:59,024 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000500000000000000000000000000005 whose rowKey is 0000000500000000000000000000000000005
> ====================================================================================================
> If we add the following two lines to the reducer code:
> ====================================================================================================
> boolean workAround = getConf().getBoolean(NgActivityGatheringJob.NG_AVRO_BUG_WORKAROUND, true);
> Utf8 dupKey = (workAround) ? new Utf8(key.toString()) : key; // use dupKey instead of the key passed to the reducer
> ====================================================================================================
> we get the following trace, which we consider the correct behavior:
> ====================================================================================================
> 2011-03-10 15:04:33,431 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,374 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000000000000000000000000000000000 whose rowKey is 0000000000000000000000000000000000000
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,381 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000100000000000000000000000000001 whose rowKey is 0000000100000000000000000000000000001
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,383 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000200000000000000000000000000002 whose rowKey is 0000000200000000000000000000000000002
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,389 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000300000000000000000000000000003 whose rowKey is 0000000300000000000000000000000000003
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,391 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000400000000000000000000000000004 whose rowKey is 0000000400000000000000000000000000004
> attempt_20110310150517897_0002_r_000000_0/syslog:2011-03-10 15:06:01,393 INFO com.ngmoco.ngpipes.sourcing.NgActivityGatheringReducer: working on 0000000500000000000000000000000000005 whose rowKey is 0000000500000000000000000000000000005
> ====================================================================================================
> According to Scott Carey, this might be related to object reuse.  We have created a unit test case that reproduces the problem; it will be attached as a patch.  Note that we ran this test case in our Ngmoco dev environment, so it might need some adjustments to run in other environments.
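
For readers unfamiliar with the object-reuse hazard mentioned above, here is a minimal, self-contained sketch of the general problem. It is not the attached test case and does not reproduce the exact trace in this report. `KeyReuseDemo`, `MutableKey`, `retainReferences`, and `retainCopies` are hypothetical names invented for illustration; `MutableKey` only stands in for a mutable, reusable key type such as `Utf8`. It assumes a framework that refills one key instance for every record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of Avro or Hadoop.
class KeyReuseDemo {

    /** Hypothetical stand-in for a mutable key object that a framework reuses. */
    public static final class MutableKey {
        private String value = "";

        void set(String newValue) { value = newValue; }

        @Override
        public String toString() { return value; }
    }

    /** Holds on to the shared key itself, so every stored entry aliases one object. */
    public static List<MutableKey> retainReferences(List<String> rowKeys) {
        MutableKey shared = new MutableKey();   // assumed: one instance reused per record
        List<MutableKey> held = new ArrayList<>();
        for (String rowKey : rowKeys) {
            shared.set(rowKey);                 // the "framework" overwrites the key in place
            held.add(shared);                   // stores a reference, not a snapshot
        }
        return held;
    }

    /** Copies the key's contents before storing, in the spirit of the dupKey workaround. */
    public static List<String> retainCopies(List<String> rowKeys) {
        MutableKey shared = new MutableKey();
        List<String> held = new ArrayList<>();
        for (String rowKey : rowKeys) {
            shared.set(rowKey);
            held.add(shared.toString());        // immutable snapshot of the current contents
        }
        return held;
    }

    public static void main(String[] args) {
        List<String> rowKeys = Arrays.asList("k0", "k1", "k2");
        System.out.println(retainReferences(rowKeys)); // prints [k2, k2, k2]
        System.out.println(retainCopies(rowKeys));     // prints [k0, k1, k2]
    }
}
```

Copying the key costs one allocation per record, which is presumably why a framework would reuse the instance in the first place; whether that is what happens in this issue is for the attached test case to confirm.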

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
