crunch-user mailing list archives

From Jeremy Lewi <jer...@lewi.us>
Subject Re: Exception with AvroPathPerKeyTarget
Date Fri, 28 Mar 2014 16:51:21 GMT
Thanks Gabriel, I'll give that a try now. I was actually planning on making
that change once I realized that my current strategy was forcing me to
materialize data early on.
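For readers following the join-strategy discussion in the quoted thread, here is a plain-Java illustration of the Bloom-filter pre-filtering idea behind a strategy such as Crunch's BloomFilterJoinStrategy. It is a sketch under stated assumptions, not Crunch's implementation: keys and values are plain strings, the join itself is an in-memory hash join, and the class name `BloomPrefilterJoin`, the method `join`, and the filter size are all hypothetical. The point it shows is that the filter can only be used after a complete pass over the left-side keys. That ordering constraint is plausibly why a pipeline built around this strategy ends up materializing data earlier than a default join would.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the Crunch BloomFilterJoinStrategy code.
public class BloomPrefilterJoin {

    // Minimal Bloom filter over String keys: numBits bits, numHashes probes per key.
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // Double hashing: probe i is h1 + i * h2 (mod numBits); h2 is forced odd.
        private int probe(String key, int i) {
            int h1 = key.hashCode();
            int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(probe(key, i));
            }
        }

        // false means "definitely absent"; true means "possibly present".
        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(probe(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    // Inner join on key, pre-filtering the right side with a filter trained on left keys.
    static List<String> join(List<Map.Entry<String, String>> left,
                             List<Map.Entry<String, String>> right) {
        // Step 1: a full pass over the left side must finish before filtering can start.
        BloomFilter filter = new BloomFilter(1024, 3);
        Map<String, List<String>> leftByKey = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : left) {
            filter.add(e.getKey());
            leftByKey.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> e : right) {
            // Step 2: drop right-side records whose key is definitely not on the left.
            if (!filter.mightContain(e.getKey())) {
                continue;
            }
            // Step 3: the real join (in memory here); false positives find no match.
            for (String leftValue : leftByKey.getOrDefault(e.getKey(), List.of())) {
                joined.add(e.getKey() + ":" + leftValue + ":" + e.getValue());
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> left = List.of(Map.entry("a", "1"), Map.entry("b", "2"));
        List<Map.Entry<String, String>> right =
                List.of(Map.entry("a", "x"), Map.entry("c", "y"), Map.entry("b", "z"));
        System.out.println(join(left, right)); // prints [a:1:x, b:2:z]
    }
}
```

A Bloom filter never gives false negatives, so the pre-filter cannot drop a record that would have joined. The trade-off is the extra up-front pass over the left side.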


On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:

> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <jeremy@lewi.us> wrote:
> > No luck. I get the same error even when using a single reducer. I'm
> > attaching the job configuration as shown in the web ui.
> >
> > When I look at the job tracker for the job, it has no map tasks. Is that
> > expected? I've never heard of a reduce-only job.
> >
>
> Nope, a job with no map tasks doesn't sound right to me. I noticed
> that you're effectively doing a materialize at [1], and then
> using a BloomFilterJoinStrategy. While this should work fine, I'm
> thinking that it could also potentially lead to some issues such as
> the one you're having (i.e. a job with no map tasks).
>
> Could you try using the default join strategy there to see what
> happens? I'm thinking that the AvroPathPerKeyTarget issue could just be a
> consequence of something else going wrong earlier on.
>
> [1] https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
>
> >
> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <jeremy@lewi.us> wrote:
> >>
> >> This is my first time on a cluster. I'll try what Josh suggests now.
> >>
> >> J
> >>
> >>
> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <josh.wills@gmail.com> wrote:
> >>>
> >>>
> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:
> >>>>
> >>>> Hi Jeremy,
> >>>>
> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <jeremy@lewi.us> wrote:
> >>>> > Hi
> >>>> >
> >>>> > I'm hitting the exception pasted below when using
> >>>> > AvroPathPerKeyTarget. Interestingly, my code works just fine when I
> >>>> > run on a small dataset using the LocalJobTracker. However, when I run
> >>>> > on a large dataset using a Hadoop cluster, I hit the exception.
> >>>> >
> >>>>
> >>>> Have you ever been able to successfully use the AvroPathPerKeyTarget
> >>>> on a real cluster, or is this the first try with it?
> >>>>
> >>>> I'm wondering if this could be a problem that's always been around (as
> >>>> the integration test for AvroPathPerKeyTarget also runs in the local
> >>>> jobtracker), or if this could be something new.
> >>>
> >>>
> >>> +1. Jeremy, if you force the job to run with a single reducer on the
> >>> cluster (i.e., via groupByKey(1)), does it work?
> >>>
> >>>>
> >>>>
> >>>> - Gabriel
> >>>
> >>>
> >>
> >
>
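Josh's single-reducer check can be reasoned about using Hadoop's default hash partitioning, where a map output key is sent to partition `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The minimal plain-Java sketch below assumes that default partitioner is in effect for the grouping (the class and method names are hypothetical). With one reducer, as with `groupByKey(1)`, every key lands in partition 0, so the whole per-key output is produced by a single reduce task. With more reducers the keys are spread across tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper mirroring the formula of Hadoop's default HashPartitioner.
public class SingleReducerPartitioning {

    // Mask the sign bit so negative hash codes still map to a valid partition.
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Group keys by the partition (reduce task) they would be sent to.
    static Map<Integer, List<String>> assign(List<String> keys, int numReducers) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(partitionFor(key, numReducers), p -> new ArrayList<>())
                       .add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c");
        System.out.println(assign(keys, 1)); // prints {0=[a, b, c]}
        System.out.println(assign(keys, 4)); // keys spread over partitions 0..3
    }
}
```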
