crunch-user mailing list archives

From Jeremy Lewi <jer...@lewi.us>
Subject Re: Exception with AvroPathPerKeyTarget
Date Sat, 29 Mar 2014 01:27:17 GMT
Thanks for the tip. I'll look into it and try to figure it out.


On Fri, Mar 28, 2014 at 11:11 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:

> On Fri, Mar 28, 2014 at 6:13 PM, Jeremy Lewi <jeremy@lewi.us> wrote:
> > Unfortunately that didn't work. I still have a reduce only job.
> >
> > Here's a link to the console output in case that's helpful:
> >
> > https://drive.google.com/a/lewi.us/file/d/0B6ngy4MCihWwcy1sdE9DQ2hiYnc/edit?usp=sharing
> >
> >
> > I'm currently ungrouping my records before writing them (an earlier
> > attempt to fix this issue). I'm trying without the ungroup now.
>
> Looking at the console output, I noticed that the second and third
> jobs are logging "Total input paths to process : 0", which makes me
> think that the first job being run doesn't have any output. Could you
> check the job counters there to see if it is indeed outputting
> anything? And was your local job running on the same data?
>
> The fact that there are no inputs would explain the reduce-only job,
> and I'm guessing/hoping that will be the reason the
> AvroPathPerKeyTarget is breaking.
>
> - Gabriel
>
>
> >
> > J
> >
> >
> > On Fri, Mar 28, 2014 at 10:08 AM, Jeremy Lewi <jeremy@lewi.us> wrote:
> >>
> >> Unfortunately that didn't work. I still have a reduce only job. I'm
> >> attaching the console output from when I run my job in case that's
> >> helpful. I'm currently ungrouping my records before writing them (an
> >> earlier attempt to fix this). I'll try undoing that.
> >>
> >> J
> >>
> >>
> >> On Fri, Mar 28, 2014 at 9:51 AM, Jeremy Lewi <jeremy@lewi.us> wrote:
> >>>
> >>> Thanks Gabriel I'll give that a try now. I was actually planning on
> >>> making that change once I realized that my current strategy was
> >>> forcing me to materialize data early on.
> >>>
> >>>
> >>> On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <gabriel.reid@gmail.com>
> >>> wrote:
> >>>>
> >>>> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <jeremy@lewi.us> wrote:
> >>>> > No luck. I get the same error even when using a single reducer. I'm
> >>>> > attaching the job configuration as shown in the web ui.
> >>>> >
> >>>> > When I look at the job tracker for the job, it has no map tasks. Is
> >>>> > that expected? I've never heard of a reduce only job.
> >>>> >
> >>>>
> >>>> Nope, a job with no map tasks doesn't sound right to me. I noticed
> >>>> that you're effectively doing a materialize at [1], and then
> >>>> using a BloomFilterJoinStrategy. While this should work fine, I'm
> >>>> thinking that it could also potentially lead to some issues such as
> >>>> the one you're having (i.e. a job with no map tasks).
> >>>>
> >>>> Could you try using the default join strategy there to see what
> >>>> happens? I'm thinking that the AvroPathPerKeyTarget issue could just be
> >>>> a consequence of something else going wrong earlier on.
> >>>>
> >>>> 1. https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
> >>>>
> >>>> >
> >>>> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <jeremy@lewi.us> wrote:
> >>>> >>
> >>>> >> This is my first time on a cluster. I'll try what Josh suggests now.
> >>>> >>
> >>>> >> J
> >>>> >>
> >>>> >>
> >>>> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <josh.wills@gmail.com>
> >>>> >> wrote:
> >>>> >>>
> >>>> >>>
> >>>> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid
> >>>> >>> <gabriel.reid@gmail.com>
> >>>> >>> wrote:
> >>>> >>>>
> >>>> >>>> Hi Jeremy,
> >>>> >>>>
> >>>> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <jeremy@lewi.us>
> >>>> >>>> wrote:
> >>>> >>>> > Hi
> >>>> >>>> >
> >>>> >>>> > I'm hitting the exception pasted below when using
> >>>> >>>> > AvroPathPerKeyTarget.
> >>>> >>>> > Interestingly, my code works just fine when I run on a small
> >>>> >>>> > dataset using the LocalJobTracker. However, when I run on a
> >>>> >>>> > large dataset using a hadoop cluster I hit the exception.
> >>>> >>>> >
> >>>> >>>>
> >>>> >>>> Have you ever been able to successfully use the
> >>>> >>>> AvroPathPerKeyTarget
> >>>> >>>> on a real cluster, or is this the first try with it?
> >>>> >>>>
> >>>> >>>> I'm wondering if this could be a problem that's always been around
> >>>> >>>> (as the integration test for AvroPathPerKeyTarget also runs in the
> >>>> >>>> local jobtracker), or if this could be something new.
> >>>> >>>
> >>>> >>>
> >>>> >>> +1-- Jeremy, if you force the job to run w/a single reducer on the
> >>>> >>> cluster (i.e., via groupByKey(1)), does it work?
> >>>> >>>
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> - Gabriel
> >>>> >>>
> >>>> >>>
> >>>> >>
> >>>> >
> >>>
> >>>
> >>
> >
>
