crunch-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Exception with AvroPathPerKeyTarget
Date Fri, 28 Mar 2014 18:11:03 GMT
On Fri, Mar 28, 2014 at 6:13 PM, Jeremy Lewi <jeremy@lewi.us> wrote:
> Unfortunately that didn't work. I still have a reduce only job.
>
> Here's a link to the console output in case that's helpful:
> https://drive.google.com/a/lewi.us/file/d/0B6ngy4MCihWwcy1sdE9DQ2hiYnc/edit?usp=sharing
>
>
> I'm currently ungrouping my records before writing them (an earlier attempt
> to fix this issue). I'm trying without the ungroup now.

Looking at the console output, I noticed that the second and third
jobs are logging "Total input paths to process : 0", which makes me
think that the first job being run doesn't have any output. Could you
check the job counters there to see if it is indeed outputting
anything? And was your local job running on the same data?
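
As a quick way to spot this in a saved copy of the console output, here
is a minimal sketch that counts the log lines reporting zero input paths.
The class name InputPathCheck and the method countZeroInputLines are
hypothetical names used only for illustration, and the sketch assumes the
log line reads exactly as quoted above. It is not part of Crunch or Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Crunch or Hadoop.
final class InputPathCheck {
    // Assumes the log line has the form "Total input paths to process : N".
    private static final Pattern INPUT_PATHS =
        Pattern.compile("Total input paths to process\\s*:\\s*(\\d+)");

    /** Counts the log lines that report zero input paths. */
    static int countZeroInputLines(List<String> logLines) {
        int zero = 0;
        for (String line : logLines) {
            Matcher m = INPUT_PATHS.matcher(line);
            if (m.find() && Long.parseLong(m.group(1)) == 0L) {
                zero++;
            }
        }
        return zero;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java InputPathCheck <console-output-file>");
            return;
        }
        // Reads the saved console output, one log line per list element.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Lines reporting zero input paths: "
            + countZeroInputLines(lines));
    }
}
```

A non-zero count only says that some job was handed no input files. The
job counters of the preceding job are still the place to confirm whether
it wrote any records.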

The fact that there are no inputs would explain the reduce-only job,
and I'm guessing/hoping that will be the reason the
AvroPathPerKeyTarget is breaking.

- Gabriel


>
> J
>
>
> On Fri, Mar 28, 2014 at 10:08 AM, Jeremy Lewi <jeremy@lewi.us> wrote:
>>
>> Unfortunately that didn't work. I still have a reduce only job. I'm
>> attaching the console output from when I run my job in case that's helpful.
>> I'm currently ungrouping my records before writing them (an earlier
>> attempt to fix this). I'll try undoing that.
>>
>> J
>>
>>
>> On Fri, Mar 28, 2014 at 9:51 AM, Jeremy Lewi <jeremy@lewi.us> wrote:
>>>
>>> Thanks Gabriel I'll give that a try now. I was actually planning on
>>> making that change once I realized that my current strategy was forcing me
>>> to materialize data early on.
>>>
>>>
>>> On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <gabriel.reid@gmail.com>
>>> wrote:
>>>>
>>>> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <jeremy@lewi.us> wrote:
>>>> > No luck. I get the same error even when using a single reducer. I'm
>>>> > attaching the job configuration as shown in the web ui.
>>>> >
>>>> > When I look at the job tracker for the job, it has no map tasks. Is
>>>> > that expected? I've never heard of a reduce only job.
>>>> >
>>>>
>>>> Nope, a job with no map tasks doesn't sound right to me. I noticed
>>>> that you're effectively doing a materialize at [1], and then
>>>> using a BloomFilterJoinStrategy. While this should work fine, I'm
>>>> thinking that it could also potentially lead to some issues such as
>>>> the one you're having (i.e. a job with no map tasks).
>>>>
>>>> Could you try using the default join strategy there to see what
>>>> happens? I'm thinking that the AvroPathPerKeyTarget issue could just be
>>>> a consequence of something else going wrong earlier on.
>>>>
>>>> 1.
>>>> https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
>>>>
>>>> >
>>>> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <jeremy@lewi.us> wrote:
>>>> >>
>>>> >> This is my first time on a cluster. I'll try what Josh suggests now.
>>>> >>
>>>> >> J
>>>> >>
>>>> >>
>>>> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <josh.wills@gmail.com>
>>>> >> wrote:
>>>> >>>
>>>> >>>
>>>> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid
>>>> >>> <gabriel.reid@gmail.com>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> Hi Jeremy,
>>>> >>>>
>>>> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <jeremy@lewi.us>
>>>> >>>> wrote:
>>>> >>>> > Hi
>>>> >>>> >
>>>> >>>> > I'm hitting the exception pasted below when using
>>>> >>>> > AvroPathPerKeyTarget. Interestingly, my code works just fine
>>>> >>>> > when I run on a small dataset using the LocalJobTracker.
>>>> >>>> > However, when I run on a large dataset using a hadoop cluster
>>>> >>>> > I hit the exception.
>>>> >>>> >
>>>> >>>>
>>>> >>>> Have you ever been able to successfully use the
>>>> >>>> AvroPathPerKeyTarget
>>>> >>>> on a real cluster, or is this the first try with it?
>>>> >>>>
>>>> >>>> I'm wondering if this could be a problem that's always been
>>>> >>>> around (as the integration test for AvroPathPerKeyTarget also
>>>> >>>> runs in the local jobtracker), or if this could be something new.
>>>> >>>
>>>> >>>
>>>> >>> +1-- Jeremy, if you force the job to run w/a single reducer on the
>>>> >>> cluster (i.e., via groupByKey(1)), does it work?
>>>> >>>
>>>> >>>>
>>>> >>>>
>>>> >>>> - Gabriel
>>>> >>>
>>>> >>>
>>>> >>
>>>> >
>>>
>>>
>>
>
