crunch-user mailing list archives

From Jeremy Lewi <jer...@lewi.us>
Subject Re: Exception with AvroPathPerKeyTarget
Date Fri, 28 Mar 2014 17:13:48 GMT
Unfortunately that didn't work. I still have a reduce-only job.

Here's a link to the console output in case that's helpful:
https://drive.google.com/a/lewi.us/file/d/0B6ngy4MCihWwcy1sdE9DQ2hiYnc/edit?usp=sharing


I'm currently ungrouping my records before writing them (an earlier attempt
to fix this issue). I'm trying without the ungroup now.
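
For reference, a minimal sketch of the group/ungroup/write pattern I mean
(generic key/value types and the output path are placeholders, and the exact
AvroPathPerKeyTarget constructor may differ across Crunch versions):

    import org.apache.crunch.PGroupedTable;
    import org.apache.crunch.PTable;
    import org.apache.crunch.io.avro.AvroPathPerKeyTarget;
    import org.apache.hadoop.fs.Path;

    public class WritePerKeySketch {
      // Generic types and "outDir" are placeholders for illustration only.
      public static <V> void writePerKey(PTable<String, V> records, String outDir) {
        PGroupedTable<String, V> grouped = records.groupByKey();
        // Ungroup before writing, as described above; the target then
        // writes one Avro output directory per key.
        PTable<String, V> ungrouped = grouped.ungroup();
        ungrouped.write(new AvroPathPerKeyTarget(new Path(outDir)));
      }
    }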

J


On Fri, Mar 28, 2014 at 10:08 AM, Jeremy Lewi <jeremy@lewi.us> wrote:

> Unfortunately that didn't work. I still have a reduce-only job. I'm
> attaching the console output from when I run my job in case that's helpful.
> I'm currently ungrouping my records before writing them (an earlier
> attempt to fix this). I'm trying to undo that now.
>
> J
>
>
> On Fri, Mar 28, 2014 at 9:51 AM, Jeremy Lewi <jeremy@lewi.us> wrote:
>
>> Thanks Gabriel I'll give that a try now. I was actually planning on
>> making that change once I realized that my current strategy was forcing me
>> to materialize data early on.
>>
>>
>> On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:
>>
>>> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <jeremy@lewi.us> wrote:
>>> > No luck. I get the same error even when using a single reducer. I'm
>>> > attaching the job configuration as shown in the web ui.
>>> >
>>> > When I look at the job tracker for the job, it has no map tasks. Is
>>> > that expected? I've never heard of a reduce-only job.
>>> >
>>>
>>> Nope, a job with no map tasks doesn't sound right to me. I noticed
>>> that you're effectively doing a materialize at [1], and then
>>> using a BloomFilterJoinStrategy. While this should work fine, I'm
>>> thinking that it could also potentially lead to some issues such as
>>> the one you're having (i.e. a job with no map tasks).
>>>
>>> Could you try using the default join strategy there to see what
>>> happens? I'm thinking that the AvroPathPerKeyTarget issue could just be
>>> a consequence of something else going wrong earlier on.
>>>
>>> 1.
>>> https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
>>>
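
A minimal sketch of the swap being suggested there (key and value types are
placeholders; DefaultJoinStrategy is Crunch's reduce-side join, replacing the
BloomFilterJoinStrategy currently in use):

    import org.apache.crunch.PTable;
    import org.apache.crunch.Pair;
    import org.apache.crunch.lib.join.DefaultJoinStrategy;
    import org.apache.crunch.lib.join.JoinStrategy;
    import org.apache.crunch.lib.join.JoinType;

    public class JoinSketch {
      public static <K, U, V> PTable<K, Pair<U, V>> joinWithDefaultStrategy(
          PTable<K, U> left, PTable<K, V> right) {
        // Swap the BloomFilterJoinStrategy for the default reduce-side join
        // to see whether the map-less job goes away.
        JoinStrategy<K, U, V> strategy = new DefaultJoinStrategy<K, U, V>();
        return strategy.join(left, right, JoinType.INNER_JOIN);
      }
    }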
>>> >
>>> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <jeremy@lewi.us> wrote:
>>> >>
>>> >> This is my first time on a cluster. I'll try what Josh suggests now.
>>> >>
>>> >> J
>>> >>
>>> >>
>>> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <josh.wills@gmail.com>
>>> wrote:
>>> >>>
>>> >>>
>>> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid <
>>> gabriel.reid@gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> Hi Jeremy,
>>> >>>>
>>> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <jeremy@lewi.us>
>>> wrote:
>>> >>>> > Hi
>>> >>>> >
>>> >>>> > I'm hitting the exception pasted below when using
>>> >>>> > AvroPathPerKeyTarget.
>>> >>>> > Interestingly, my code works just fine when I run on a small
>>> >>>> > dataset using the LocalJobTracker. However, when I run on a
>>> >>>> > large dataset using a Hadoop cluster I hit the exception.
>>> >>>> >
>>> >>>>
>>> >>>> Have you ever been able to successfully use the AvroPathPerKeyTarget
>>> >>>> on a real cluster, or is this the first try with it?
>>> >>>>
>>> >>>> I'm wondering if this could be a problem that's always been around
>>> >>>> (as the integration test for AvroPathPerKeyTarget also runs in the
>>> >>>> local jobtracker), or if this could be something new.
>>> >>>
>>> >>>
>>> >>> +1-- Jeremy, if you force the job to run w/a single reducer on the
>>> >>> cluster (i.e., via groupByKey(1)), does it work?
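
For reference, a minimal sketch of forcing a single reducer that way (types
are placeholders):

    import org.apache.crunch.PGroupedTable;
    import org.apache.crunch.PTable;

    public class SingleReducerSketch {
      public static <K, V> PGroupedTable<K, V> groupWithOneReducer(PTable<K, V> table) {
        // groupByKey(1) forces the shuffle into a single reduce partition,
        // which helps isolate partitioning-related problems on the cluster.
        return table.groupByKey(1);
      }
    }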
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>> - Gabriel
>>> >>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>
