crunch-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Removing PCollection.cache call resulting in two MR jobs writing to same path
Date Sun, 31 Jul 2016 07:36:31 GMT

Hi Ben,

That doesn't sound like expected behavior to me, although there might
be some extra details that cause it to be executed in that way. Any
chance you could put together a small example test case that
demonstrates this?

- Gabriel

On Fri, Jul 29, 2016 at 8:33 PM, Ben Juhn <benjijuhn@gmail.com> wrote:
> I removed a .cache call and am seeing some troublesome behavior. It results in
> two nodes in Crunch's execution graph writing to the same output path. When I
> add the .cache call back I end up with one node writing to crunch tmp space,
> and the other node writing to the output path.
>
> Is this expected behavior?
>
> Thanks,
> Ben
