crunch-dev mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Join.join(PTable<?,Void>, ?) return empty collection
Date Wed, 30 Jul 2014 16:45:35 GMT
My hypothesis is that we re-use null in joins to indicate the absence of a
value, so if the value of an entry is null, we assume it's non-existent.
I'm assuming there isn't an easy way to switch the Void out for a non-null
but ignored value?
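
To make the hypothesis concrete, here is a minimal plain-Java sketch of a join that uses null to mean "no value on this side". It is not the actual InnerJoinFn code. The class NullSentinelJoin and the helpers joinOneKey and replaceVoid are hypothetical names made up for illustration, and the per-key value lists stand in for what a reduce-side join would see.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class NullSentinelJoin {

    // Toy inner join for the values that share one key.
    // Assumption: null is treated as "this side has no value",
    // which is the behaviour hypothesised above, not verified Crunch code.
    static <U, V> List<Map.Entry<U, V>> joinOneKey(List<U> leftValues, List<V> rightValues) {
        List<U> bufferedLeft = new ArrayList<>();
        for (U left : leftValues) {
            if (left != null) {          // a null left value is skipped as absent
                bufferedLeft.add(left);
            }
        }
        List<Map.Entry<U, V>> out = new ArrayList<>();
        for (V right : rightValues) {
            for (U left : bufferedLeft) {
                out.add(new SimpleEntry<>(left, right));
            }
        }
        return out;
    }

    // Hypothetical workaround: swap every Void (always null) value
    // for a non-null placeholder that downstream code can ignore.
    static <P> List<P> replaceVoid(List<Void> values, P placeholder) {
        return Collections.nCopies(values.size(), placeholder);
    }

    public static void main(String[] args) {
        List<Void> voidLeft = Arrays.asList((Void) null);   // the only possible Void value
        List<String> right = Arrays.asList("a", "b");

        // Every left value is null, so nothing is buffered and the join is empty.
        System.out.println(joinOneKey(voidLeft, right).size());

        // With a non-null placeholder the same join emits one pair per right value.
        List<Boolean> placeholderLeft = replaceVoid(voidLeft, Boolean.TRUE);
        System.out.println(joinOneKey(placeholderLeft, right).size());
    }
}
```

If the hypothesis holds, the first join is empty because a Void value can only ever be null, while the second produces one pair per right-side value. In a real pipeline the equivalent step would be mapping the Void values to some non-null placeholder before the join.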

J


On Wed, Jul 30, 2014 at 9:35 AM, Mārtiņš Kalvāns <martins.kalvans@gmail.com>
wrote:

> Hi.
>
> I stumbled on weird behaviour (a bug?) when joining a PTable<?, Void> on the
> left side with any other PTable: the resulting collection is empty.
> The attached example code demonstrates the unexpected behaviour.
> The code in question is in org.apache.crunch.lib.join.InnerJoinFn, line 59,
> where it checks for a null reference on the left dataset (the same applies to
> the other join fn implementations).
> Can anyone comment on this?
>
>
> --
> Mārtiņš Kalvāns
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
